Two-step genetic programming for optimization of RNA common-structure


Multi-objective genetic algorithms
A more appropriate approach to dealing with multiple objectives is to use techniques originally designed for that purpose in the field of Operations Research. Work in that area started a century ago, and many approaches have been refined and are commonly applied in economics and control theory.
Abstract: In this paper we propose the use of the genetic algorithm (GA) as a tool to solve multiobjective optimization problems in structures. Using the concept of the min-max optimum, a new GA-based multiobjective optimization technique is proposed, and two truss design problems are solved with it. The results produced by this new approach are compared to those produced by other mathematical programming techniques and GA-based approaches, showing that this technique generates better trade-offs and that the genetic algorithm can be used as a reliable numerical optimization tool.

Genetic algorithm parameter optimization based on orthogonal experimental design
Although the GA has been successfully applied to many optimization problems, it also has some inherent drawbacks. For example, whether the parameters of the basic model are chosen correctly has a large influence on the optimization results. How to tune the algorithm's own parameters for good performance is, fundamentally, a rather complex research problem. Owing to the huge parameter space and the interdependence among the parameters, there is still no basic method for determining optimal parameters. The empirical method is currently the most common tuning approach, but it cannot guarantee that the parameter choice is the most suitable for every class of optimization problem. Theoretical study and experimental analysis of algorithm parameter selection are therefore highly necessary.
The genetic algorithm (GA) is an intelligent evolutionary optimization method formed by simulating the "survival of the fittest" of organisms in natural environments, and it often achieves good results on problems with highly complex search spaces. Its basic principles were first proposed by Holland. In recent decades, with deepening research in both theory and applications, genetic algorithms have developed rapidly, attracted increasing attention for their good performance, and been successfully applied to fields such as path planning [2], job-shop scheduling, and the traveling salesman problem.
As an experimental design technique, the orthogonal experimental method has the advantage of obtaining satisfactory experimental results with considerably fewer trials, shorter experiment time, and lower cost. This paper attempts to introduce the orthogonal experimental method into the GA, analyzing and studying the GA's key parameters: the population size N, the crossover probability Pc, and the mutation probability Pm. Simulation results demonstrate the soundness and effectiveness of tuning the parameters with this method, and also provide a reference for parameter selection when the algorithm is used in other research.
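The orthogonal-experiment idea can be sketched in Python. The L9(3^3) array below is the standard Taguchi L9 design restricted to three factors; the level values chosen for N, Pc and Pm are illustrative assumptions, not the paper's actual design:

```python
# L9 orthogonal array: 9 rows, 3 factors at 3 levels; every level of every
# factor appears exactly 3 times, and every pair of levels of any two
# factors appears exactly once (pairwise balance).
L9 = [  # each row: level index (0-2) for (N, Pc, Pm)
    (0, 0, 0), (0, 1, 1), (0, 2, 2),
    (1, 0, 1), (1, 1, 2), (1, 2, 0),
    (2, 0, 2), (2, 1, 0), (2, 2, 1),
]
levels = {
    "N":  [20, 50, 100],       # population size (illustrative values)
    "Pc": [0.6, 0.8, 0.95],    # crossover probability
    "Pm": [0.01, 0.05, 0.1],   # mutation probability
}

def trials():
    """Expand the orthogonal array into 9 concrete parameter settings."""
    return [
        {"N": levels["N"][a], "Pc": levels["Pc"][b], "Pm": levels["Pm"][c]}
        for a, b, c in L9
    ]
```

Only 9 GA runs are then needed instead of the 27 of a full factorial design; the best level of each factor is estimated by averaging the runs' results per level.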

The GP (Genetic Programming) algorithm
In practice, however, we must limit the number of trees and their size. For example, we can limit a tree's size to W, measured by the maximum number of nodes. Once W is given, the set of all trees with no more than W nodes that contain a particular subtree is finite.

The schema theory of GP: in the GP algorithm, the average fitness of a schema is simply defined as the average fitness of all individuals belonging to that schema. Whether the number of individuals matching a schema grows or decays through evolution depends on the ratio of the schema's average fitness to the population's average fitness:
m(H, t+1) ≥ m(H, t) · [f(H, t) / f̄(t)] · (1 − δ)

where f̄(t) is the average fitness of the population, f(H, t) is the average fitness of schema H among the offspring, m(H, t) is the number of offspring individuals belonging to schema H, and δ is the probability of schema disruption.
I. Overview
The main features of problem solving with the GP algorithm are:
1. The results produced are hierarchical.
2. As evolution continues, individuals develop dynamically toward the answer to the problem.
3. The structure or size of the final answer need not be determined or restricted in advance; the GP algorithm determines it automatically from the environment. This widens the window through which the system observes the physical world and ultimately leads to the true answer to the problem.
4. The inputs, intermediate results, and outputs are natural descriptions of the problem, requiring little or no preprocessing of the input data or postprocessing of the output; the resulting computer program is composed of functions drawn from the problem's natural description.

Advances in glutamate, tryptophan and serine fermentation (Liu Xianxue)
Amino acid production in China began in 1922, when monosodium glutamate (MSG) was produced by acid hydrolysis of wheat gluten; the success of fermentation-based MSG production in 1965 spurred the research and development of other amino acids.
Since the success of fermentative glutamate production, countries around the world have pursued research on and production of amino acid fermentation, and output has grown rapidly. Apart from glutamate, lysine and methionine, however, production volumes of the other amino acids remain modest. The major amino acid producers are now actively developing new technologies; beyond the rapid advances in production techniques driven by industry growth at home and abroad, deep processing of amino acids and new-product development are the future direction. This paper reviews the latest research progress in the fermentative production of glutamate, tryptophan and serine, in the hope of guiding their industrial production.
4 Advances in serine fermentation
L-serine is a non-essential amino acid, but as a biochemical reagent it is widely used in the food, feed, pharmaceutical, agricultural and cosmetics industries. L-serine occupies an intermediate position in amino acid metabolism, participating in the synthesis of many biological substances (such as glycine, methionine and purines), and its metabolic turnover is extremely fast; direct fermentative production of L-serine is therefore much more difficult than for other amino acids. Reports on fermentative L-serine production have concentrated mainly on fermentation with added precursors; glycerate (or its esters), glycerate trimethyl inner salt and glycine are considered the most promising precursors for L-serine synthesis [19].
7.33 kg/m³, biotin 1.783 mg/m³, temperature 33.7 °C, initial pH 7.74, fermentation time 58.4 h), the maximum glutamate yield was 37.1 kg/m³.
2.3 Latest research on the metabolic pathways of fermentative glutamate production
All of the efforts above aim to raise the glutamate yield, but the fundamental measure is to clarify the bacterium's metabolic pathways; some researchers have therefore shifted their focus to metabolic pathway engineering. Building on earlier work, S. Takac et al. [11] attempted a study of the metabolic flux distribution for optimized glutamate production. Stoichiometry-based flux-balance models have reportedly attracted great interest, but because the literature contains more information about the amino-acid-producing metabolic system of Escherichia coli, such models have mainly been applied to E. coli. S. Takac et al. proposed a detailed bioreaction network of a glutamate-producing strain and used it to analyze the intracellular flux distribution for optimized glutamate production. The flux analysis revealed the intracellular reactions of B. flavum that need to be regulated, through genetic engineering and fermentation conditions, to increase the glutamate yield. In theory, the distribution of flux through the pentose phosphate pathway, the intermediates of the glyoxylate shunt, and α-ketoglutarate is crucial to glutamate production; to increase the yield, the bioreactions involving α-ketoglutarate, pyruvate and glutamate should all be regulated. The study also showed that the TCA cycle need not run to completion to produce glutamate, while the glyoxylate shunt is quite active, which seems to contradict the traditional view. The analysis further showed that glutamate is preferentially produced from α-KG by glutamate dehydrogenase, but that late in the fermentation, as the NH3 concentration in the broth falls, glutamate begins to be synthesized from glutamine by glutamate synthase. The paper also touches on the problem of dissolved oxygen, which the authors are still studying.

[Management by objectives] Zhang Zhai: Applications of genetic algorithms in multi-objective optimization

(3) The development period (since the 1990s). In the 1990s, genetic algorithms continued to develop in both breadth and depth.
• In 1991, Lawrence Davis published the Handbook of Genetic Algorithms, describing the workings of genetic algorithms in detail.
• In 1996, Z. Michalewicz's monograph Genetic Algorithms + Data Structures = Evolution Programs discussed many specialized questions of genetic algorithms in depth. In the same year, T. Bäck's monograph Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms clarified many theoretical questions of evolutionary algorithms.
• In 1992, Koza published the monograph Genetic Programming: On the Programming of Computers by Means of Natural Selection, a comprehensive account of the principles of genetic programming with application examples, showing that genetic programming had become an important branch of evolutionary algorithms.
• In 1994, Koza published a second monograph, Genetic Programming II: Automatic Discovery of Reusable Programs, introducing the new concept of automatically defined functions and the technique of subroutines in genetic programming. In the same year, K. E. Kinnear edited Advances in Genetic Programming, collecting the experience and techniques of many researchers applying genetic programming.
(1) All of an organism's genetic information is contained in its chromosomes; the chromosomes determine the organism's traits.
(2) A chromosome is composed of genes in a regular arrangement; heredity and evolution take place on the chromosomes.
(3) The reproduction of organisms is accomplished through the replication of their genes.
(4) Crossover between homologous chromosomes, or mutation of a chromosome, produces new variants and gives the organism new traits.
(5) Genes or chromosomes well adapted to the environment usually have more opportunities than poorly adapted ones to be passed on to the next generation.

1.1 The biological basis of genetic algorithms

SAP APS
Optimization Extension Workbench (APX) (SAP AG 2000, Optimization with SAP APO, slide 15)
• Introduction: The power of standard optimization

Optimization in APO (SAP AG 2000, Optimization with SAP APO, slide 12)
Use of ILOG libraries in APO:
• ILOG Concert, Solver
• CPLEX
• Scheduler
• Dispatcher
• LP, MILP (ND, SNP)
-> several gigabytes of main memory allow an adequate modeling of the supply chain planning problem

Generic optimizer: a standard software approach should be configurable to solve

In APO:
• models offered by APO cover diverse needs for different industrial applications
• algorithms are thoroughly tested to provide the best-of-breed solution

Genetic Programming
Example with two arguments: consider a function that returns "T" if an even number of its arguments are true and "NIL" otherwise. For the two inputs D0 and D1 it can be written as

(NOT(D0) AND NOT(D1)) OR (D0 AND D1)

[Figure: the corresponding parse tree, with OR at the root, two AND nodes below it, and the leaves (NOT D0), (NOT D1), D0, D1.]
II. Generating the initial population
a. Principle of generating initial individuals: the various symbolic expressions for the problem to be solved are produced by a random method.
the individuals are divided by fitness into two groups I and II (group I has high fitness, group II low); selection is made from group I 80% of the time and from group II 20% of the time.
(3) Rank selection: the individuals are divided into ranks by fitness and selected according to rank.
(4) Tournament selection: a group of individuals is first drawn at random from the population, and the individual with the higher fitness within that group is then selected.
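The selection schemes just listed can be sketched as follows; the function names and the 50/50 group split are illustrative assumptions:

```python
import random

def group_select(pop, fitness, p_top=0.8, split=0.5, rng=random):
    """Two-group selection: rank by fitness, draw from the high-fitness
    group I with probability p_top (80%), otherwise from group II."""
    ranked = sorted(pop, key=fitness, reverse=True)
    cut = max(1, int(len(ranked) * split))
    group1, group2 = ranked[:cut], ranked[cut:] or ranked[:cut]
    pool = group1 if rng.random() < p_top else group2
    return rng.choice(pool)

def tournament_select(pop, fitness, k=3, rng=random):
    """Tournament selection: draw k individuals at random, keep the fittest."""
    return max(rng.sample(pop, k), key=fitness)
```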
b. Crossover
Implementation: ① select two individuals at random from the parent population.
For individual i in offspring generation t, the raw fitness is

r(i, t) = Σ_{j=1..Nc} | s(i, j) − c(j) |

where s(i, j) is the value returned by individual i on fitness case j, Nc is the number of fitness cases, and c(j) is the measured or correct value for fitness case j.

b. Standardized fitness. For problems where a larger fitness is better:
(1) Randomly generate the initial population, i.e., many computer programs composed of randomly chosen functions and variables.
(2) Run every computer program (individual) in the population and compute its fitness.
(3) Generate the next generation:
① copy individuals into the new generation at random according to fitness;
② create new individuals by crossing over at randomly chosen points of two parent individuals, the parents themselves chosen at random according to fitness.
(4) Repeat steps (2)–(3) until the termination criterion is satisfied.
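Steps (1)–(4) can be sketched as a minimal GP loop for symbolic regression. The function/terminal sets and the truncation-style selection below are simplifying assumptions, not a full GP system:

```python
import random

# Trees are nested tuples ("add"|"mul", left, right); terminals are "x" or 1.0.
FUNCS = [("add", lambda a, b: a + b), ("mul", lambda a, b: a * b)]
TERMS = ["x", 1.0]

def rand_tree(rng, depth=3):
    """Step (1): a random program of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    name, _ = rng.choice(FUNCS)
    return (name, rand_tree(rng, depth - 1), rand_tree(rng, depth - 1))

def evaluate(tree, x):
    """Step (2): run a program on one input."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    name, l, r = tree
    return dict(FUNCS)[name](evaluate(l, x), evaluate(r, x))

def error(tree, cases):
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def subtree(t, rng):
    """Random subtree, biased toward deeper nodes."""
    while isinstance(t, tuple) and rng.random() < 0.7:
        t = rng.choice(t[1:])
    return t

def crossover(a, b, rng):
    """Step (3)②: replace a random node of a copy of `a` with a subtree of `b`."""
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return subtree(b, rng)
    name, l, r = a
    if rng.random() < 0.5:
        return (name, crossover(l, b, rng), r)
    return (name, l, crossover(r, b, rng))

def gp(cases, pop_size=60, gens=25, rng=None):
    """Steps (1)-(4): evolve programs; keep the best half each generation."""
    rng = rng or random.Random(0)
    pop = [rand_tree(rng) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda t: error(t, cases))
        parents = pop[:pop_size // 2]             # fitness-based selection
        pop = parents + [crossover(rng.choice(parents), rng.choice(parents), rng)
                         for _ in range(pop_size - len(parents))]
    return min(pop, key=lambda t: error(t, cases))
```

Because the best individuals are carried over each generation, the final error never exceeds that of the best random initial program.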
IV. Basic operators

Manipulator path planning based on an improved RRT*-connect algorithm

With the rapid development of the times, highly autonomous robots play an ever larger role in human society.

The manipulator arm is a robot's principal operating component, and its motion planning, for example grasping objects accurately and avoiding obstacles while moving, is a current research hotspot; continued in-depth study of its motion planning is essential.

Manipulator motion planning is carried out mainly in high-dimensional spaces.

The RRT (Rapidly-exploring Random Tree) algorithm [1] plans by random sampling, requires no exact description of the obstacles in the configuration space, and needs no preprocessing, so it is widely used in high-dimensional spaces.

Research on RRT variants has been extensive in recent years. In 2000, Kuffner et al. proposed the RRT-connect algorithm [2], which grows two random trees simultaneously from the start and the goal, speeding up convergence, though the step lengths along the searched path can be long.

In 2002, Bruce et al. proposed the ERRT (Extend RRT) algorithm [3].

In 2006, Ferguson et al. proposed the DRRT (Dynamic RRT) algorithm [4].

In 2011, Karaman and Frazzoli proposed the improved RRT* algorithm [5], which inherits the probabilistic completeness of the traditional RRT and adds asymptotic optimality, guaranteeing a better path at the cost of longer search time.

In 2012, Islam et al. proposed the fast-converging RRT*-smart algorithm [6], which approaches the optimal solution through intelligent sampling and path optimization, but its paths have few sample points, giving them sharp corners that are unfavorable in practice.

In 2013, Jordan et al. made the RRT* search bidirectional and proposed the B-RRT* algorithm [7], accelerating the search. In the same year, Salzman et al. proposed an asymptotically optimal algorithm with continuous interpolation in the lower-bound tree LBT-RRT [8].

In 2015, Qureshi et al. proposed the IB-RRT* algorithm [9], which inserts an intelligent function into B-RRT* to raise the search speed. In the same year, Klemm et al. combined the asymptotic optimality of RRT* with the bidirectional search of RRT-connect.

Manipulator path planning based on an improved RRT*-connect algorithm. Liu Jianyu, Fan Pingqing (School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620). Abstract: Based on the bidirectional, asymptotically optimal RRT*-connect algorithm, high-dimensional manipulator motion planning is analyzed, so that the search path found during planning is shorter and the search more efficient.
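The basic EXTEND step shared by these RRT variants, and the two-tree idea of RRT-connect, can be sketched for a 2-D point robot. Obstacle checking and the RRT* rewiring are omitted, and all names, bounds and step sizes are illustrative:

```python
import math
import random

def extend(tree, sample, step=0.5):
    """One RRT EXTEND: grow from the node nearest the sample by a fixed
    step toward it (tree maps node -> parent, for path recovery)."""
    near = min(tree, key=lambda q: math.dist(q, sample))
    d = math.dist(near, sample)
    new = sample if d <= step else (near[0] + step * (sample[0] - near[0]) / d,
                                    near[1] + step * (sample[1] - near[1]) / d)
    tree[new] = near
    return new

def rrt_connect(start, goal, iters=2000, rng=None):
    """RRT-connect flavor: two trees, one greedily extended toward the
    other's newest node, swapped every iteration."""
    rng = rng or random.Random(1)
    ta, tb = {start: None}, {goal: None}
    for _ in range(iters):
        qa = extend(ta, (rng.uniform(0, 10), rng.uniform(0, 10)))
        qb = extend(tb, qa)              # steer the other tree toward qa
        if math.dist(qa, qb) < 1e-9:
            return ta, tb, qa            # the two trees met
        ta, tb = tb, ta
    return ta, tb, None
```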

The effective Pareto-optimal set of multi-objective optimization problems (Huang Bin)

Computer & Digital Engineering, Vol. 37, No. 2, 2009 (No. 232 overall). The Effective Pareto-Optimal Set of Multi-Objective Optimization Problems. Huang Bin, Chen Deli (Department of Electronic Information Engineering, Putian University, Putian 351100). Abstract: Solving multi-objective optimization problems is an important research direction in evolutionary computation today, and genetic algorithms based on the concept of Pareto optimality are a particular focus of study; however, the defects of genetic algorithms on multi-objective optimization problems often prevent them from reaching a satisfactory solution.

Building on a study of this class of algorithms, a criterion for measuring Pareto-optimal solution sets is proposed, together with suggestions on how to satisfy it.

Keywords: multi-objective optimization; Pareto optimality; evolutionary computation. CLC number: TP301.6

1 Multi-objective optimization problems. Definition 1: A multi-objective optimization problem (MOP) determines, within the feasible region, a vector of decision variables such that a group of mutually conflicting objective functions attain values as small as possible simultaneously.
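The Pareto-optimality concept in Definition 1 can be made concrete with a small dominance check (minimization assumed; the function names are illustrative):

```python
def dominates(a, b):
    """a dominates b (minimization): a is no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep exactly the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```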

Two-stage stochastic programming

Two-stage stochastic programming is a mathematical optimization approach for decision problems in which the decision variables are fixed in a first stage and uncertainty is then faced in a second stage. It can be applied in many settings, such as production planning, resource allocation, logistics, and supply chain management.

In the first stage, the decision maker makes a set of decisions based on forecasts and expectations about the future; these typically concern the allocation of resources, production plans, and so on. In the second stage, the decision maker faces uncertainty, for example changes in market demand or fluctuations in resource supply. These uncertainties may prevent the first-stage decisions from achieving their intended goals, so the decision maker adjusts the decisions in the second stage to adapt to the changes.

The goal of two-stage stochastic programming is to make optimal first-stage decisions while retaining flexibility when uncertainty is faced in the second stage. By decomposing the problem into two stages, it handles uncertainty and risk better and yields more robust and reliable solutions.

In practice, two-stage stochastic programs can be solved with various optimization algorithms, such as linear programming, integer programming, and mixed-integer programming. The framework can also be extended by introducing different assumptions and constraints, for example different cost and revenue functions, time limits, and resource limits.

In short, two-stage stochastic programming is a powerful mathematical optimization method that helps decision makers make more robust and reliable decisions and stay flexible in the face of uncertainty and risk.
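A tiny two-stage example under the structure described above: the first-stage decision is a single order quantity, the second stage realizes one of several discrete demand scenarios, and the recourse is selling what demand allows and salvaging the leftovers. All numbers and names are illustrative:

```python
def expected_profit(x, scenarios, price=10.0, cost=6.0, salvage=2.0):
    """Second-stage recourse: sell min(x, d), salvage the rest; the
    expectation runs over (probability, demand) scenarios."""
    ev = 0.0
    for prob, d in scenarios:
        sold = min(x, d)
        ev += prob * (price * sold + salvage * max(x - d, 0))
    return ev - cost * x              # subtract the first-stage ordering cost

def best_order(scenarios, candidates):
    """First-stage decision: the order quantity maximizing expected profit."""
    return max(candidates, key=lambda x: expected_profit(x, scenarios))
```

With demand 10 or 20 at equal probability, any order between 10 and 20 earns the same expected profit here, and enumeration picks the smallest such order.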

Global Optimization Toolbox Release Notes

R2007b
• Multiobjective Optimization with Genetic Algorithm (16-2)
• Multiobjective Optimization with Genetic Algorithm and Custom Data Types (16-2)
• Genetic Algorithm Tool and Pattern Search Tool Combined Into Optimization Tool (15-2)
• New Optimization Tool Support for gamultiobj, simulannealbnd, and threshacceptbnd (16-2)
• New Demos (16-2)

R2007a
• New Functions for Simulated Annealing and Threshold Acceptance
• Hybrid Multiobjective Optimization Combining Genetic Algorithm with Optimization Toolbox

Application of a BP neural network optimized by a genetic algorithm to predicting postgraduate entrance examination results

China Computer & Communication, 2021, No. 1. Application of a BP Neural Network Optimized by a Genetic Algorithm to Predicting Postgraduate Entrance Examination Results. Li Chi (Department of Computer Science and Software Engineering, Jincheng College of Sichuan University, Chengdu, Sichuan 611731, China). Abstract: A genetic algorithm is first used to optimize the initial weights and thresholds of a BP neural network, and the optimized network is then used in a model for predicting postgraduate entrance examination results.

Experiments show that, because it overcomes defects such as slow convergence and the tendency to fall into local minima, the optimized prediction model is more accurate than one built with a BP neural network alone.

Making the model available to students before they register for the examination gives them a reference for prediction and helps them make reasonable decisions, which has practical significance.

Keywords: postgraduate entrance examination; prediction; BP neural network; genetic algorithm. CLC number: TD712. Document code: A. Article ID: 1003-9767 (2021) 01-038-04

0 Introduction. As society's demand for highly qualified, knowledge-based talent becomes ever more urgent, the number of applicants for postgraduate study in China has been increasing sharply year by year.
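The core idea, a GA searching for good initial weights before ordinary BP (gradient) training begins, can be sketched on a toy model. The one-hidden-unit network, the operators and the hyper-parameters below are illustrative assumptions, not the paper's setup:

```python
import math
import random

def loss(w, data):
    """Squared error of a one-hidden-unit net y = w2*tanh(w1*x + b1) + b2."""
    w1, b1, w2, b2 = w
    return sum((w2 * math.tanh(w1 * x + b1) + b2 - y) ** 2 for x, y in data)

def ga_init_weights(data, pop_size=30, gens=40, rng=None):
    """Evolve weight vectors; the best would then seed BP training."""
    rng = rng or random.Random(0)
    pop = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda w: loss(w, data))
        elite = pop[:pop_size // 2]                      # keep the best half
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]   # arithmetic crossover
            child[rng.randrange(4)] += rng.gauss(0, 0.2)  # Gaussian mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda w: loss(w, data))
```

Because the elite are carried over, the loss of the returned weights never exceeds the best loss of the random initial population, which is the GA's whole contribution here: a better-than-random starting point for BP.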

Optimization methods (Word version)

Optimization methods: genetic algorithms. 1 A brief history of genetic algorithms. In the 1960s, I. Rechenberg first introduced the idea of evolutionary algorithms in his work on evolution strategies (originally called Evolutionsstrategie). His idea was gradually developed by other researchers. Genetic Algorithms were invented by John Holland and subsequently developed by him, his students and his colleagues, culminating in Holland's 1975 monograph Adaptation in Natural and Artificial Systems. In 1992, John Koza used genetic algorithms to evolve new programs to perform specific tasks; he called his method "Genetic Programming" (GP). It used LISP programs, because programs in that language are represented as parse trees, and this form of genetic algorithm operates on those parse trees.

2 Biological and evolutionary background. 1) Genes. All living things are made of cells, and every cell contains the same set of chromosomes. A chromosome is a strand of DNA that provides a replication template for the whole organism. Chromosomes are made of genes, blocks of DNA, and each gene encodes a particular protein. Put more simply, each gene encodes a particular trait of the organism, such as eye color. The possible settings of a particular trait (e.g., blue, orange) are called alleles. Each gene has its own position on the chromosome, generally called its locus. The complete set of genetic material (all the chromosomes) is called the genome. A particular set of genes in the genome is called a genotype. The genotype, together with later development, gives the phenotype: the organism's observable physical and mental characteristics, such as eye color or the basis of intelligence.

2) Reproduction. During reproduction, crossover occurs first: genes from the parents combine in a certain way to form new genes.
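The crossover just described maps directly onto the GA's single-point recombination of chromosomes; a minimal sketch on string chromosomes (the function name and representation are illustrative):

```python
import random

def one_point_crossover(p1, p2, rng=random):
    """Cut both parent chromosomes at the same random point and swap the
    tails, producing two children."""
    assert len(p1) == len(p2) and len(p1) >= 2
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
```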

Introduction to genetic algorithms (English slides)

Fitness and Selection
• Roulette wheel, competition, etc.
Cross-Over
• Replaces two parent solutions with two child solutions.
• A mechanism for covering a large area of the search space.
Stochastic Methods
• Heuristic: using "rules of thumb".
• Metaheuristic: a framework of heuristics used to guide and update a search.
Examples: simulated annealing, tabu search, ant colony systems.
Mutation
• Operates on a single chromosome.
• A mechanism to improve the local search.
Advantages of using GAs
• Flexible and adaptive to a wide variety of problems.
Drawbacks
• Slow local convergence.
• Perceived learning overhead.
Applications
Function optimization, job-shop scheduling, process planning, creativity, VLSI, traveling salesman problem.
Evaluation Function (MATLAB)

function [x, soln] = Four_Eval(x, options)
soln = -(exp(sin(50*x(1))) + sin(60*exp(x(2))) + sin(70*sin(x(1))) + ...
    sin(sin(80*x(2))) - sin(10*(x(1)+x(2))) + 0.25*(x(1)^2 + x(2)^2));

A simple introduction to Cartesian Genetic Programming (CGP) (1)

A first look at genetic algorithms (GA). A genetic algorithm is a search algorithm in computational mathematics for solving optimization problems, and a kind of evolutionary algorithm.

Evolutionary algorithms developed by borrowing phenomena from evolutionary biology, including heredity, mutation, natural selection and hybridization; they solve optimization problems through computer simulation. A genetic algorithm starts from a population representing a set of possible solutions to the problem. A population consists of a certain number of candidate solutions, also called individuals; an individual is encoded by genes, and the concrete form the genes take is the chromosome each individual carries, a chromosome being the collection of its genes. The general steps in applying a genetic algorithm are:

1. Encode the phenotype into a genotype; common encodings include binary, Gray-code, floating-point and symbolic encoding.

2. Evolution starts from a population of random individuals (the initial generation) and proceeds generation by generation. In each generation, following the survival-of-the-fittest principle, the fitness of the whole population is evaluated, and several individuals are selected from the current population on the basis of their fitness.

3. The genetic operators of natural genetics are then applied: crossover and mutation produce a new population, which becomes the population for the algorithm's next iteration.

4. The best individual in the final population is decoded to produce the optimal solution.

Introduction to Cartesian genetic programming. CGP grew out of the work of Miller and others on evolving digital circuits. A team devoted to CGP appeared in 1999, and in 2000 Miller et al. published "Cartesian Genetic Programming", formally proposing the general form of CGP.

A Cartesian genetic program is represented by a directed graph of indexed nodes, with n inputs and m outputs; the input nodes carry indices 0 to n−1, and the outputs are taken from the m nodes of the last column.
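A minimal decode-and-evaluate sketch of such an indexed-graph genotype (a single-row CGP with two-input functions; the function set and gene layout are illustrative assumptions):

```python
# Node genes reference earlier values only (feed-forward evaluation).
ADD, SUB, MUL = 0, 1, 2
FUNCS = {ADD: lambda a, b: a + b, SUB: lambda a, b: a - b, MUL: lambda a, b: a * b}

def evaluate(genotype, outputs, inputs):
    """Decode a CGP genotype: the inputs occupy indices 0..n-1, each node
    gene (f, i1, i2) appends one computed value, and the outputs pick
    values by index."""
    vals = list(inputs)
    for f, i1, i2 in genotype:
        vals.append(FUNCS[f](vals[i1], vals[i2]))
    return [vals[o] for o in outputs]
```

For inputs (3, 4), the genotype [(MUL, 0, 0), (ADD, 2, 1)] computes node 2 = 3·3 and node 3 = 9 + 4, so selecting output index 3 yields 13.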

An optimization model for synchronized coordination in urban rail transit

An optimization model for synchronized coordination in urban rail transit
Cao Zhichao, Yuan Zhenzhou, Li Dewei
(MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, Beijing Jiaotong University, Beijing 100044; State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044)
Abstract: To achieve seamless connection of transfer passenger flows and synchronized coordination of train arrivals and departures, an optimization model for the synchronized coordination of urban rail transit (URT) with 0-1 decision variables was built, taking the minimum total transfer waiting time as the objective and the train headways, the passenger-flow division results and the waiting times as the main inputs. A method combining an improved genetic algorithm with computer simulation effectively solves the model. Finally, a simulation using the Beijing urban rail network produced two optimized schemes, for on-the-hour and off-the-hour departures. The results show that the on-the-hour and off-the-hour schemes shorten the total transfer waiting time by 2.26% and 2.48% relative to the base scheme, and reduce the maximum transfer waiting time at a single station by 7.90% and 12.87%, respectively. The model can effectively shorten passengers' transfer waiting time and raise the service level of urban rail transit. Keywords: urban rail transit; synchronized coordination; train timetable; seamless connection; transfer waiting time. CLC number: U121. Document code: A. Article ID: 1001-0505(2016)01-0221-05

torch.optim and Newton's method

Newton's method is an optimization algorithm for finding a minimum or maximum of an objective function. It uses the function's first- and second-derivative information to determine the descent direction and step size, and is therefore called a second-order optimization algorithm. Newton's method plays an important role in mathematics and computer science, particularly in machine learning and deep learning.

In the PyTorch deep-learning framework, the torch.optim module does not ship a plain Newton optimizer, but it does provide the closely related quasi-Newton method L-BFGS (torch.optim.LBFGS). This article explains, step by step, the principle of Newton's method, its application scenarios, and how to use it.

1. The principle of Newton's method. Newton's method locally approximates the objective function around an initial point x0 by a Taylor expansion using the function's first and second derivatives:

f(x) ≈ f(x0) + (x − x0) f′(x0) + ½ (x − x0)² f″(x0)

where f′(x) is the first derivative of f(x) and f″(x) its second derivative.

For minimizing a function of one variable, the Newton iteration is

x_{n+1} = x_n − f′(x_n) / f″(x_n)

where x_{n+1} is the updated point and x_n the previous one; the iteration continues until a stopping criterion is satisfied.

For minimizing a function of several variables, Newton's method requires the Hessian matrix H of the objective, i.e., the matrix of second derivatives, and the iteration becomes

x_{n+1} = x_n − H⁻¹ f′(x_n)

where H⁻¹ is the inverse of the Hessian and f′ the gradient.
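The one-dimensional iteration above can be sketched directly in plain Python (a minimal sketch; in PyTorch one would typically reach for torch.optim.LBFGS instead of a hand-rolled Newton loop):

```python
def newton_minimize(df, d2f, x0, tol=1e-10, max_iter=50):
    """Minimize a 1-D function via x_{n+1} = x_n - f'(x_n)/f''(x_n),
    given its first derivative df and second derivative d2f."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:        # stop once the update is negligible
            break
    return x
```

For the quadratic f(x) = (x − 3)², a single Newton step from any starting point lands exactly on the minimizer x = 3, illustrating the fast convergence on (locally) quadratic objectives.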

2. Application scenarios. Newton's method is widely used in optimization; its convergence is especially fast when the objective function is strongly convex or convex. Common applications include least squares, unconstrained optimization, and optimization with nonlinear equality constraints, where it performs excellently and can often converge to the optimum within a few iterations.

In deep learning, however, the objective is usually nonlinear and the number of parameters is enormous, so the computational cost of Newton's method is very high. It is therefore rarely used in deep learning, where gradient-based methods such as stochastic gradient descent (SGD) are more common.

Application of an immune genetic algorithm with dual vaccines to the permutation flow-shop problem

Application of an immune genetic algorithm with dual vaccines to the permutation flow-shop problem. Shen Jiantao, Huang Zongnan (School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200072). Abstract: To raise the search speed of the immune genetic algorithm on the permutation flow-shop problem, a dual-vaccine technique is proposed. The fairly good sequence produced by the NEH algorithm serves as a pre-set vaccine, remedying the poor quality of the best-individual vaccine early in evolution and raising the search speed in the early generations; once the best individual in the population becomes better, that individual is used as the vaccine for vaccination instead. Tests on standard benchmark cases show that the algorithm with dual vaccines searches faster and yields better job schedules. Journal: 《机械制造》, 2015, 53(8), pp. 79-81. Keywords: flow shop; immune genetic algorithm; NEH algorithm; dual vaccine.

Permutation flow-shop scheduling (PFSP) is a classic production-scheduling problem, characterized by all jobs following the same processing route and being processed in the same order on every machine; it is NP-hard.

So far many scheduling optimization algorithms have been applied to this problem, intelligent algorithms most widely of all, such as genetic algorithms, ant colony optimization, simulated annealing, immune algorithms and hybrid algorithms.

The immune genetic algorithm retains the search characteristics of the genetic algorithm, while its immune mechanism largely avoids premature convergence and raises the search speed.

The vaccine is one of the key techniques of immune genetic algorithms; many researchers have studied it [1-5], but applications to flow-shop scheduling remain few.

This paper solves the PFSP with an immune genetic algorithm and proposes a dual-vaccine technique to improve the search speed and quality.

1 Mathematical model of the PFSP. In the model, σi is the job number at position i of a processing order π = (σ1, σ2, …, σn); n is the total number of jobs; m the number of operations (machines); k the machine index; t(σi, k) the processing time of job σi in operation k; S(σi, k−1) the start time of job σi on machine k; and C(σi, m) the total completion time of job σi.
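The completion-time recursion implicit in this model, and the NEH construction used for the pre-set vaccine, can be sketched as follows (an illustrative sketch, not the paper's implementation):

```python
def makespan(order, p):
    """Completion time C(σ_n, m) of a permutation flow shop, where
    p[j][k] is the processing time of job j on machine k: each job
    starts on machine k once it finishes k-1 and the machine is free."""
    m = len(p[0])
    c = [0.0] * m                    # running completion times per machine
    for j in order:
        c[0] += p[j][0]
        for k in range(1, m):
            c[k] = max(c[k], c[k - 1]) + p[j][k]
    return c[-1]

def neh(p):
    """NEH heuristic: take jobs in decreasing order of total processing
    time and insert each at the position minimizing the makespan."""
    jobs = sorted(range(len(p)), key=lambda j: -sum(p[j]))
    seq = []
    for j in jobs:
        seq = min((seq[:i] + [j] + seq[i:] for i in range(len(seq) + 1)),
                  key=lambda s: makespan(s, p))
    return seq
```

On the tiny two-job, two-machine instance p = [[3, 2], [1, 4]], sequencing the short-first job ahead gives a makespan of 7 versus 9 for the reverse order, and NEH finds it.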

A multi-stage algorithm for two-dimensional rectangular guillotine cutting layout

A multi-stage algorithm for two-dimensional rectangular guillotine cutting layout. Kong Lingyi, Chen Qiulian (School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi 530004). Abstract: We discuss the two-dimensional guillotine rectangular layout problem with demand constraints: a single plate is cut into a group of blanks of known sizes so as to maximize the layout value (the total value of the blanks contained in the plate), subject to the constraint that the number of each blank type in a layout pattern cannot exceed its demand. A multi-stage layout with general strips is adopted: every cut produces one horizontal or vertical general strip from the plate, and a strip may contain blanks of different sizes. Branch-and-bound and a greedy strategy are introduced to raise the algorithm's efficiency. Experimental results show that the algorithm can effectively raise the layout value. Journal: 《计算机应用与软件》 (Computer Applications and Software), 2015, 32(5), pp. 231-233. Keywords: constrained two-dimensional guillotine cutting; multi-stage layout pattern; general strip; branch-and-bound; greedy strategy.

The layout (nesting) problem is widely applied in machine building, electrical machinery, aerospace, wood cutting, leather cutting and other fields.

Applications of mathematical optimization algorithms in genetics

Applications of Mathematical Optimization Algorithms in Genetics

Mathematics, the language of precision and logic, often finds itself intertwined with the complexities of genetics. The realm of genetic studies is vast and dynamic, constantly revealing new mysteries about life's building blocks. Within this ever-expanding field, mathematical optimization algorithms play a pivotal role, aiding researchers in unlocking the secrets hidden within our genes.

Genetic research involves analyzing intricate patterns and interactions between DNA sequences to understand their impact on traits and diseases. Here, mathematical optimization comes into play, helping scientists find optimal solutions amid these myriad variables. Algorithms like gradient descent or simulated annealing can efficiently navigate this maze of data, searching for the combinations that yield maximum insight or the desired outcomes.

One prominent application is genetic linkage analysis. Researchers aim to identify specific regions of the genome associated with particular diseases or traits, a task that becomes exponentially challenging with thousands of potential markers across multiple chromosomes. Mathematical optimization techniques enable scientists to sift systematically through this information, pinpointing the regions most strongly linked to the trait under study.

Furthermore, in recent years there has been increasing interest in applying machine learning methods, many of which rely heavily on mathematical optimization, to genomics datasets. These advanced algorithms can learn from historical data to predict future outcomes or to classify individuals based on their genomic profiles. By fine-tuning parameters through optimization techniques, such models become increasingly accurate at capturing nuances in genetic variation and its influence on phenotypes.

However, while mathematical optimization algorithms provide powerful tools for genetic analysis, they are not without limitations. Complex biological systems often defy simple mathematical formulations, and real-world datasets can be noisy and incomplete. Successful applications therefore require a deep understanding of both genetics and mathematics, a blend that few possess but that offers immense promise for future discoveries in this exciting frontier science.

In conclusion, mathematical optimization algorithms have revolutionized our approach to understanding genetics by providing efficient means of processing complex datasets and identifying meaningful patterns within them. As technology advances and more sophisticated techniques emerge, we look forward to seeing how these mathematical tools continue to unravel the mysteries encoded within our genes.


G.R. Raidl et al. (Eds.): EvoWorkshops 2004, LNCS 3005, pp. 73-83, 2004. © Springer-Verlag Berlin Heidelberg 2004

Two-Step Genetic Programming for Optimization of RNA Common-Structure

Jin-Wu Nam (1,2), Je-Gun Joung (1,2), Y.S. Ahn (4), and Byoung-Tak Zhang (1,2,3)
(1) Graduate Program in Bioinformatics, (2) Center for Bioinformation Technology (CBIT), (3) Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea; (4) Altenia Corporation, Korea
{jwnam, jgjoung, ysahn, btzhang}@bi.snu.ac.kr

Abstract. We present an algorithm for identifying putative non-coding RNA (ncRNA) using an RCSG (RNA Common-Structural Grammar) and show the effectiveness of the algorithm. The algorithm consists of two steps: a structure learning step and a sequence learning step. Both steps are based on genetic programming. Genetic programming has generally been applied to learning programs automatically, reconstructing networks, and predicting protein secondary structures. In this study, we use genetic programming to optimize structural grammars. The structural grammars can be formulated as rules of tree structure including function variables, and they can be learned by genetic programming. We have defined the rules by which structure definition grammars can be encoded into function trees. The performance of the algorithm is demonstrated by the results obtained from experiments with the RCSGs of tRNA and 5S small RNA.

1 Introduction

The universe of functionally important non-transcribed RNAs is rapidly expanding, but their systematic identification in genomes remains challenging. A number of non-coding RNA (ncRNA) genome screening approaches have been reported in the literature, including composition-, sequence- and structure-similarity-based approaches and comparative genomics [1], [2], [3]. For example, some focus on secondary structure prediction for a single sequence, and some aim at finding a global alignment of multiple sequences.
Yet other groups predict the structure based on free-energy minimization or make comparative sequence analyses to determine the structure. However, these methods are limited to sets of short sequences or require structural information about some known sequences [4]. Also, these approaches usually use only positive data to learn a model, and there is no doubt that such methods are fragile to noise. To overcome these limitations, we have tried to develop a general method; one such general method is learning and optimization of the RNA common structure [5]. Most ncRNAs have low sequence similarity but high structure similarity [6], making the common structure suitable for identification of putative ncRNAs [4], [7]. Searching for these common structures is a crucial step in identifying similar new functional ncRNAs in a genome. Recently, various methods for finding common structure have been studied; for instance, hidden Markov models and genetic algorithms have been applied to search for common structure [7], [8]. However, these methods have two great limitations: first, they are not yet at the level of practical applications, and second, they are not based on learning from structure and are thus unable to find distant homologies. On the other hand, some research groups have developed structural grammars of context-free type [9], [10], [11]. Using a structural grammar, we can easily and formally express RNA secondary structures. Macke et al. introduced the structure definition language used in RNAMotif [12]. The language can not only easily represent various RNA secondary-structure elements, such as loop, bulge, stem and mispair, but also include sequence features, such as conserved motifs and the number of mismatches. This structure definition language allows abstraction of the structural pattern into a 'descriptor' with a pattern language, so that it can give detailed information regarding base pairing, length and motif.
In addition, the structural grammar and structure definition language can be applied to learning the common structure; the results of the learning are RNA common-structural grammars (RCSGs). There are various approaches to learning RCSGs, but we apply genetic programming, a kind of evolutionary algorithm, with a focus on two steps: a structural learning step and a motif learning step.

Genetic programming is an automated method for creating a working genetic program, called an individual and generally represented by a tree structure, from a high-level statement of a problem [13]. A genetic tree consists of elements from a function set and a terminal set: function symbols appear as internal nodes, and terminal symbols denote actions taken by the program. Genetic programming genetically breeds a population of genetic programs using the principles of Darwinian natural selection and biologically inspired operations. It uses crossover and mutation as transformation operators, which introduce variation into the genotype, to change candidate solutions into new candidate solutions [13].

We optimize the RCSGs of fly tRNA and eukaryotic 5S small rRNA using our application of genetic programming, and present an evaluation of our method for identifying mouse tRNA and 5S small rRNA.

2 Materials and Methods

2.1 Genetic Programming to Optimize RNA Common-Structural Grammar

To identify putative ncRNAs in genome databases, we have tried to find the common structure conserved among ncRNAs. We have implemented a program for learning the common structure, devised with genetic programming; we call the program esRCSG, evolutionary search for RNA Common-Structural Grammar. The algorithm of esRCSG is summarized as pseudo-code in Figure 1.

Fig. 1. The pseudo-code of the RCSG optimization algorithm.
The algorithm consists of two steps: (1) learning an RCSG from a set of training data (known miRNA precursors) without sequence-related variables (such as "seq" and "mismatch"); (2) optimizing the RCSG learned in structural learning by incorporating the sequence-related variables. The "word-wise" method randomly splits the sequences of the training data set into 7-mer words; initialization of the second step is accomplished by incorporating the words into the RCSGs learned in structural learning.

esRCSG thus consists of two steps: structural learning, which optimizes only the tree structure of the grammar, without sequence, and sequence learning, which specializes the RCSGs from structural learning by incorporating words (sequence fragments) into them. Both steps (1) and (2) are implementations (instances) of genetic programming, and they share many procedures. The common procedures of (1) and (2) are: (a) initialize the population with randomly generated trees; (b) convert all function trees into structural grammars; (c) calculate the fitness, specificity, sensitivity, and complexity of all grammars with the positive and negative training data sets; (d) evaluate all structural grammars in terms of the sensitivity, specificity and complexity; (e) using ranking selection, select the function trees that will generate the offspring (next generation); (f) apply variations, such as mutation and crossover, to the selected function trees; (g) iterate steps (b) through (f) for the user-defined number of generations. There are two differences between the two steps: structural learning includes a local search procedure, and sequence learning uses the word-wise method, utilizing the words to initialize the population.
In addition, the structural learning step uses mutation and crossover as variation operators to change the tree structure, whereas sequence learning uses only mutation, changing the sequence and the number of mismatches without changing the tree structure.

Individual representation. To convert structural grammars into function trees, we have defined the functions f1, f2 and root as shown in Figure 2a. These functions can be formulated by expression rules (Figure 2a), so, using the expression rules, we can represent structural grammars (Figure 2c) as function trees (Figure 2b). We can therefore use the function trees as the individuals of genetic programming, since one characteristic of genetic programming is that individuals are represented by tree structures.

To avoid the creation of invalid structural grammars, function trees carry some constraints on the order of function and terminal nodes. First, the f2 function should not appear consecutively at the same depth of the tree; contiguous f2 functions can be treated as one. Second, the f2 function can appear only as a terminal node, to terminate the recursive generation of the function tree. Finally, the variables 'minlen' and 'maxlen' should always come as a pair and should not coexist with the variable 'len'.

Population initialization. An initial population is randomly created under the constraints on function trees described above. The initial population contains diverse function trees, because there is no limit on the number of nodes or on the width of a tree; this broad coverage of search starting points is one major reason esRCSG is efficient at finding the optimal solution.

Fitness function.
The fitness function (Equation 1) is defined using the specificity, the sensitivity and the complexity defined in Equation (4):

Fitness = spC × Specificity + stC × Sensitivity − Complexity   (1)

spC + stC = 1   (2)

The two parameters spC and stC regulate the effects of specificity and sensitivity; to normalize the fitness, their sum is always 1 (Equation 2). The parameters decide the trade-off between specificity and sensitivity in the fitness function.

The complexity, a negative factor in the fitness function, controls the growth of the tree structures. Without the complexity term, the size of a tree does not converge to a minimum description length, at which the tree is most efficient, and can grow without bound as the trees evolve.

Fig. 2. (a): Function f1 recursively generates a structural grammar, including one helix structure and either f1 or f2 as the next derivation. Function f2 represents only ss (single strand), i.e., loop, bulge or single strand. Both f1 and f2 contain variables recording structural information, such as the length of the helix (len), the number of mispairs (mispair) and mismatches, and the sequence (seq). (b): A function tree that can be converted into the structural grammar (c): the child nodes of root in (b) correspond to the first indentation level of (c), and the nodes at the second depth to the second indentation level of (c). One helix consists of the pair h5-h3 (h means helix; 5 and 3 mean the 5' and 3' ends). (d): The secondary RNA structure represented by structural grammar (c). H1, H2 and H3 in (c) and (d) are helix structures.

To overcome this size problem, we make Comp_i^j include the number of nodes and the depth of the j-th tree in the i-th generation (Equation 3).
Equation (4) describes the definition of the Complexity of the j-th tree in the i-th generation.

Comp_i^j = 10 × TreeDepth_i^j + NodeNum_i^j    (3)

Complexity_i^j = (1 / (NS + PS)^2) × (Comp_i^j / Comp_{i-1}^{best})    (4)

where the Complexity is normalized by the square of the size of the training data set (NS + PS); NS is the number of negative training sequences (see Table 1) and PS is the number of positive training sequences (see Table 1). Complexity_i^j depends on Comp_{i-1}^{best}, the Comp of the best individual (tree) in the (i−1)-th generation. This dependency drives the trees toward the minimum Comp as the generations progress, so the size of the best tree in the last generation converges to the minimum length. This is the Occam's razor effect [14] of the complexity factor in our fitness function.

Variation. The variation operators are applied so that each descendant has a tree structure different from its parents'. The first operator used to perturb a tree is mutation, which changes the value of a function variable by a random amount drawn from a Poisson distribution. Crossover exchanges sub-trees of two parent trees via single-point recombination to generate two new offspring trees: two parent function trees are selected at random from the population, and then a single recombination point is selected at random in each parent.

2.2 Dataset

We applied our algorithm to tRNA sequences of Drosophila melanogaster (available at http:// /entrez) and the eukaryotic 5S small rRNA (available at http://rose.man.poznan.pl/5SData/5SRNA.html) as training and test sets (Table 1).

Table 1.
Training and Test Dataset.

              Positive Dataset   Negative Dataset
Training set  50 tRNAs           200 sequences; hairpins, pseudo-knots, IRE, consecutive hairpins, miRNA precursors, bulges, internal loops, rRNA, and fragments of mRNA
              50 rRNAs           200 sequences; hairpins, pseudo-knots, IRE, consecutive hairpins, miRNA precursors, bulges, internal loops, tRNA, and fragments of mRNA
Test set      100 tRNAs          290 sequences; negative data like the training set
              100 rRNAs          290 sequences; negative data like the training set

The negative data set consists of several classes: linear sequences extracted from mRNA, and simple secondary-structure elements such as IREs (iron-responsive elements), pseudo-knots, hairpins, bulges and internal loops. The negative data set also includes ncRNAs other than the target class itself. The negative data act as a rudder that decides the direction of learning; hence, the negative data have to be collected carefully, and the distribution of their classes has to be considered.

3 Results

3.1 RCSG of 5S Small rRNA by One-Step Learning

We show the result for 5S small rRNA, one of the ncRNAs, before the result for tRNA. First, we looked into the secondary structures of four 5S small rRNAs within the positive training set, which are eukaryotic 5S small rRNAs. Figure 3 shows the secondary structures of the rRNAs as predicted by RNAfold. As the results show, the common structure of the 5S small rRNAs in the training data should have two stem structures.

Fig. 3. The secondary structures of eukaryotic 5S small rRNA; these structures were drawn by RNAfold of the Vienna package.

To optimize the RCSG of 5S small rRNA, esRCSG uses the positive training set (50 sequences) and the negative training set (200 sequences) of Table 1. The optimized RCSG, shown in Figure 4(a), is learned using only the structural learning step. It was optimized under the parameters listed in Figure 4(b) and is the best solution learned from four training runs (Sensitivity = 0.94; Specificity = 0.98; Fitness = 0.95).

Fig. 4. (a): The optimal RCSG of 5S small rRNA from structural learning; (b): The parameter settings and the best values for training.

However, the values of the variables of the best RCSG have a broader range than we expected, which means the RCSG may not actually be optimal. Thus, to probe the local optima, we applied a local search that decreases the values of the variables gradually.

To analyze the change of the best fitness over the iterations, we measured the best fitness at each generation (Figure 5). Based on the results, the best fitness reaches a saturation point (0.95) at about the thirteenth generation, because the search problem of the RCSG for 5S small rRNA is not complex. In most experiments, esRCSG detects the best RCSG before the fifteenth generation. Also, because we apply elitism during selection, we can guarantee the best fitness in the last generation. The average fitness likewise increases gradually over the iterations of esRCSG.

On the other hand, we searched for the RCSG under several different parameter conditions by varying spC and stC. We found that the trade-off between specificity and sensitivity was considerably important in searching for the optimum solution, but that the population size was not directly related to the problem. In the results, we detected the best solution at spC = 0.95 and stC = 0.05. To assess the optimized RCSG, we evaluated it with the test set of 5S small rRNAs: the sensitivity was 0.83 and the specificity was 0.86.

3.2 RCSG of tRNAs by Two-Step Learning

First, we looked into the secondary structures of four tRNAs within the positive training set, which are tRNAs of Drosophila melanogaster. Figure 6 shows the secondary structures of the tRNAs as predicted by RNAfold. As the results show, the common structure of the tRNAs in the training data should have two stems instead of three stems.
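A two-stem grammar of this shape can be generated mechanically from a function tree in the style of Figure 2. The sketch below is our illustration: the node classes, the `render` method and the one-level-per-depth indentation are assumptions based on the paper's description, not the authors' implementation.

```python
# Sketch of converting a function tree (Figure 2b) into an indented
# structural grammar (Figure 2c). Classes and rendering are assumptions.

class F2:
    """Terminal node: a single strand (ss) - loop, bulge, or free strand."""
    def __init__(self, **vars):
        self.vars = vars

    def render(self, depth=0):
        args = ", ".join(f"{k}={v}" for k, v in self.vars.items())
        return [" " * depth + f"ss ( {args} )"]

class F1:
    """Recursive node: one helix (h5 ... h3) enclosing child derivations."""
    def __init__(self, children, **vars):
        self.children, self.vars = children, vars

    def render(self, depth=0):
        args = ", ".join(f"{k}={v}" for k, v in self.vars.items())
        pad = " " * depth
        body = [line for c in self.children for line in c.render(depth + 1)]
        return [pad + f"h5 ( {args} )"] + body + [pad + "h3"]

# A toy tree with two inner stems, in the shape of the common structures above:
tree = F1([F1([F2(len=10)], mispair=3),
           F1([F2(len=24)], mispair=1),
           F2(minlen=2, maxlen=15)],
          mispair=2)
grammar = "\n".join(tree.render())
```

Rendering the tree depth as indentation reproduces the convention of Figure 2c, where the nesting of h5/h3 pairs encodes which helices enclose which single strands.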
The RCSG of tRNAs learned by two-step genetic programming is represented below. The result shows that the RCSG has, as its common structure, two conserved stems, a bulge structure and a conserved motif. The conserved motif comprises 7 nucleotides, "gaucacu", allowing 3 mismatches, at the loop of the second stem-loop structure.

<Best RCSG for tRNA>
descr
h5 ( mispair=2 )
  h5 ( mispair=3 )
    ss ( len=10 )
  h3
  h5 ( mispair=1 )
    ss ( len=24, seq="gaucacu", mismatch=3 )
  h3
  ss ( minlen=2, maxlen=15 )
h3

Fig. 6. The secondary structures of Drosophila melanogaster tRNAs; these structures were drawn by RNAfold of the Vienna package. We applied the free-energy minimum approach with the energy parameters rescaled to a temperature of 37°C.

Through two-step genetic programming, the sensitivity of the optimized RCSG diminished, but its fitness and specificity increased. The results show that sequence learning, the second step, is efficient for gaining specificity. High specificity is important in our study because higher specificity produces fewer false positives and experimental errors. Next, to validate the optimized RCSG, we measured the specificity and the sensitivity with the test data, as mentioned in Table 1. The results show that the specificity and the sensitivity obtained in training rarely change in the test, which means that our algorithm is very efficient for the identification of putative ncRNAs. Figure 7 shows the change of the best fitness, specificity and sensitivity according to generation in both the structural learning step (Figure 7a) and the sequence learning step (Figure 7b).
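The variation scheme of the two steps (structure-changing mutation in step one, sequence-level-only mutation in step two, with Poisson-distributed step sizes) can be sketched as follows. The variable names, the λ value and the clamping at zero are our assumptions, and the single-point sub-tree crossover of step one is omitted for brevity.

```python
import math
import random

# Sketch of the mutation used in both learning steps: step one perturbs
# structural variables (e.g. mispair, len); step two perturbs only the
# sequence-level variables (e.g. mismatch), leaving the tree shape fixed.
# Variable names, lambda, and clamping at 0 are illustrative assumptions.

def poisson(lam=1.0):
    """Poisson draw via Knuth's method (the stdlib has no Poisson sampler)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def mutate(node_vars, keys):
    """Shift the chosen integer variables by a Poisson-distributed amount."""
    out = dict(node_vars)
    for key in keys:
        out[key] = max(0, out[key] + random.choice((-1, 1)) * poisson())
    return out

helix = {"mispair": 2, "len": 7, "mismatch": 1}
structural = mutate(helix, keys=("mispair", "len"))  # structural learning step
sequential = mutate(helix, keys=("mismatch",))       # sequence learning step
```

Restricting the mutated keys is what distinguishes the two steps: the second pass can sharpen sequence constraints without undoing the tree shape found in the first.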
Also, we were able to conclude that the specificity converged more quickly than the sensitivity due to the high spC parameter (= 0.9).

Fig. 7. The results of two-step genetic programming. (a): The plot of the values obtained during the structural learning step; (b): The plot of the values obtained during the sequence learning step; (c): The optimized parameters for two-step genetic programming.

<Training>
Step 1 (structural learning) results: Fitness = 0.957; Specificity = 0.957; Sensitivity = 0.931
Step 2 (sequence learning) results: Fitness = 0.959; Specificity = 0.964; Sensitivity = 0.870

<Test>
Specificity = 0.946; Sensitivity = 0.840

4 Conclusions

In this study, we suggested an efficient approach for searching for an RCSG over ncRNA sequences using genetic programming, a branch of evolutionary computation. With fly tRNA sequences and eukaryotic 5S small rRNAs, our learning algorithm, an application of genetic programming, learned distinctive RCSGs. The optimized RCSGs reflect the common structures of the training sets well and can be applied as common-structure models for searching for putative ncRNAs.

We can derive two contributions from our new approach. The first contribution is that we have demonstrated the possibility of learning a common structural grammar from structurally unknown sequences through genetic programming. We believe that our approach can be adapted for various applications such as RNA similarity search and putative RNA identification. The second contribution is that we have shown that it is possible to design and generate RNA structural grammars automatically. Designing a structural grammar to match given RNA sequences is a difficult problem; therefore, a system that generates structural grammars may be considerably useful.

Acknowledgments.
This research was supported by the National Research Laboratory (NRL) Program and the Systems Biology (SysBio) Program of the Ministry of Science and Technology, and by the BK21-IT Program of the Ministry of Education.

References

1. Rivas E. and Eddy S.R. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. (1999) 285:2053-2068.
2. Zuker M. On finding all suboptimal foldings of an RNA molecule. Science (1989) 244:48-52.
3. Eddy S.R. and Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Research (1994) 22:2079-2088.
4. Gorodkin J., Stricklin S.L. and Stormo G.D. Discovering common stem-loop motifs in unaligned RNA sequences. Nucleic Acids Research (2001) 29:2135-2144.
5. Perriquet O., Touzet H. and Dauchet M. Finding the common structure shared by two homologous RNAs. Bioinformatics (2003) 19:108-116.
6. Gary B. Fogel, V. William Porto, Dana G. Weekers, David B. Fogel, Richard H. Griffey, John A. McNeil, Elena Lesnik, David J. Ecker and Rangarajan Sampath. Discovery of RNA structural elements using evolutionary computation. Nucleic Acids Research (2002) 30:5310-5317.
7. Jih-H. Chen, Shu-Yun Le and Jacob V. Maizel. Prediction of common secondary structures of RNAs: a genetic algorithm approach. Nucleic Acids Research (2000) 28:991-999.
8. Sakakibara Y. Pair hidden Markov models on tree structures. Bioinformatics (2003) 19:i232-i240.
9. Cai L., Malmberg R.L. and Wu Y. Stochastic modeling of RNA pseudoknotted structures: a grammatical approach. Bioinformatics (2003) 19:i66-i73.
10. Sakakibara Y., Brown M., Hughey R., Mian I.S., Sjolander K., Underwood R.C. and Haussler D. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research (1994) 22:5112-5120.
11. Knudsen B. and Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics (1999) 15:446-454.
12. Thomas J. Macke, David J.
Ecker, Robin R. Gutell, Daniel Gautheret, David A. Case and Rangarajan Sampath. RNAMotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Research (2001) 29:4724-4735.
13. Koza J.R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press (1992).
14. Zhang B.-T., Ohm P. and Mühlenbein H. Evolutionary neural trees for modeling and predicting complex systems. Engineering Applications of Artificial Intelligence (1997) 10:473-483.
