Hypertree Decompositions and Tractable Queries 1


Research | Mol. Plant: Reshaping of the Arabidopsis proteome landscape and co-regulation of proteins in development and immunity

Compiled by 东方不赢; edited by Emma and 江舜尧.

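Overview

Proteome remodeling is a fundamental adaptive response. Proteins that belong to the same complex, and proteins that are functionally related, are often co-expressed.

Using a deep sampling strategy, the researchers defined the core proteome of Arabidopsis tissues as roughly 10,000 proteins per tissue and quantified nearly 16,000 proteins (as copy numbers per cell) across the plant's life cycle.

A proteome-wide survey of post-translational modifications revealed amino acid exchanges, which may point to translational infidelity in eukaryotes.

Correlation analysis of protein abundance revealed signaling modules regulating transcription, potentially specific to tissue and age, in photosynthesis, seed development, and the regulation of senescence and abscission.

The data also suggest that RD26 and other NAC transcription factors have a potential role in seed development related to desiccation tolerance, and that cysteine-rich receptor-like kinases (CRKs) may act as ROS sensors during senescence.

All components of the ribosome biogenesis factor (RBF) complexes were co-expressed in a tissue- and age-specific manner, which suggests functional promiscuity in the assembly of these little-studied protein complexes in Arabidopsis.

The study treated Arabidopsis seedlings with flg22 for 16 hours and characterized the proteome architecture of basal immunity.

At 1, 3, and 16 hours after flg22 treatment, parallel reaction monitoring (PRM) targeted proteomics was combined with phytohormone and amino acid measurements and with transcript measurements to give a comprehensive picture.

The researchers found that JA and JA-Ile levels are suppressed through deconjugation and hydroxylation by IAR3 (IAA-ALANINE RESISTANT 3) and JOX2 (JASMONATE-INDUCED OXYGENASE 2), both under the control of MYC2 (JASMONATE INSENSITIVE 1).

This mechanism regulating JA in the immune response had not been reported before and has not yet been studied in depth.

The datasets generated in this study cover the Arabidopsis proteome broadly under different conditions and are a rich resource for related research.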

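Paper information

Original title: Reshaping of the Arabidopsis thaliana proteome landscape and co-regulation of proteins in development and immunity
Journal: Molecular Plant
IF: 12.084
Published: 2020.09
Corresponding author: Wolfgang Hoehenwarter
Corresponding author affiliation: Leibniz Institute of Plant Biochemistry

Experimental design

Results

1. Deep proteomics approach

Proteomic measurements were to be made at different life stages of several Arabidopsis tissues (roots, leaves, cauline leaves, stems, flowers, and siliques/seeds, together with whole-plant seedlings from 7 days through senescence at 93 days), as well as on the proteome during PTI induced by treatment with the peptide flg22.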

Rapid Tree Support (English)

In the realm of arboriculture and landscape management, tree support systems play an essential role in ensuring the health, longevity, and safety of trees. Rapid tree stabilization, especially after planting or following storm damage, is a high-priority task that requires meticulous planning and execution to meet the highest standards of quality. This essay will delve into various aspects of this process, exploring methods, materials, environmental factors, and the importance of professional expertise.

**Introduction to Rapid Tree Stabilization**

Rapid tree stabilization refers to the immediate and effective reinforcement of newly planted or damaged trees to prevent them from tilting, uprooting, or sustaining further harm. It's a multi-faceted process that involves anchoring, guying, cabling, and pruning, depending on the specific needs of the tree. The objective is to promote recovery and growth while minimizing stress and risk.

**Methods and Materials for Rapid Stabilization**

Guys and anchors are primary tools used for rapid tree stabilization. Guying involves attaching flexible yet sturdy cables or straps from the upper part of the tree to ground anchors, providing additional support against wind sway. These guys should be installed at a 45-degree angle and checked regularly for tension and potential abrasions. High-quality materials like coated steel cables or synthetic ropes with adjustable tensioners ensure durability and minimal injury to the tree bark.

Ground anchors can range from simple stakes driven into the soil to more sophisticated systems like deadman anchors buried deeper for better stability. They must be placed outside the root zone to avoid damage and provide adequate resistance against the pull of the guy wires.

For mature or large trees that require extra support due to structural issues or storm damage, bracing and cabling systems may be employed. This involves installing steel rods or cables between major limbs or sections of the trunk to redistribute weight and maintain structural integrity.

**Environmental Factors and Their Impact**

The effectiveness of rapid tree stabilization strategies is significantly influenced by environmental factors. Soil type, moisture content, and compaction all affect how well anchors hold and how much support they provide. In addition, weather conditions, particularly wind speed and direction, need to be considered when deciding on the placement and tension of guying systems.

Moreover, understanding the species-specific characteristics such as growth rate, rooting patterns, and susceptibility to diseases is critical. For instance, certain species may have brittle wood that necessitates gentler guying techniques, while others may respond poorly to overly tight support systems.

**Professional Expertise and Importance**

Undoubtedly, the key to achieving high-quality rapid tree stabilization lies in professional expertise. Arborists trained in tree biology and mechanics can assess each situation accurately and tailor the stabilization method accordingly. They understand the delicate balance between providing enough support without impeding natural growth or causing unnecessary stress.

Moreover, professionals adhere to industry best practices and safety standards, which include regular monitoring and timely adjustments of support systems. As trees grow and their needs change, so should the support provided. Over time, as the root system strengthens and the tree becomes self-supporting, the stabilization apparatus can be gradually removed to allow unhindered development.

**Conclusion**

Rapid tree stabilization is not just about preventing immediate failure but fostering long-term health and resilience. It's a dynamic process that integrates knowledge of tree biology, physics, and engineering principles. By employing high-quality materials, considering environmental factors, and relying on professional expertise, we can safeguard our urban forests and individual trees from the hazards of instability.

Achieving a high standard in rapid tree stabilization ensures that trees not only survive but thrive, contributing to the aesthetic, ecological, and economic benefits they bring to our communities. Thus, it is a crucial practice that deserves thoughtful consideration and diligent implementation across all stages of tree care and maintenance.

This comprehensive approach, encompassing multiple angles and facets, underscores the importance of treating tree stabilization as a scientific endeavor rather than a mere physical exercise. It calls for a deep understanding of trees and their environment, underpinned by an unwavering commitment to quality and sustainability.

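Classic literature on building phylogenetic trees from concatenated multi-gene data

1. Felsenstein, J. (1985). Confidence limits on phylogenies: An approach using the bootstrap. Evolution, 39(4), 783-791.
This classic paper proposes using the bootstrap to attach confidence levels to the groupings in a phylogenetic tree. The characters (alignment columns) of the data set are resampled with replacement, a tree is built from each resampled data set, and the frequency with which a grouping recurs serves as its confidence estimate.

2. Nei, M., & Kumar, S. (2000). Molecular evolution and phylogenetics. Oxford University Press.
This classic textbook describes in detail how to build phylogenetic trees from concatenated multi-gene data. The authors explain the different evolutionary models and computational methods and provide worked examples and case studies of tree construction.

3. Yang, Z. (2006). Computational molecular evolution. Oxford University Press.
This classic textbook covers computer simulation and tree construction with concatenated multi-gene data. The author explains the commonly used evolutionary models, computational methods, and statistical inference in detail, as well as how to assess the reliability of a phylogenetic tree.

4. Rannala, B., & Yang, Z. (1996). Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. Journal of Molecular Evolution, 43(3), 304-311.
This classic paper proposes a method based on Bayesian statistics for building phylogenetic trees and estimating parameters. Using simulated data sets, the authors compare the method's performance with that of traditional methods and demonstrate its effectiveness on concatenated multi-gene data.

5. Wiens, J. J., & Moen, D. S. (2008). Missing data and the accuracy of Bayesian phylogenetics. Journal of Systematics and Evolution, 46(3), 307-314.
This classic paper examines the effect of missing data in concatenated multi-gene data sets and proposes a Bayesian approach to handling the missing-data problem.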
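The two operations that recur in the list above, concatenating per-gene alignments and bootstrap resampling of alignment columns, can be sketched as follows. This is a minimal illustration under stated assumptions, not code from any of the cited works: the function names, the toy sequences, and the use of `?` to pad taxa that lack a gene are all hypothetical choices.

```python
import random

def concatenate(alignments):
    """Join per-gene alignments (dict: taxon -> sequence) into one supermatrix."""
    taxa = sorted(set().union(*(a.keys() for a in alignments)))
    supermatrix = {t: "" for t in taxa}
    for aln in alignments:
        length = len(next(iter(aln.values())))
        for t in taxa:
            # Assumption: a taxon missing from a gene is padded with '?'.
            supermatrix[t] += aln.get(t, "?" * length)
    return supermatrix

def bootstrap_replicate(matrix, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n_sites = len(next(iter(matrix.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(seq[c] for c in cols) for t, seq in matrix.items()}

# Toy data: sp3 was not sequenced for the second gene.
gene_a = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACTA"}
gene_b = {"sp1": "GGC", "sp2": "GGT"}
supermatrix = concatenate([gene_a, gene_b])
replicate = bootstrap_replicate(supermatrix, random.Random(0))
```

In a real analysis each replicate would be handed to a tree-building method, and the share of replicates that recover a given clade would be reported as that clade's bootstrap support.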

RAPD analysis of genetic polymorphism in germplasm resources of Vitex trifolia var. simplicifolia

[...] pretreating samples in 0.9% sucrose solution in darkness for 48 h before using the CTAB method to extract DNA could greatly improve the yield and purity of DNA. Conclusion: There are certain genetic differences among all the germplasm accessions, which is not in accor[...]
21 l 第 卷 3 0年 1 8第3 1 月 期
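Expert Forum

Sun Weiyang (孙维洋), Li Jianfang (李剑芳), Qiu Jin (邱进), Xu Lingchuan (徐凌川)
School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan, Shandong 250355

[Abstract] Objective: To use RAPD to analyze the genetic polymorphism and intraspecific genetic differences of wild Vitex trifolia var. simplicifolia Cham. germplasm resources. Methods: Wild plants from 7 different producing areas were used as material. Their genetic material was analyzed with RAPD, and NTSYS-2.10 software was used to calculate genetic similarity coefficients and to run a UPGMA cluster analysis. Results: Ten random primers yielded 88 DNA bands in total, of which 46 (52.3%) were polymorphic. When DNA was extracted with the CTAB method, first culturing the material in the dark for 48 h in 0.9% sucrose solution helped to raise the DNA yield and purity. Conclusion: Genetic differences exist among all the germplasm accessions; they do not match the geographic relationships or the chemical differences in the main active constituents, and further [...]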
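The arithmetic behind the reported polymorphism rate, and the kind of pairwise similarity coefficient that feeds a UPGMA clustering, can be sketched as follows. This is a minimal illustration: the abstract does not say which coefficient was selected in NTSYS, so the Nei-Li (Dice) coefficient is an assumption here, and the two band profiles are made-up examples.

```python
def polymorphic_percent(polymorphic_bands, total_bands):
    """Percentage of scored bands that are polymorphic."""
    return 100.0 * polymorphic_bands / total_bands

def dice_similarity(profile_a, profile_b):
    """Nei-Li (Dice) similarity between two 0/1 band-presence profiles.

    Assumption: this coefficient is used only for illustration; the source
    does not state which similarity coefficient the authors chose.
    """
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    return 2.0 * shared / (sum(profile_a) + sum(profile_b))

# Band counts as reported in the abstract: 46 polymorphic out of 88.
rate = polymorphic_percent(46, 88)

# Hypothetical band-presence profiles for two accessions (1 = band present).
accession_1 = [1, 1, 0, 1, 0]
accession_2 = [1, 0, 0, 1, 1]
similarity = dice_similarity(accession_1, accession_2)
```

Rounded to one decimal place, `rate` gives the 52.3% quoted in the results; a matrix of such pairwise similarities is what a UPGMA clustering step takes as input.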

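[Zhejiang Provincial Natural Science Foundation] "Selection": research hot words in journal publications, recommended year by year (2014-08-12)

Year: 2009. Table columns: No., research hot word, recommendation index.

Research hot words (first block): wavelet transform; robustness; genetic diversity; highly sulfated cyclodextrin; heavy metals; selection combining; ant colony optimization algorithm; adaptive; polyvinyl alcohol; review; feature selection; pervaporation; maximal ratio combining; tailings sand; brand choice; intrusion detection; artificial immune system; Miscanthus floridulus; Gaussian mutation; high-density cell culture; high-voltage direct current transmission; Masson pine (Pinus massoniana); feeding selectivity; food additives; food safety; risk management; risk framework; risk countermeasures; prognosis; headspace solid-phase microextraction; targeted therapy; membership function; randomized circle detection; current-limiting resistor; negative selection; anode head; cutoff wall; batch production process; error probability; aluminum stress; heavy metal speciation; degree of overlap; color scheme; coal blending; remote sensing; genetic algorithm; selective ensemble; selective recognition; selection; connectivity; continuous production process; successive projections algorithm.

Research hot words (second block, listed with a recommendation index of 1): Jinhua; heavy metal pollution; heavy metals; enzyme immunoassay method; enzyme immunoassay; yeast; configuration space search; configuration sequence; light distribution performance; partially reconfigurable system; genetic transformation system; genetic diversity; selective hydrogenation; selective catalytic reduction; reverse engineering; suitability evaluation; remote user; outcross hybrid; motion history image; 过点分配; transition metal; transposon; transgenic; supramolecular assembly; roulette-wheel selection; resource allocation; evaluation method; evaluation index; evaluation and selection method; evidential reasoning; authentication scheme; computer applications; disparity estimation; surface antigen; action sequence; behavior; protein sequencing; ant colony system algorithm (ACSA); ant colony algorithm; virtual axis; virtual water; virtual collaborative service pool; virtual agent; 花(鱼骨) (a fish, Hemibarbus); 舱气状态; adaptive genetic algorithm; adaptive algorithm; adaptive particle swarm optimization; adaptive local learning; self-assembly blending; pulse-width modulation; energy-efficient; energy consumption balance; chiral amine drugs.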

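Luteolin Prevents the Impaired Mice Aorta Endothelium Dependent Vasorelaxation Induced by High Fat Diet

[...] pressure levels were significantly higher than in the normal-diet control group and the high-fat diet + luteolin group (P < 0.05). (2) A long-term high-fat diet impaired endothelium-dependent vasorelaxation in the mouse thoracic aorta (P < 0.05), whereas preventive dietary intervention with luteolin protected against this impairment. (3) Incubation with luteolin improved endothelium-dependent relaxation in the thoracic aorta of high-fat-fed mice, but this effect was blocked by GW9662, a specific PPARγ antagonist. Conclusion: Luteolin can prevent the endothelial dysfunction of the mouse thoracic aorta mediated by a high-fat diet, and this effect may be related to activation of PPARγ.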
文章 编号 : 17・79 (0 1 2 ・160 622 7 2 1)-20 4-3 关键 词 :木犀 草素 ; 高脂膳 食 ;血 管 内皮 依 赖性 舒张 功 能;P A P RY d i 1. 6/i n17-7 9 0 1 213 o: 03 9 .s.622 7 . 1. . 9 js 2 2 0
Methods: 30 mice were divided into three groups: the normal chow group (10 mice), the high fat diet group (10 mice), and the high fat diet + 2% luteolin [...]

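Effects of Magnolia officinalis extract on behavior in a mouse model of Parkinson's disease induced by MPTP

[...] impairment, indicating that the model was successfully established by MPTP injection. Magnolia officinalis extract significantly shortened the pole-climbing time of the mice and increased their hanging-ability score (P < 0.01). This indicates that the extract improves limb motor coordination and clearly improves the PD motor signs of mice with MPTP-induced PD. The result offers a new direction for preventing and treating PD with traditional Chinese medicine and for developing related new drugs.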
国临床神经外科杂志 ,0 5 1 4 :5 . 2 0 ,0( ) 17 [ ] 姚庆 和 , 5 高国栋 . T MP P的毒性机 制与帕金森病 动物模型 .国
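In traditional Chinese medicine this disease belongs to the category of "tremor syndrome" (颤证). Clinically it is characterized mainly by tremor, muscle rigidity, bradykinesia, and abnormal posture and gait, and past traditional Chinese medicine treatment has mostly emphasized nourishing the liver and kidney [9]. The authors' supervisor, Professor Wu Zhengzhi, holds that the disease is located mainly in the muscle [...]

Compared with pre-injection and with the normal group: ▲ P < 0.05, ▲▲ P < 0.01; compared with the model group: △ P < 0.05.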
参考文献
[ ] 吴正治 . 1 老年神经病学 . 北京 : 学苑出版社 , 0 :. 2 41 0
[ 2] 吴正治 , n r .. u n , A de C J H ag 高小威 , .安颤灵对 帕金森病模 w 等 型旋转行为及 U S功能影响 的研 究 .世界 中西 医结 合杂志 , P
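[...] data are expressed as mean ± standard deviation (x̄ ± s) and were analyzed with the t-test.

3.1 General behavioral observation. In the PD model group, after the first injection, 10 [...]

[...] hydropyridine hydrochloride, 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine hydrochloride): a product of Sigma (USA), sold through the Beijing general agency of Sigma.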

FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem

Michael Montemerlo and Sebastian Thrun
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
mmde@, thrun@

Daphne Koller and Ben Wegbreit
Computer Science Department, Stanford University, Stanford, CA 94305-9010
koller@, ben@

Abstract

The ability to simultaneously localize a robot and accurately map its surroundings is considered by many to be a key prerequisite of truly autonomous robots. However, few approaches to this problem scale up to handle the very large number of landmarks present in real environments. Kalman filter-based algorithms, for example, require time quadratic in the number of landmarks to incorporate each sensor observation. This paper presents FastSLAM, an algorithm that recursively estimates the full posterior distribution over robot pose and landmark locations, yet scales logarithmically with the number of landmarks in the map. This algorithm is based on an exact factorization of the posterior into a product of conditional landmark distributions and a distribution over robot paths. The algorithm has been run successfully on as many as 50,000 landmarks, environments far beyond the reach of previous approaches. Experimental results demonstrate the advantages and limitations of the FastSLAM algorithm on both simulated and real-world data.

Introduction

The problem of simultaneous localization and mapping, also known as SLAM, has attracted immense attention in the mobile robotics literature. SLAM addresses the problem of building a map of an environment from a sequence of landmark measurements obtained from a moving robot. Since robot motion is subject to error, the mapping problem necessarily induces a robot localization problem, hence the name SLAM. The ability to simultaneously localize a robot and accurately map its environment is considered by many to be a key prerequisite of truly autonomous robots [3, 7, 16].
The dominant approach to the SLAM problem was introduced in a seminal paper by Smith, Self, and Cheeseman [15]. This paper proposed the use of the extended Kalman filter (EKF) for incrementally estimating the posterior distribution over robot pose along with the positions of the landmarks. In the last decade, this approach has found widespread acceptance in field robotics, as a recent tutorial paper [2] documents. Recent research has focused on scaling this approach to larger environments with more than a [...]

Figure 1: The SLAM problem: The robot moves from pose $s_1$ through a sequence of controls, $u_1, u_2, \ldots, u_t$. As it moves, it observes nearby landmarks. At time $t = 1$, it observes landmark $\theta_1$ out of two landmarks, $\{\theta_1, \theta_2\}$. The measurement is denoted $z_1$ (range and bearing). At time $t = 2$, it observes the other landmark, $\theta_2$, and at time $t = 3$, it observes $\theta_1$ again. The SLAM problem is concerned with estimating the locations of the landmarks and the robot's path from the controls $u$ and the measurements $z$. The gray shading illustrates a conditional independence relation.

[...] implementation of this idea leads to an algorithm that requires $O(MK)$ time, where $M$ is the number of particles in the particle filter and $K$ is the number of landmarks. We develop a tree-based data structure that reduces the running time of FastSLAM to $O(M \log K)$, making it significantly faster than existing EKF-based SLAM algorithms. We also extend the FastSLAM algorithm to situations with unknown data association and unknown number of landmarks, showing that our approach can be extended to the full range of SLAM problems discussed in the literature.
Experimental results using a physical robot and a robot simulator illustrate that the FastSLAM algorithm can handle orders of magnitude more landmarks than present day approaches. We also find that in certain situations, an increased number of landmarks leads to a mild reduction of the number of particles needed to generate accurate maps, whereas in others the number of particles required for accurate mapping may be prohibitively large.

SLAM Problem Definition

The SLAM problem, as defined in the rich body of literature on SLAM, is best described as a probabilistic Markov chain. The robot's pose at time $t$ will be denoted $s_t$. For robots operating in the plane (which is the case in all of our experiments) poses are comprised of a robot's $x$-$y$ coordinate in the plane and its heading direction.

Poses evolve according to a probabilistic law, often referred to as the motion model:

$$ p(s_t \mid u_t, s_{t-1}) \tag{1} $$

Thus, $s_t$ is a probabilistic function of the robot control $u_t$ and the previous pose $s_{t-1}$. In mobile robotics, the motion model is usually a time-invariant probabilistic generalization of robot kinematics [1].

The robot's environment possesses $K$ immobile landmarks. Each landmark is characterized by its location in space, denoted $\theta_k$ for $k = 1, \ldots, K$. Without loss of generality, we will think of landmarks as points in the plane, so that locations are specified by two numerical values.

To map its environment, the robot can sense landmarks.
For example, it may be able to measure range and bearing to a landmark, relative to its local coordinate frame. The measurement at time $t$ will be denoted $z_t$. While robots can often sense more than one landmark at a time, we follow commonplace notation by assuming that sensor measurements correspond to exactly one landmark [2]. This convention is adopted solely for mathematical convenience. It poses no restriction, as multiple landmark sightings at a single time step can be processed sequentially.

Sensor measurements are governed by a probabilistic law, often referred to as the measurement model:

$$ p(z_t \mid s_t, \theta, n_t) \tag{2} $$

Here $\theta = \{\theta_1, \ldots, \theta_K\}$ is the set of all landmarks, and $n_t \in \{1, \ldots, K\}$ is the index of the landmark perceived at time $t$. For example, in Figure 1, we have $n_1 = 1$, $n_2 = 2$, and $n_3 = 1$, since the robot first observes landmark $\theta_1$, then landmark $\theta_2$, and finally landmark $\theta_1$ for a second time. Many measurement models in the literature assume that the robot can measure range and bearing to landmarks, confounded by measurement noise. The variable $n_t$ is often referred to as correspondence. Most theoretical work in the literature assumes knowledge of the correspondence or, put differently, that landmarks are uniquely identifiable. Practical implementations use maximum likelihood estimators for estimating the correspondence on-the-fly, which work well if landmarks are spaced sufficiently far apart. In large parts of this paper we will simply assume that landmarks are identifiable, but we will also discuss an extension that estimates the correspondences from data.

We are now ready to formulate the SLAM problem. Most generally, SLAM is the problem of determining the location of all landmarks $\theta$ and robot poses $s_t$ from measurements $z^t = z_1, \ldots, z_t$ and controls $u^t = u_1, \ldots, u_t$. In probabilistic terms, this is expressed by the posterior $p(s^t, \theta \mid z^t, u^t)$, where we use the superscript $^t$ to refer to a set of variables from time 1 to time $t$. If the correspondences are known, the SLAM problem is simpler:

$$ p(s^t, \theta \mid z^t, u^t, n^t) \tag{3} $$

As discussed in the introduction, all individual landmark estimation problems are independent if one knew the robot's path $s^t$ and the correspondence variables $n^t$. This conditional independence is the basis of the FastSLAM algorithm described in the next section.

FastSLAM with Known Correspondences

We begin our consideration with the important case where the correspondences $n^t = n_1, \ldots, n_t$ are known, and so is the number of landmarks $K$ observed thus far.

Factored Representation

The conditional independence property of the SLAM problem implies that the posterior (3) can be factored as follows:

$$ p(s^t, \theta \mid z^t, u^t, n^t) = p(s^t \mid z^t, u^t, n^t) \prod_{k=1}^{K} p(\theta_k \mid s^t, z^t, u^t, n^t) \tag{4} $$

Put verbally, the problem can be decomposed into $K + 1$ estimation problems, one problem of estimating a posterior over robot paths $s^t$, and $K$ problems of estimating the locations of the $K$ landmarks conditioned on the path estimate. This factorization is exact and always applicable in the SLAM problem, as previously argued in [12].

The FastSLAM algorithm implements the path estimator $p(s^t \mid z^t, u^t, n^t)$ using a modified particle filter [4]. As we argue further below, this filter can sample efficiently from this space, providing a good approximation of the posterior even under non-linear motion kinematics. The landmark pose estimators $p(\theta_k \mid s^t, z^t, u^t, n^t)$ are realized by Kalman filters, using separate filters for different landmarks.
Because the landmark estimates are conditioned on the path estimate, each particle in the particle filter has its own, local landmark estimates. Thus, for $M$ particles and $K$ landmarks, there will be a total of $KM$ Kalman filters, each of dimension 2 (for the two landmark coordinates). This representation will now be discussed in detail.

Particle Filter Path Estimation

FastSLAM employs a particle filter for estimating the path posterior $p(s^t \mid z^t, u^t, n^t)$ in (4), using a filter that is similar (but not identical) to the Monte Carlo localization (MCL) algorithm [1]. MCL is an application of the particle filter to the problem of robot pose estimation (localization). At each point in time, both algorithms maintain a set of particles representing the posterior $p(s^t \mid z^t, u^t, n^t)$, denoted $S_t$. Each particle $s^{t,[m]} \in S_t$ represents a "guess" of the robot's path:

$$ S_t = \{ s^{t,[m]} \}_m = \{ s_1^{[m]}, s_2^{[m]}, \ldots, s_t^{[m]} \}_m \tag{5} $$

We use the superscript notation $^{[m]}$ to refer to the $m$-th particle in the set.

The particle set $S_t$ is calculated incrementally, from the set $S_{t-1}$ at time $t-1$, a robot control $u_t$, and a measurement $z_t$. First, each particle $s^{t-1,[m]}$ in $S_{t-1}$ is used to generate a probabilistic guess of the robot's pose at time $t$:

$$ s_t^{[m]} \sim p(s_t \mid u_t, s_{t-1}^{[m]}) \tag{6} $$

obtained by sampling from the probabilistic motion model. This estimate is then added to a temporary set of particles, along with the path $s^{t-1,[m]}$. Under the assumption that the set of particles in $S_{t-1}$ is distributed according to $p(s^{t-1} \mid z^{t-1}, u^{t-1}, n^{t-1})$ (which is an asymptotically correct approximation), the new particle is distributed according to $p(s^t \mid z^{t-1}, u^t, n^{t-1})$. This distribution is commonly referred to as the proposal distribution of particle filtering.

After generating $M$ particles in this way, the new set $S_t$ is obtained by sampling from the temporary particle set. Each particle $s^{t,[m]}$ is drawn (with replacement) with a probability proportional to a so-called importance factor $w_t^{[m]}$, which is calculated as follows [10]:

$$ w_t^{[m]} = \frac{\text{target distribution}}{\text{proposal distribution}} = \frac{p(s^{t,[m]} \mid z^t, u^t, n^t)}{p(s^{t,[m]} \mid z^{t-1}, u^t, n^{t-1})} \tag{7} $$

The exact calculation of (7) will be discussed further below.
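The sample-then-resample cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the planar motion-noise model, the function names, and in particular the range-only Gaussian likelihood used as the importance factor are hypothetical stand-ins (the paper derives its own weight from (7)), and each particle here stores only the latest pose.

```python
import math
import random

def sample_motion(pose, control, rng, noise=(0.05, 0.02)):
    """Draw a new pose from a hypothetical noisy planar motion model, cf. (6)."""
    x, y, heading = pose
    distance, turn = control
    distance += rng.gauss(0.0, noise[0])
    turn += rng.gauss(0.0, noise[1])
    heading += turn
    return (x + distance * math.cos(heading),
            y + distance * math.sin(heading),
            heading)

def range_likelihood(pose, landmark, measured_range, sigma=0.2):
    """Stand-in importance factor: Gaussian likelihood of one range reading."""
    expected = math.hypot(landmark[0] - pose[0], landmark[1] - pose[1])
    return math.exp(-0.5 * ((measured_range - expected) / sigma) ** 2)

def particle_filter_step(particles, control, landmark, measured_range, rng):
    # Proposal: push every particle through the motion model.
    proposals = [sample_motion(p, control, rng) for p in particles]
    weights = [range_likelihood(p, landmark, measured_range) for p in proposals]
    if sum(weights) == 0.0:
        weights = [1.0] * len(proposals)  # degenerate case: fall back to uniform
    # Resampling: draw M particles with replacement, proportional to weight.
    resampled = rng.choices(proposals, weights=weights, k=len(proposals))
    return proposals, resampled

rng = random.Random(42)
particles = [(0.0, 0.0, 0.0)] * 100
proposals, resampled = particle_filter_step(
    particles, control=(1.0, 0.0), landmark=(5.0, 0.0), measured_range=4.0, rng=rng)
```

Drawing with replacement keeps the particle count fixed at $M$ while concentrating copies on poses that explain the measurement well, which is the behavior the resampling step relies on.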
The resulting sample set $S_t$ is distributed according to an approximation to the desired pose posterior $p(s^t \mid z^t, u^t, n^t)$, an approximation which is correct as the number of particles $M$ goes to infinity. We also notice that only the most recent robot pose estimate $s_{t-1}^{[m]}$ is used when generating the particle set $S_t$. This allows us to silently "forget" all other pose estimates, rendering the size of each particle independent of the time index $t$.

Landmark Location Estimation

FastSLAM represents the conditional landmark estimates $p(\theta_k \mid s^t, z^t, u^t, n^t)$ in (4) by Kalman filters. Since this estimate is conditioned on the robot pose, the Kalman filters are attached to individual pose particles in $S_t$. More specifically, the full posterior over paths and landmark positions in the FastSLAM algorithm is represented by the sample set

$$ S_t = \{ s^{t,[m]}, \mu_1^{[m]}, \Sigma_1^{[m]}, \ldots, \mu_K^{[m]}, \Sigma_K^{[m]} \}_m \tag{8} $$

Here $\mu_k^{[m]}$ and $\Sigma_k^{[m]}$ are mean and covariance of the Gaussian representing the $k$-th landmark $\theta_k$, attached to the $m$-th particle. In the planar robot navigation scenario, each mean $\mu_k^{[m]}$ is a two-element vector, and $\Sigma_k^{[m]}$ is a 2 by 2 matrix.

The posterior over the $k$-th landmark pose $\theta_k$ is easily obtained. Its computation depends on whether or not $n_t = k$, that is, whether or not $\theta_k$ was observed at time $t$. For $n_t = k$, we obtain

$$ p(\theta_k \mid s^t, z^t, u^t, n^t) \propto p(z_t \mid \theta_k, s_t, n_t)\, p(\theta_k \mid s^{t-1}, z^{t-1}, u^{t-1}, n^{t-1}) \tag{9} $$

For $n_t \neq k$, we simply leave the Gaussian unchanged:

$$ p(\theta_k \mid s^t, z^t, u^t, n^t) = p(\theta_k \mid s^{t-1}, z^{t-1}, u^{t-1}, n^{t-1}) \tag{10} $$

The FastSLAM algorithm implements the update equation (9) using the extended Kalman filter (EKF). As in existing EKF approaches to SLAM, this filter uses a linearized version of the perceptual model $p(z_t \mid s_t, \theta, n_t)$ [2]. Thus, FastSLAM's EKF is similar to the traditional EKF for SLAM [15] in that it approximates the measurement model using a linear Gaussian function. We note that, with a linear Gaussian observation model, the resulting distribution $p(\theta_k \mid s^t, z^t, u^t, n^t)$ is exactly a Gaussian, even if the motion model is not linear. This is a consequence of the use of sampling to approximate the distribution over the robot's pose.

One significant difference between the FastSLAM algorithm's use of Kalman filters and that of the traditional SLAM algorithm is that the updates in the FastSLAM algorithm involve only a Gaussian of dimension two (for the two landmark location parameters), whereas in the EKF-based SLAM approach a Gaussian of size $2K + 3$ has to be updated (with $K$ landmarks and 3 robot pose parameters). This calculation can be done in constant time in FastSLAM, whereas it requires time quadratic in $K$ in standard SLAM.

Calculating the Importance Weights

Let us now return to the problem of calculating the importance weights $w_t^{[m]}$ needed for particle filter resampling, as defined in (7): [...]

Figure 2: A tree representing $K = 8$ landmark estimates within a single particle.

Figure 4: (a) Physical robot mapping rocks, in a testbed developed for Mars Rover research. (b) Raw range and path data. (c) Map generated using FastSLAM (dots), and locations of rocks determined manually (circles).

[...] in the map. It also has to determine if a measurement corresponds to a new, previously unseen landmark, in which case the map should be augmented accordingly.

In most existing SLAM solutions based on EKFs, these problems are solved via maximum likelihood. More specifically, the probability of a data association $n_t$ is given by (12). The step labeled "PF" uses the particle filter approximation to the posterior $p(s^t \mid z^t, u^t)$. The final step assumes a uniform prior $p(n_t \mid s_t)$, which is commonly used [2]. The maximum likelihood data association is simply the index $n_t$ that maximizes (12). If the maximum value of $p(n_t \mid z^t, u^t)$, with careful consideration of all constants in (12), is below a threshold $\alpha$, the landmark is considered previously unseen and the map is augmented accordingly.

In FastSLAM, the data association is estimated on a per-particle basis: $n_t^{[m]} = \operatorname{argmax}_{n_t} p(z_t \mid s_t^{[m]}, n_t)$. As a result, different particles may rely on different values of $n_t^{[m]}$. They might even possess different numbers of landmarks in their respective maps. This constitutes a primary difference to EKF approaches, which determine the data association only once for each sensor measurement. It has been observed frequently that false data association will make the conventional EKF approach fail catastrophically [2]. FastSLAM is more likely to recover, thanks to its ability to pursue multiple data associations simultaneously. Particles with wrong data association are (in expectation) more likely to disappear in the resampling process than those that guess the data association correctly.

We believe that, under mild assumptions (e.g., minimum spacing between landmarks and bounded sensor error), the data association search can be implemented in time logarithmic in $K$. One possibility is the use of kd-trees as an indexing scheme in the tree structures above, instead of the landmark number, as proposed in [11].

Experimental Results

The FastSLAM algorithm was tested extensively under various conditions. Real-world experiments were complemented by systematic simulation experiments, to investigate the scaling abilities of the approach. Overall, the results indicate favorable scaling to large numbers of landmarks and small particle sets. A fixed number of particles appears to work well across a large number of situations.

Figure 4a shows the physical robot testbed, which consists of a small arena set up under NASA funding for Mars Rover research. A Pioneer robot equipped with a SICK laser range finder was driven along an approximate straight line, generating the raw data shown in Figure 4b. The resulting map, generated from the particle samples, is depicted in Figure 4c, with manually determined landmark locations marked by circles. The robot's estimates are indicated by x's, illustrating the high accuracy of the resulting maps. FastSLAM resulted in an average residual map error of 8.3 centimeters, when compared to the manually generated map.

Unfortunately, the physical testbed does not allow for systematic experiments regarding the scaling properties of the approach. In extensive simulations, the number of landmarks was increased up to a total of 50,000, which FastSLAM successfully mapped with as few as 100 particles. Here, the number of parameters in FastSLAM is approximately 0.3% of that in the conventional EKF. Maps with 50,000 landmarks are entirely out of range for conventional SLAM techniques, due to their enormous computational complexity. Figure 5 shows example maps with smaller numbers of landmarks, for different maximum sensor ranges as indicated. The ellipses in Figure 5 visualize the residual uncertainty when integrated over all particles and Gaussians.

In a set of experiments specifically aimed to elucidate the scaling properties of the approach, we evaluated the map and robot pose errors as a function of the number of landmarks $K$, and the number of particles $M$, respectively. The results are graphically depicted in Figure 6. Figure 6a illustrates that an increase in the number of landmarks $K$ mildly reduces the error in the map and the robot pose. This is because the larger the number of landmarks, the smaller the robot pose error at any point in time. Increasing the number of particles $M$ also bears a positive effect on the map and pose errors, as illustrated in Figure 6b. In both diagrams, the bars correspond to 95% confidence intervals.

Figure 5: Maps and estimated robot path, generated using sensors with (a) large and (b) small perceptual fields. The correct landmark locations are shown as dots, and the estimates as ellipses, whose sizes correspond to the residual uncertainty.

Conclusion

We presented the FastSLAM algorithm, an efficient new solution to the concurrent mapping and localization problem.
This algorithm utilizes a Rao-Blackwellized representation of the posterior, integrating particle filter and Kalman filter representations. Similar to Murphy's work [12], FastSLAM is based on an inherent conditional independence property of the SLAM problem. However, Murphy's approach maintains maps using grid positions with discrete values, and therefore scales poorly with the size of the map. His approach also did not deal with the data association problem, which does not arise in the grid-based setting.

In FastSLAM, landmark estimates are efficiently represented using tree structures. Updating the posterior requires $O(M \log K)$ time, where $M$ is the number of particles and $K$ the number of landmarks. This is in contrast to the $O(K^2)$ complexity of the common Kalman-filter based approach to SLAM. Experimental results illustrate that FastSLAM can build maps with orders of magnitude more landmarks than previous methods. They also demonstrate that under certain conditions, a small number of particles works well regardless of the number of landmarks.
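The "approximately 0.3%" parameter ratio quoted in the experimental results can be checked with a short calculation. This is a sketch under a stated counting convention, which is an assumption rather than something the paper spells out: the EKF is taken to store a mean of size $2K + 3$ plus a full covariance matrix, and each FastSLAM particle is taken to store a 3-parameter pose plus, per landmark, a 2-element mean and a full 2 by 2 covariance.

```python
def ekf_parameter_count(k, pose_dim=3, landmark_dim=2):
    """Mean vector plus full covariance of the joint EKF state (assumed convention)."""
    n = landmark_dim * k + pose_dim
    return n + n * n

def fastslam_parameter_count(m, k, pose_dim=3, landmark_dim=2):
    """Per particle: one pose, and per landmark a mean plus a full covariance."""
    per_landmark = landmark_dim + landmark_dim * landmark_dim
    return m * (pose_dim + k * per_landmark)

# Figures from the text: 100 particles, 50,000 landmarks.
ratio = fastslam_parameter_count(100, 50_000) / ekf_parameter_count(50_000)
print(f"FastSLAM / EKF parameter ratio: {ratio:.2%}")
```

Under this convention the ratio comes out close to 0.3%. Exploiting the symmetry of the covariance matrices would change both counts and give a somewhat different figure, so the match should be read as a plausibility check only.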
Acknowledgments

We thank Kevin Murphy and Nando de Freitas for insightful discussions on this topic. This research was sponsored by DARPA's MARS Program (Contract number N66001-01-C-6018) and the National Science Foundation (CAREER grant number IIS-9876136 and regular grant number IIS-9877033). We thank the Hertz Foundation for their support of Michael Montemerlo's graduate research. Daphne Koller was supported by the Office of Naval Research, Young Investigator (PECASE) grant N00014-99-1-0464. This work was done while Sebastian Thrun was visiting Stanford University.

Figure 6: Accuracy of the FastSLAM algorithm as a function of (a) the number of landmarks $K$, and (b) the number of particles $M$. Large numbers of landmarks reduce the robot localization error, with little effect on the map error. Good results can be achieved with as few as 100 particles.

References

[1] F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. ICRA-99.
[2] G. Dissanayake, P. Newman, S. Clark, H.F. Durrant-Whyte, and M. Csorba. An experimental and theoretical investigation into simultaneous localisation and map building (SLAM). Lecture Notes in Control and Information Sciences: Experimental Robotics VI, Springer, 2000.
[3] G. Dissanayake, P. Newman, S. Clark, H.F. Durrant-Whyte, and M. Csorba. A solution to the simultaneous localisation and map building (SLAM) problem. IEEE Transactions of Robotics and Automation, 2001.
[4] A. Doucet, J.F.G. de Freitas, and N.J. Gordon, editors. Sequential Monte Carlo Methods In Practice. Springer, 2001.
[5] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. UAI-2000.
[6] J. Guivant and E. Nebot. Optimization of the simultaneous localization and map building algorithm for real time implementation. IEEE Transaction of Robotic and Automation, May 2001.
[7] D. Kortenkamp, R.P. Bonasso, and R. Murphy, editors. AI-based Mobile Robots: Case studies of successful robot systems, MIT Press, 1998.
[8] J.J. Leonard and H.J.S. Feder. A computationally efficient method for large-scale concurrent mapping and localization. ISRR-99.
[9] F. Lu and E. Milios. Globally consistent range scan alignment for environment mapping. Autonomous Robots, 4, 1997.
[10] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equations of state calculations by fast computing machine. Journal of Chemical Physics, 21, 1953.
[11] A.W. Moore. Very fast EM-based mixture model clustering using multiresolution kd-trees. NIPS-98.
[12] K. Murphy. Bayesian map learning in dynamic environments. NIPS-99.
[13] K. Murphy and S. Russell. Rao-blackwellized particle filtering for dynamic bayesian networks. In Sequential Monte Carlo Methods in Practice, Springer, 2001.
[14] P. Newman. On the Structure and Solution of the Simultaneous Localisation and Map Building Problem. PhD thesis, Univ. of Sydney, 2000.
[15] R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In Autonomous Robot Vehicles, Springer, 1990.
[16] C. Thorpe and H. Durrant-Whyte. Field robots. ISRR-2001.
[17] S. Thrun, D. Fox, and W. Burgard. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning, 31, 1998.

Predicting Dominant Height Growth of Ponderosa Pine Stands Using Mixed-Effects Models and EBLUP

SCIENTIA SILVAE SINICAE, Vol. 51, No. 3, March. doi: 10.11707/j.1001-7488.20150304

Zu Xiaofeng, Ni Chengcai, Gorden Nigh, Qin Xianlin
(1. Institute of Forest Resources Information Techniques, CAF, Beijing 100091; 2. College of Forestry, Beihua University, Jilin 132013; 3. British Columbia Ministry of Forests, Lands and Natural Resources Operations, Forest Analysis and Inventory Branch,

... [the first stage] is the fitting process that estimates the parameters of the mixed-parameter model, whereas the second stage builds on the first-stage fit: given several height observations from a particular stand, it uses the EBLUP method to predict that stand's random effects and, from them, the stand's height growth trajectory.
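The two-stage idea above can be illustrated with a deliberately simplified sketch. It assumes a fixed-effects height curve that has already been fitted (stage one) and a single additive random intercept per stand; the curve form, the parameter values and the function names below are hypothetical and are not taken from the paper, whose actual model is not reproduced in this excerpt. For a random intercept with independent errors, the BLUP reduces to a shrunken mean of the residuals.

```python
import math


def fixed_height(age, a=30.0, k=0.03, c=1.3):
    """Population-average height curve from stage one (hypothetical parameters)."""
    return a * (1.0 - math.exp(-k * age)) ** c


def eblup_random_intercept(ages, heights, var_b, var_e):
    """Stage two: predict one stand's random intercept from a few observations.

    var_b is the between-stand variance and var_e the residual variance,
    both assumed to come from the stage-one fit.
    """
    residuals = [h - fixed_height(t) for t, h in zip(ages, heights)]
    n = len(residuals)
    # Shrinkage factor: closer to 1 with more observations or larger var_b.
    shrink = n * var_b / (n * var_b + var_e)
    return shrink * sum(residuals) / n


def predict_height(age, b_hat):
    """Stand-specific growth curve: fixed part plus predicted random effect."""
    return fixed_height(age) + b_hat
```

With only a handful of height observations the predicted effect is pulled toward zero, so the stand curve stays close to the population curve until the data justify a larger departure.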

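New Harvard Study: Sponge Genomes Reveal the Emergence of Genetic Complexity

Researchers at Harvard University recently examined the transcriptomes of eight sponge species from four classes (Hexactinellida, Demospongiae, Homoscleromorpha and Calcarea), searching specifically for genes and pathways associated with animal complexity. The results were published in Mol Biol Evol.

Sponges (Porifera) are among the earliest-evolving animals. Their filter-feeding body plan, built from choanocyte chambers organized into a complex aquiferous system, is unique among metazoans.

This suggests either that sponges diverged from other animals before muscle and nerve functions evolved, or that sponges have lost these features.

Analyses of the Amphimedon and Oscarella genomes support this view: many key metazoan genes are absent from the sponges studied so far, but whether these genes are present in other sponges was unknown.

As a reference, the team also searched for these genes in the transcriptomes and genomes of three unicellular opisthokonts and two bilaterian taxa.

Their analysis shows that all sponge classes share a complement of genes with other metazoans.

The team found that hexactinellid, calcareous and homoscleromorph sponges share more genes with bilaterians than with non-bilaterian metazoans (poly(A) RNA sequencing service provided by 联川生物).

They also found that most of the molecules involved in cell-cell communication, signaling, complex epithelia, immune recognition and germ line/sex are represented, with only a few potentially key molecules missing.

A noteworthy finding is that certain important genes are absent from all demosponges (transcriptomes and the Amphimedon genome), which may reflect divergence from the main-stem lineage that includes hexactinellids, calcareous sponges and homoscleromorphs.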

Building a Phylogenetic Tree on the APG III Backbone with Phylomatic: English Manual

PHYLOCOM

SOFTWARE FOR THE ANALYSIS OF PHYLOGENETIC COMMUNITY STRUCTURE AND CHARACTER EVOLUTION (WITH PHYLOMATIC AND ECOVOLVE)

USER'S MANUAL
VERSION 4.2

© 2011 CAMPBELL WEBB, DAVID ACKERLY, STEVE KEMBEL

CAM WEBB
ARNOLD ARBORETUM OF HARVARD UNIVERSITY
CWEBB@

DAVID ACKERLY
UNIVERSITY OF CALIFORNIA, BERKELEY
DACKERLY@

STEVE KEMBEL
UNIVERSITY OF OREGON
SKEMBEL@

CONTENTS

1 Introduction
  1.1 New in Version 4.2
  1.2 New in Version 4.1
  1.3 New in Version 4.0
2 Installation
  2.1 Mac OS X
  2.2 Linux/Other Unix
  2.3 Windows
3 Input file formats
  3.1 Tree preparation
    3.1.1 Newick and NEXUS
  3.2 Sample preparation
  3.3 Traits preparation
4 Using PHYLOCOM
5 Basic data extraction and manipulation
  5.1 AGENODE
  5.2 AGETERM
  5.3 BLADJ
  5.4 CLEANPHY
  5.5 COMNODE
  5.6 MAKENEX
  5.7 NEW2NEX
  5.8 NEW2FY
  5.9 PHYDIST
  5.10 PHYVAR
  5.11 NAF
  5.12 RNDPRUNE
  5.13 SAMPLEPRUNE
  5.14 VERSION
6 Phylogenetic community structure
  6.1 Phylogenetic community structure metrics
    6.1.1 PD
    6.1.2 COMSTRUCT
    6.1.3 Null models
    6.1.4 SWAP
    6.1.5 LTT and LTTR
    6.1.6 NODESIG
    6.1.7 nodesigl
  6.2 Inter-sample phylogenetic distance
    6.2.1 COMDIST and COMDISTNT
    6.2.2 ICOMDIST
    6.2.3 RAO
7 Trait-based community structure: COMTRAIT
8 PHYLOMATIC
  8.1 The taxa file
  8.2 Branch lengths
9 ECOVOLVE
10 Trait Analyses (by David Ackerly)
  10.1 Trait means and variance by node: tip-based and node-based methods
  10.2 The Contribution Index: Node-based partitioning of trait variance
  10.3 Phylogenetic independent contrasts
  10.4 Branch lengths
  10.5 Significance testing
    10.5.1 Significance of independent contrasts
  10.6 Phylogenetic signal
  10.7 Running trait analyses
    10.7.1 Switches
  10.8 Output format
    10.8.1 Output table 1: Trait conservatism by node (aotf only)
    10.8.2 Output table 2: Independent contrasts by node (aotf only)
    10.8.3 Output table 3: Trait conservatism, treewise results
    10.8.4 Output table 4: Independent contrast correlations
11 Afterword
12 Citing PHYLOCOM
13 Acknowledgments
14 Legal
References
A Appendix: Worked examples
  A.1 Make a phylogeny for a plant species list
B Appendix: FAQs
  B.1 Running PHYLOCOM
  B.2 AOT
  B.3 BLADJ
  B.4 new2nex and makenex
  B.5 Phylomatic

1 INTRODUCTION

PHYLOCOM is a command-line application for manipulating ecological and phylogenetic data, calculating various metrics of phylogenetic and phenotypic community structure, and measuring trait conservatism and trait correlations. We have developed a system to help take the evolutionary ecologist easily through the steps needed in an analysis of phylogenetic community structure or trait evolution. PHYLOMATIC can be used to rapidly develop a phylogeny for any plant community. This phylogeny can then be input into the PHYLOCOM program to measure phylogenetic relatedness among species occurring together in samples, test hypotheses of community structure, or quantify patterns of trait evolution. If estimates exist for the ages of any node, these can be incorporated, as can branch lengths from other sources. ECOVOLVE is a phylogeny growth simulator, using the same file formats as the other tools. These tools will remain command-line programs, so that they can easily be used inside other programs and shell scripts.

1.1 New in Version 4.2

• Phylomatic no longer outputs a branch length for the root node; the presence of this BL (allowed by the Newick definition) was causing parsing errors in other applications.

1.2 New in Version 4.1

• Renamed comdistnn function to comdistnt for consistency.
• Added null model testing to comdist/comdistnt functions.
• Windows version now compiled with MINGW32, rather than DJGPP. This should help with some of the memory issues some Windows users were experiencing. A phylocom.bat file is also included to assist in opening a CMD.EXE console window.

1.3 New in Version 4.0

• Added code to detect line endings (Mac/Windows/Unix) and adjust automatically.
• Added rao function to calculate phylogenetic diversity.
• Updated comstruct and comdist to use the -a switch to incorporate abundance into phylogenetic distance calculations. NB: see Hardy (2008) for a discussion of important issues concerning abundances and Type I and II errors in detection of significant phylogenetic structure.
• comtrait function calculates trait dispersion within communities.
• Modified calculation of phylogenetic signal.

2 INSTALLATION

2.1 Mac OS X

Universal binaries for OS 10.5 are included in the mac directory. If these do not run (perhaps because you are using an older version of OS X), you may want to build from scratch: acquire the OS X Developer Tools Installer from the Apple website (you may have to register as a developer), or from your installation discs, and install. Then follow the instructions below for a UNIX build.

This is command-line software. You need to run it in a terminal window (/Applications/Utilities/Terminal.app). To make PHYLOCOM available anywhere (i.e., not always requiring the executable in the same directory as your data files), create a bin/ directory under your home directory, and add this line to your .bash_profile file:

    PATH=$PATH:$HOME/bin:.; export PATH

or this line to your .tcshrc file, if you are running the tcsh shell:

    set path=(. ~/bin $PATH)

2.2 Linux/Other Unix

Rather than providing precompiled binaries for each architecture, we ask Unix users to compile locally. These commands should work:

    $ cd Desktop        # or to wherever you save the zip file
    $ unzip phylocom.zip
    $ cd phylocom-X     # replace X with version no.
    $ cd src
    $ make
    $ ./phylocom

If your system does not use gcc, edit the Makefile to reflect your C compiler.

To make PHYLOCOM available anywhere (i.e., not always requiring the executable in the same directory as your data files), create a bin/ directory under your home directory, and add this line to your .bash_profile file:

    PATH=$PATH:$HOME/bin:.; export PATH

or this line to your .tcshrc file, if you are running the tcsh shell:

    set path=(. ~/bin $PATH)

2.3 Windows

Windows 32-bit binaries are included in the w32 directory. These binaries were compiled with mingw32 under Linux (see Makefile). The binaries run in the Windows console (usually found at c:\windows\system32\cmd.exe); note that this shell is no longer the same as MS-DOS, although most of the commands behave as before (see / for an excellent introduction to the Windows console). If you need to recompile in Windows, use mingw32, or you can install and use the cygwin tools.

In order to access PHYLOCOM from anywhere in your directory tree, create a directory where you want phylocom.exe to live (e.g., C:\PHYLOCOM). Right click the My Computer icon, choose Properties, then click on advanced system settings, then click on environment variables. From there, add the name of the directory where you have placed the PHYLOCOM executable to the end of the list of directories in the path. Copy the executables (.exe files) to this new directory, open a new command prompt window, and you should be able to type phylocom and have the program run. This is the same as adding this line to a batch file:

    path=%PATH%;C:\phylocom

This is command-line software. You need to run it in a console window. You can access the console via one of:

1. Start Menu → Applications → Accessories → Command prompt
2. Start Menu → Run, and type CMD
3. Double clicking on the phylocom.bat file. This must be in the same directory as the executables, unless you have set the path, as above.

If you are frequently using a particular PHYLOCOM function, you can also make custom shortcuts (a tip from an anonymous reviewer):

1. Place your executable phylocom.exe in a standard location (e.g., "C:\PROGRAM FILES\PHYLOCOM\").
2. Create a shortcut to phylocom.exe and rename it (e.g., phylocom_comstruct).
3. Open the properties of the shortcut, and in the Target box, enter
       "C:\PROGRAM FILES\PHYLOCOM\PHYLOCOM.EXE" COMSTRUCT > OUT.TXT
4. Leave the 'Run In' box blank.

The shortcut can now be copied and dropped into any directory where you want to run an analysis (with data files named to the default names). Alternatively, you can put the following (all in one line) in the 'Target' box, to run the program in a console, which will stay open after the program has run:

    C:\WINDOWS\system32\cmd.exe /k "c:\program files\phylocom\phylocom.exe" comstruct > out.txt

3 INPUT FILE FORMATS

PHYLOCOM uses plain text files as input (i.e., not proprietary binary formats such as .xls and .doc). This facilitates connecting the software as part of an analysis or simulation chain, because text processing tools (e.g., sed, awk, perl) can be used to re-format files, etc. However, because one has access to the internals of the data files in this way, one must be extra careful with accuracy of formatting: no extra spaces, tabs, etc. Learning to use a good text editor will greatly help; we recommend TextWrangler for Mac and Notepad-plus for Windows.

Note: As of version 4.0, PHYLOCOM recognizes the line-endings (UNIX, Mac, Windows) used in each file. This should save a lot of wasted time! However, your line-endings must be consistent within each file.

3.1 Tree preparation

PHYLOCOM reads Newick-format phylogenies directly. See:

    /phylip/newicktree.html, and
    /phylip/newick_doc.html

for definitions of the Newick standard. The default file name is phylo, but other phylogeny files can be specified using the -f filename option. Plain Newick format phylogenies are also used by PHYLIP. The basic Newick format used by PHYLOCOM is:

    ((A,B),C);

The full complexity Newick format that can be read by PHYLOCOM is:

    ((A_sp:1.1,B2-sp:2.2)clade1:1.0[a comment],c_sp:0.5);

Please note:

• Taxa and interior names must begin with A-Z or a-z, not a number,
• Branch lengths will be assumed to be 1.0 if not present,
• Comments are ignored,
• The root node should not have a branch length if other branch lengths are given,
• There should be no whitespace or line-endings within the tree,
• Trailing spaces/line-endings at the end of the phylo file are OK, but not in any other file format used by PHYLOCOM,
• Multiple Newick strings (as in PHYLIP intree files) are currently not allowed in the phylo file,
• The basal node must be a dichotomy, not a polytomy.

3.1.1 Newick and NEXUS

One of the standard file formats for phylogenies is NEXUS. If you open a NEXUS file with a text editor, you will see one or more Newick strings in the TREES section. You may be able to simply cut out the Newick string and paste it into a new phylo file. However, some programs by default write a translation of the taxon names into numbers. These translated Newick strings need to be 'un-translated' before use in PHYLOCOM. A simple way to get a Newick string out of a NEXUS file is to open the NEXUS file in Mesquite, open a tree window (Taxa & Trees → New Tree Window) for the stored tree (Stored Trees), click on the 'Text' tab, and save the text page to a file (File → Save Window as Text...). In the saved file, edit out everything but the Newick string.

3.2 Sample preparation

PHYLOCOM accepts various kinds of sample data. Samples may represent subsets of the phylogenetic tree, or ecological data matrices (measurements of species abundance or occurrence in samples of some sort). The default file name is sample, but other sample files can be specified using the -s filename option.

The sample file has the following format:

• 3 columns, tab delimited, sorted by column 1
• one row per taxon:
  1. Sample (plot, quadrat, trap, etc.) name (character string, no spaces, should begin with [A-Za-z])
  2. Abundance (integer; leave as 1 for presence/absence data)
  3. Species code (string, same as in phylo, should begin with [A-Za-z])
• all species in this table MUST be included in phylo

You can wrestle your data into this format pretty easily, using a stats package or spreadsheet. Look at the included example file sample for an idea of what the file should look like. The current version has been tested to work with fairly large ecological data sets as sample files (e.g., 400 species and 5,000 samples).

3.3 Traits preparation

Any number of characters can be included in the traits file. Note that missing trait values are not allowed, and the list of taxa in traits must exactly match the terminal taxa in the phylo file.
The default file name is traits, but other trait files can be specified using the -t filename option.

The first line of traits must read:

    type<TAB>n<TAB>n<TAB>... [up to the number of traits]

where n indicates the type of trait in each of the four columns:

    0 for binary (only one binary trait may be included, and it must be in the first column)
    1 for unordered multistate (no algorithms currently implemented)
    2 for ordered multistate (currently treated as continuous)
    3 for continuous

Optional: The second line can start with the word name (lower case only) and then list the names of the traits in order. These will appear in the full output file. Subsequent lines should have the taxon name, which must be identical to its appearance in phylo, and the data columns separated by tabs. See the example traits file distributed with the program.

4 USING PHYLOCOM

NOTE: Investing in a basic guidebook for UNIX or the DOS/Windows console will be very worthwhile in the long run. Alternatively, the web is full of useful pages, e.g., Google: 'introduction to unix' or 'commandline Windows'.

Try this software first with the included example files. Either i) copy these files to the same directory as the executables, or ii) make the executables universally findable on your system (see §2.3), and just cd to where your (demo or real) input files are. Then just type:

    $ phylocom

(NOTE: the $ symbol in this manual indicates the command prompt; do not type this symbol, only what follows it.) If you are using OS X/UNIX and haven't placed the PHYLOCOM executable in your path (see §2.3), type:

    $ ./phylocom

In the Windows console, just double-click on the phylocom.bat file, or manually cd to the correct directory, and type:

    C:\SOME\PATH> PHYLOCOM.EXE

or just:

    C:\SOME\PATH> phylocom

A welcome screen should appear. The format for the various options is:

    $ phylocom method [optional parameters]

Basic information can be obtained at any time by calling:

    $ phylocom help

Output from the programs is written to the screen (generally /dev/stdout, although some warnings are sent to /dev/stderr). In order to capture the output into a file, it must be redirected to a filename, e.g.:

    $ phylocom comstruct > myoutput.txt

or:

    $ phylocom bladj > mytree.new

This syntax works on both UNIX systems and in the Windows console.

5 BASIC DATA EXTRACTION AND MANIPULATION

A number of algorithms have been included to assist in summarizing data and converting between formats. These are generally simple to write, and we welcome suggestions for new facilities.

5.1 AGENODE

Outputs the ages for each node in the phylogeny, calculated from the branch lengths. The numbering system is the same as throughout this application. The root node is node 0, and nodes are numbered incrementally reading across the parentheses in the Newick input tree.

5.2 AGETERM

Outputs the stem age of each terminal taxon (age of each taxon's most recent ancestor node).

5.3 BLADJ

What do you do if you have a phylogenetic topology, with some nodes aged, but no branch lengths to smooth the rates of (with r8s)? You can still use r8s without branch lengths to force an ultrametric tree. Or you can use bladj! This is a simple utility that takes a phylogeny, fixes the root node at a specified age, and fixes other nodes you might have age estimates for. It then sets all other branch lengths by placing the nodes evenly between dated nodes, and between dated nodes and terminals (beginning with the longest 'chains'). This has the effect of minimizing variance in branch length, within the constraints of dated nodes. It thus produces a pseudo-chronogram that can be useful for estimating phylogenetic distance (in units of time) between taxa for, for instance, the analysis of phylogenetic community structure. Even with only a few nodes dated, the resulting phylogenetic distances can be a marked improvement on simply using the number of intervening nodes as a phylogenetic distance (see Webb, 2000).

BLADJ takes as its input a phylogeny (the phylo file), with named internal nodes, and a simple table of interior node names and ages (the ages file, format: name<TAB>age<RETURN>; NB: node names need to match exactly, including case, between the ages and phylo files). It returns a new phylogeny with adjusted branch lengths. IMPORTANT: the root node of the phylogeny must be named and given an age.

Included in the distribution is a simple ages file (called wikstrom.ages) with angiosperm nodes aged according to Wikstrom et al. (2001). I fully acknowledge that these ages are not the maximum age for, e.g., a family, but simply an estimate of the MRCA of the two most distant taxa in a clade included in Wikstrom's analysis. The correct statement is that the clade represented by this node is at least as old as the age given, and no older than the age of the next older node dated in the list. We all await an online database of fossil-based estimates of node age!

Make sure a file named ages is present in the same directory as the phylo file. Then run:

    $ phylocom bladj

or:

    $ phylocom bladj > output_tree.new

The output Newick tree will be ultrametric, with branch lengths scaled to time.

Please Note: If a name in the ages file matches a terminal taxon in the tree, that terminal taxon will be positioned at the corresponding age, and not at an age of 0. The resulting tree will not be ultrametric. If this non-ultrametric tree is used as an input tree to phylomatic, the resulting tree will also not be ultrametric. To avoid this problem, build your phylomatic trees first, and apply BLADJ to the final tree.

5.4 CLEANPHY

Removes 'one-daughter nodes' from a phylo phylogeny, and the branch length of the root node.
One-daughter nodes are allowed in the Newick definition, and are useful for storing information about hierarchical taxonomic classes. PHYLOMATIC includes numerous one-daughter nodes, and because most other phylogenetic applications do not accept one-daughter nodes, these need to be 'cleaned out.' Root 'tails' are also allowed by the definition, but many other applications choke on them. Run:

    $ phylocom cleanphy -f phylomatic_out.new -e > clean.new

to remove one-daughter nodes from a file phylomatic_out.new. The -e switch suppresses the creation of automatic branch lengths of 1.0.

5.5 COMNODE

A simple consensus algorithm that finds the common nodes in two trees, named tree1 and tree2. Creates common names for the matching internal nodes and outputs a simple NEXUS format tree readable by most tree-viewing software (e.g., TreeView). This tool can be used to add branch lengths to a supertree:

1. Let tree1 be a phylogeny with branch lengths from, e.g., molecular analysis, including a few of the species in the supertree.
2. Let tree2 be a supertree, without branch lengths, containing some of the taxa in tree1.
3. Run phylocom comnode > out.nex
4. Extract 'tree1' from the output file into a new phylo file.
5. Run phylocom agenode > ages
6. Edit the ages file to leave only the lines beginning 'match.'
7. Extract 'tree2' from the output file into a new phylo file.
8. Run phylocom bladj. The output tree contains all the taxa in the supertree, with branch lengths constrained by tree1.

See the included files tree1 and tree2 and run phylocom comnode to test the algorithm.
See Strauss et al. (2006) for an example.

5.6 MAKENEX

Reads a phylo file, a sample file, and a traits file and outputs a NEXUS file readable by Mesquite. It includes up to four CHARACTER blocks with:

1. Taxa presence or absence in the various samples, coded as 0 or 1.
2. Taxa abundance in the various samples as a continuous variable. Hint: want to know whether there is a phylogenetic signal in abundance? Make a sample unit in a sample file with all taxa in the phylo file. Run phylocom makenex. Open in Mesquite. Choose 'Trace Character History.' Test significance of any conservative trend in abundance by making a continuous trait that is abundance, and running aot.
3. Any discrete characters in the traits file.
4. Any continuous characters in the traits file.

Use 'Trace Character History' to view the distribution of traits and/or species presence/absence in samples on the pool phylogeny.

Note that currently all three input files are needed. Create a dummy traits or sample file if needed.

5.7 NEW2NEX

Converts a Newick file (the phylo file) to a Mesquite-readable NEXUS-format file. Note that this function is very similar to the MAKENEX function, but does not require sample and traits files as input.

5.8 NEW2FY

Converts a Newick file (the phylo file) to a simple tabular format, with each node as a row.
Tab-delimited columns are:

• nodeID
• parent node nodeID
• number of daughter nodes
• partial list of daughter nodeIDs
• depth of node (number of edges from root)
• branch length to parent node (a float)
• node name

5.9 PHYDIST

Calculates the simple pairwise matrix of phylogenetic distances among terminal taxa for the whole phylogeny pool (phylo). This could be useful even if you are not interested in community structure. The column and row headings are terminal names in the phylo file.

5.10 PHYVAR

Calculates the phylogenetic variance-covariance matrix: approximately the 'inverse' of the phylogenetic distance matrix, in that taxa that are closely related have high phylogenetic covariance.

5.11 NAF

Converts all data files (sample, phylo, traits all needed) into a 'node-as-factor' table, for analysis of trait values (or sample abundance values) by simple or hierarchical ANOVA. All taxa subtending to a particular daughter node are coded with a similar value in a column for each node. Hence variance in a trait for terminals in one clade can easily be compared to variance in terminals in the sister clade.

5.12 RNDPRUNE

Randomly prunes the phylo phylogeny. Two switches control the output:

    -r N: performs the randomization N times,
    -p N: includes N terminals.

The randomization simply selects randomly (from an even distribution) from the names of the terminals in phylo.

5.13 SAMPLEPRUNE

Prunes a phylo phylogeny by the members of each sample unit in the sample file.

5.14 VERSION

Outputs the version of PHYLOCOM, including both the 'given' version (e.g., 4.2) and the SVN revision (e.g., 252).

6 PHYLOGENETIC COMMUNITY STRUCTURE

6.1 Phylogenetic community structure metrics

6.1.1 PD

Calculates Faith's (1992) index of phylogenetic diversity (PD) for each sample in the phylo. Faith's PD index (total branch length among all taxa in a sample, including the root node of the tree) is reported, as are the total branch length in the phylogeny, and the proportion of the total branch length in the phylogeny associated with the taxa in each sample.

6.1.2 COMSTRUCT

Calculates mean phylogenetic distance (MPD) and mean nearest phylogenetic taxon distance (MNTD; aka MNND) for each sample, and compares them to MPD/MNTD values for randomly generated samples (null communities) or phylogenies.

This function accepts the switch -a to weight phylogenetic distances by taxa abundances. This changes the interpretation of MPD from the average distance among two random taxa chosen from the sample (default) to the average distance among two random individuals drawn from the sample (-a argument). Similarly, it changes the interpretation of MNTD from the average distance to closest relative for each taxon in the sample (default) to the average distance to closest non-conspecific relative for each individual in the sample (-a argument).

For each run, the samples or phylogeny are randomized using one of several null models (described below). The mean and standard deviation of MPD/MNTD for the randomly generated null communities are reported for each sample. The rank of observed MPD/MNTD values relative to the values in the null communities are reported as rankLow (number of null communities with MPD/MNTD values less than or equal to observed) and rankHi (number of null communities with MPD/MNTD values greater than or equal to observed). These ranks can be used to calculate P-values (e.g., for a one-tailed P-value, divide a rank by the number of runs + 1). Note that if the sum of rankLow and rankHi for MPD or MNTD is not close to the number of runs, there must be a large number of ties between observed and null community values and results should be interpreted with caution. This situation may arise when using very small phylogenies or numbers of samples.

Two measures of 'standardized effect size' of phylogenetic community structure are calculated: the Net Relatedness Index (NRI) and Nearest Taxon Index (NTI) describe the difference between average phylogenetic distances in the observed and null communities, standardized by the standard deviation of phylogenetic distances in the null communities. NRI and NTI are calculated for each sample in a manner similar to that described in Webb et al. (2002):

    NRI_{sample} = -1 \times \frac{MPD_{sample} - MPD_{rndsample}}{sd(MPD_{rndsample})}

    NTI_{sample} = -1 \times \frac{MNTD_{sample} - MNTD_{rndsample}}{sd(MNTD_{rndsample})}

6.1.3 Null models

Choosing an appropriate null model and species pool to measure phylogenetic community structure requires careful consideration. Every null model makes different assumptions, and using two null models or different species pools to analyze the same data can give radically different results. See Gotelli (2000) or Gotelli and Graves (1996) for an evaluation of the assumptions and shortcomings of the different types of null models implemented in this software, and Kembel and Hubbell (2006) for an example of these null models applied to ecological data.

Specify which null model to use with comstruct using the -m command line option plus the number corresponding to one of the following null models:

0 Phylogeny shuffle: This null model shuffles species labels across the entire phylogeny. This randomizes phylogenetic relationships among species.

1 Species in each sample become random draws from sample pool: This null model maintains the species richness of each sample, but the identities of the species occurring in each sample are randomized. For each sample, species are drawn without replacement from the list of all species actually occurring in at least one sample. Thus, species in the phylogeny that are not actually observed to occur in a sample will not be included in the null communities.

2 Species in each sample become random draws from phylogeny pool: This null model maintains the species richness of each sample, but the identities of the species occurring in each sample are randomized. For each sample, species are drawn without replacement from the list of all species in the phylogeny pool. All species in the phylogeny will have equal probability of being included in the null communities. By changing the phylogeny, different species pools can be simulated. For example, the phylogeny could include the species present in some larger region.

3 Independent swap: The independent swap algorithm (Gotelli and Entsminger, 2003), also known as 'SIM9' (Gotelli, 2000), creates swapped versions of the sample/species matrix. It constrains the swapped matrices to have the same row and column totals as the original matrix (i.e., number of species per sample and frequency of occurrence of each species across samples are held constant as species co-occurrences in samples are randomized). The algorithm searches the presence/absence matrix for 'checkerboard' cells (pairs of species/samples of the form (0..1),(1..0) or vice versa) and swaps these cell contents when it finds them. Number of swaps per run can be set at the command line with the -w argument. Number of swaps per run defaults to 1000, but please note that the number of swaps must be large relative to the number of occupied cells in the species/sample matrix to ensure the community is properly randomized. This null model can be very computationally demanding when dealing with large numbers of species or samples. Note also that this null model randomizes patterns of species co-occurrence in samples, but not abundances, and it does not introduce species from the phylogeny pool into the samples.

Functions including randomization tests all accept several additional switches:

• -r X to set the number of runs (X) to randomize over. Can be zero. Otherwise the default value (999 runs) is used.
• -a to use abundance data in calculations. When this switch is used, all results reflect phylogenetic distances among individuals (abundance-weighted distances) as opposed to distances among taxa.

Examples include:

    $ phylocom comstruct -m 0 -r 9999
    $ phylocom comstruct -m 0 -r 9999 -a
    $ phylocom comstruct -m 3 -w 100 -r 999

6.1.4 SWAP

Swaps the sample a number of times using the null model algorithm specified by the -m # option (described in the comstruct documentation) and outputs the resulting
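The MPD, MNTD, NRI and rank calculations described under COMSTRUCT can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not PHYLOCOM's own code: the distance table is a hypothetical five-taxon example, the function names are invented for this sketch, randomization follows the idea of null model 2 (random draws from the phylogeny pool), and the sample standard deviation is used, which may differ from what PHYLOCOM computes.

```python
import random
import statistics
from itertools import combinations

# Hypothetical pairwise distances for the ultrametric tree
# ((A:1,B:1):2,((C:1,D:1):1,E:2):1); not taken from the manual.
_RAW = {
    ("A", "B"): 2.0, ("C", "D"): 2.0, ("C", "E"): 4.0, ("D", "E"): 4.0,
    ("A", "C"): 6.0, ("A", "D"): 6.0, ("A", "E"): 6.0,
    ("B", "C"): 6.0, ("B", "D"): 6.0, ("B", "E"): 6.0,
}
DIST = {frozenset(pair): d for pair, d in _RAW.items()}
POOL = ["A", "B", "C", "D", "E"]  # all taxa in the phylogeny pool


def mpd(taxa):
    """Mean pairwise phylogenetic distance among the taxa of one sample."""
    pairs = list(combinations(taxa, 2))
    return sum(DIST[frozenset(p)] for p in pairs) / len(pairs)


def mntd(taxa):
    """Mean distance from each taxon to its nearest relative in the sample."""
    nearest = [min(DIST[frozenset((t, u))] for u in taxa if u != t) for t in taxa]
    return sum(nearest) / len(nearest)


def nri_with_ranks(sample, runs=999, seed=1):
    """Compare observed MPD with null communities of equal richness."""
    rng = random.Random(seed)
    observed = mpd(sample)
    # Null communities: same richness, taxa drawn without replacement from the pool.
    null = [mpd(rng.sample(POOL, len(sample))) for _ in range(runs)]
    nri = -1.0 * (observed - statistics.mean(null)) / statistics.stdev(null)
    rank_low = sum(v <= observed for v in null)  # null values <= observed
    rank_hi = sum(v >= observed for v in null)   # null values >= observed
    return {"mpd": observed, "nri": nri, "rank_low": rank_low,
            "rank_hi": rank_hi, "p_low": rank_low / (runs + 1)}
```

Calling `nri_with_ranks(["A", "B"])` on this toy data gives a positive NRI, because the two sampled taxa are sister species and therefore closer than a typical random pair. On such a small pool many null values tie with the observed one, which is the situation the manual warns about when rankLow + rankHi is far from the number of runs.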

Cloning of Key Lignin Biosynthesis Genes from Populus deltoides and Construction of Antisense Expression Vectors

Nov. 2010

CAO Yang, YUAN Shu, XU Fei, ZHANG Zhong-wei, XUE Li-wei, LIN Hong-hui
(College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064)

Abstract: Using newly sprouted leaves of Populus deltoides as material and primers designed in-house, the lignin [...] of P. deltoides were cloned by RT-PCR [...]

[...] genes (the 4-coumarate CoA ligase (4CL) gene and the cinnamyl alcohol dehydrogenase (CAD) gene) by the RT-PCR method, using [...] designed primers. Sequencing results reveal that the cDNA lengths are 859 bp and 946 bp, respectively. Additionally, the xylem [...] promoter sequence of the 4CL gene (4CLp) was artificially synthesized, with a length of 1180 bp. The 4CLp, 4CL and CAD genes were [...]

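Cell Study in Humans and Rodents: The Basolateral Amygdala Is Essential for Rapid Escape

[...] plays a role. The article carries the human experiments over to a rodent model, running neurobiological experiments on rodents whose BLA neurons were chemogenetically silenced.

The experiments found that when rodents and humans face an imminent threat, the BLA is critical for selecting and executing rapid escape behavior.

The experiments identified the mechanism by which the BLA, by activating a specific set of CeA neurons, produces a rapid escape response to an approaching threat.

The findings have both basic and clinical relevance: the mechanism bears directly on pathological fear and anxiety in the clinic, and it is important for further biobehavioral research in humans and rodents.

Keywords: basolateral amygdala; central amygdala; Urbach-Wiethe disease; mechanism of rapid escape behavior

Rodent research has shown how the basolateral amygdala (BLA) and the central amygdala (CeA) control defensive behaviors, but these mechanisms still need to be matched to human defensive behavior.

This paper compares humans with naturally occurring, selective bilateral BLA lesions to rats with chemogenetically silenced BLA neurons.

Across species, the BLA was found to play an essential role in selecting active escape over passive freezing during exposure to imminent yet escapable threat (Timm).

Compared with controls, BLA-damaged humans showed increased startle potentiation, and BLA-silenced rats showed augmented startle potentiation and freezing, with weaker escape actions.

Human neuroimaging indicates that the BLA reduces passive defensive responses by inhibiting the brainstem via the CeA.

Indeed, Timm conditioning strengthens BLA projections onto an inhibitory CeA pathway, and pharmacological activation of this pathway rescues the deficient Timm responses of BLA-silenced rats.

The data reveal how the BLA, via the CeA, adaptively regulates escape behavior in response to imminent threat, and show that this mechanism is evolutionarily conserved between rodents and humans.

Human findings (1): calcification of the bilateral BLA in humans mediates freezing and threat-potentiated startle responses.

By combining the TET with fMRI, the study searched for activity changes in these regions in human subjects (Figure 3A).

In line with earlier studies, the TET evoked threat responses in the brain's salience network (anterior insula, anterior cingulate cortex, thalamus and midbrain/PAG; Figures 3B-3E).

Crucially, in BLA-damaged subjects we observed a cluster of voxels in the pons whose sensitivity to threat distance was significantly higher than in healthy controls (Figure 3F).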

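Interspecific Pollen Transfer and Reproduction in Sagittaria trifolia and Sagittaria pygmaea

Plant Science Journal 2022, 40(6): 762~770
http://www.plantscience.cn
DOI: 10.11913/PSJ.2095-0837.2022.60762

Received: 2022-07-16; revised: 2022-08-20.
This work was supported by a grant from the National Natural Science Foundation of China (31970250).

Tang Shasha, Fei Caihong, Yang Cong, Shang Shuhe, Xiong Haolan, Wang Xinyi, Wang Xiaofan. Interspecific pollen transfer and asymmetric reproductive interference between Sagittaria trifolia and Sagittaria pygmaea [J]. Plant Science Journal, 2022, 40

[...] interference due to similar reproductive biological characteristics. Fruits can be formed in hand-pollination hybridization experiments of Sagittaria trifolia L. and S. pygmaea L., but the [...]

[...] pairs of closely related native species may be sympatric and occupy the same habitat [...] [9]. As an important scientific question in evolutionary ecology, the patterns and mechanisms of reproductive interference between sympatric, closely related species await in-depth study, in order to extend our understanding of the relationship between interspecific interactions and coexistence mechanisms.

[...] transfer is widespread in plant communities [...]

[...] advantage, but heterospecific pollen tubes can also grow in the gynoecium and enter the ovules [25]. Previous work found that interspecific crosses between the two species can form enlarged [...]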

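Mutual Regulation between Long Non-coding RNA and Other Epigenetic Mechanisms, and Nervous System Diseases [1]

Published online: 2012-9-18 10:16
Online publication address: http://www.cnki.net/kcms/detail/34.1086.R.20120918.1016.201210.1348_005.html

XU Hong, LIANG Shang-dong
(Department of Physiology, Basic Medical College, Nanchang University, Nanchang, Jiangxi 330006)

doi:10.3969/j.issn.1001-1978.2012.10.005
Document code: A. Article ID: 1001-1978(2012)10-1348-04. Chinese Library Classification: R-05; R342.2; R394; R741

Abstract: Long non-coding RNAs (lncRNAs), which have recently begun to attract attention, are a class of RNA molecules longer than 200 nucleotides.

Although lncRNAs do not encode proteins, their dysfunction is closely associated with nervous system diseases.

As a newly discovered form of epigenetic regulation, lncRNAs help regulate and maintain the homeostasis of nervous system function by interacting with other epigenetic mechanisms that have been studied more extensively in recent years (including DNA methylation, histone modification, genomic imprinting, chromatin remodeling and RNA interference).

This article reviews recent progress on the mutual regulation between lncRNAs and these other epigenetic mechanisms and on its relation to nervous system diseases, which should help in exploring the pathogenesis of lncRNA-mediated nervous system diseases.

Keywords: long non-coding RNA; epigenetics; DNA methylation; histone modification; genomic imprinting; chromatin remodeling; nervous system diseases

Received: 2012-05-12; revised: 2012-07-02.
Funding: National Natural Science Foundation of China (No 30860086, 31060139, 81171184); Key Project of the Jiangxi Province Science and Technology Support Program, Social Development Support Program (No 2010BSA09500); Youth Fund of the Jiangxi Provincial Department of Education (No GJJ12149).
About the authors: XU Hong (1978-), male, doctoral candidate, lecturer, research area: neurophysiology and pathophysiology, E-mail: cray0127@163.com; LIANG Shang-dong (1957-), male, PhD, professor, doctoral supervisor, research area: neurophysiology and pathophysiology, corresponding author, E-mail: liangsd88@163.com

Epigenetics refers to heritable changes in gene expression that occur without any change in the DNA sequence.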

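tree-reconciliation method - Reply

The tree reconciliation method

Introduction: In biology, tree-shaped relationships are a common way to represent the evolution of species and their kinship.

However, because of migration between populations, mutation, extinction and similar causes, the branching structure of a tree often shows disagreements and contradictions.

To resolve these disagreements and contradictions, scientists have proposed the tree reconciliation method, also referred to as tree-relationship reconstruction.

This article answers questions about the tree reconciliation method step by step.

Step 1: Why is the tree reconciliation method necessary? Its main purpose is to model sequence data and draw inferences from it in order to reveal the relatedness and similarity between species.

Reconstructing tree relationships helps us understand biological evolution and kinship, as well as their roles in environmental adaptation and in ecosystems.

It also helps us study the origin and spread of diseases and other important biological questions.

The tree reconciliation method is an important tool for understanding biodiversity and evolution.

Step 2: What is the basic principle of the tree reconciliation method? The method is based on a set of sequence data, such as DNA, RNA or protein sequences.

These sequence data are used to establish the similarities and differences between sequences.

The positions of the tips of the tree and the branch lengths represent the relatedness and evolutionary distance between species.

The method uses various models and algorithms to compute the most likely tree topology from these similarities and differences between sequences.

Step 3: What are the main steps of the tree reconciliation method? The method usually involves the following main steps: data collection, sequence alignment, model selection, evolutionary tree construction and tree evaluation.

1. Data collection: First, the required sequence data must be collected; these can be DNA, RNA or protein sequences from different populations or species.

The quality and quantity of the data are critical for the tree reconciliation method.

2. Sequence alignment: The collected sequences must be aligned to find the positions at which they agree and differ.

The alignment can be carried out by computing similarity scores, or by using a sequence alignment algorithm (such as BLAST).

3. Model selection: Model selection considers the evolutionary pattern of the sequence data in order to determine the evolutionary model that best fits the data.

This step is very important, because different models may suit different types of data.

4. Evolutionary tree construction: Once model selection is complete, the evolutionary tree can be built with one of the common tree-building methods (such as maximum likelihood or Bayesian inference).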
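The alignment and tree-construction steps above can be illustrated with a much simpler technique than the maximum-likelihood or Bayesian methods the text names. The following is a minimal sketch, assuming the sequences are already aligned, that computes the fraction of differing positions for each pair and then clusters by UPGMA (a distance-based method swapped in here purely for brevity). The function names `p_distance` and `upgma` and the four toy sequences are invented for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences into a rooted tree of nested tuples.

    This is UPGMA, a simple distance-based stand-in for the
    likelihood-based methods mentioned in the text.
    """
    size = {name: 1 for name in seqs}  # cluster -> number of leaves
    dist = {(a, b): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}

    def d(x, y):
        # Each distance is stored under one key order only.
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    while len(size) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        merged = (a, b)
        others = [c for c in size if c != a and c != b]
        # Size-weighted average distance from the merged cluster to the rest.
        new = {(merged, c): (size[a] * d(a, c) + size[b] * d(b, c))
               / (size[a] + size[b]) for c in others}
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new)
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical, hand-made aligned sequences.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTGTACCA", "D": "TTGTACCT"}
tree = upgma(seqs)
print(tree)
```

The sketch returns only the topology; branch lengths, model selection and tree evaluation, which the text lists as separate steps, are left out.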

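Genome-wide Study of the HDZIP Gene Family in Poplar

Poplar (Populus) is an important economic forest tree, widely distributed in the temperate and cold zones of the Northern Hemisphere.

Poplar grows quickly and is highly productive, and it is widely used in the production of timber, pulp and bioenergy.

To better understand the biology of poplar and raise its economic value, research on the poplar genome is essential.

The HDZIP (Homeodomain-Leucine Zipper) gene family is a family of transcription factors that play important roles in plant development and in responses to stress.

HDZIP genes are present in many plants and take part in their developmental and stress-response processes.

Studying the HDZIP gene family of poplar is therefore important for understanding its molecular regulatory mechanisms and for improving its stress tolerance.

To study the HDZIP gene family of poplar, the researchers first sequenced the whole poplar genome.

Through this work they determined the size and structure of the poplar genome and obtained the complete genome sequence.

Next, the researchers identified and annotated the HDZIP genes in the poplar genome.

They used a range of bioinformatics tools and databases to screen and analyze candidate HDZIP genes in the genome.

In the end, the researchers determined the members of the HDZIP gene family in poplar.

By analyzing these family members, the researchers found that the genes have different expression patterns in different tissues and developmental stages of poplar.

Some HDZIP genes are highly expressed in leaves and roots, while others are highly expressed in inflorescences and seeds.

This indicates that the HDZIP genes of poplar have different functions in different tissues and developmental stages.

The researchers also found that the expression of some poplar HDZIP genes is regulated under stress.

The expression levels of these genes are clearly up-regulated under stress conditions such as drought, high salinity and low temperature.

This indicates that the HDZIP genes of poplar play an important regulatory role in the stress response.

Overall, the genome-wide study of the poplar HDZIP gene family provides important clues for understanding the molecular regulatory mechanisms of poplar and for improving its stress tolerance.

The study lays a foundation for further research on the stress-response mechanisms of poplar and for using genetic engineering to raise its productivity.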

Genome-wide Study of the HD-ZIP Gene Family in Poplar

Author: Chen Xue. Supervisor: Xiang Yan
(School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036)

Abstract: Homeodomain-leucine zipper (HD-ZIP) proteins are a class of transcription factors unique to plants and belong to the homeobox protein family. They contain a highly conserved homeodomain (HD), whose carboxyl terminus is immediately followed by a leucine zipper (LZ) domain.

Escaping Tree-Root Genetic Modification Techniques

English answer:

Escaping the Clutches of Genetic Engineering Techniques.

Genetic engineering techniques have revolutionized the field of biotechnology, allowing scientists to manipulate the genetic makeup of organisms. While these techniques have brought about numerous advancements in medicine, agriculture, and environmental conservation, there are concerns regarding the potential risks and ethical implications associated with genetically modified organisms (GMOs). As an individual who wishes to avoid the potential pitfalls of genetic engineering, I have devised a plan to escape the clutches of these techniques.

First and foremost, it is crucial to stay informed and educated about genetic engineering and its applications. By understanding the underlying principles and techniques involved, one can make informed decisions and take necessary precautions. Reading scientific journals, attending seminars, and engaging in discussions with experts in the field are effective ways to stay updated.

Secondly, it is important to be mindful of the food we consume. Genetically modified crops are prevalent in our food system, and it can be challenging to completely avoid them. However, by opting for organic and non-GMO certified products, we can minimize our exposure to genetically modified ingredients. Additionally, growing our own fruits and vegetables in a controlled environment ensures that we have complete control over the genetic makeup of our food.

Furthermore, supporting local farmers and participating in community-supported agriculture programs can also help in avoiding genetically modified foods. By purchasing directly from farmers who practice traditional, non-GMO farming methods, we can support sustainable agriculture and reduce our reliance on genetically modified crops.

In addition to food, it is important to be cautious about the use of genetically modified organisms in other areas of our lives. This includes products such as genetically modified mosquitoes for pest control or genetically modified bacteria for industrial purposes. By researching and choosing alternatives that do not involve genetic engineering, we can minimize our exposure to these techniques.

Lastly, it is important to advocate for transparent labeling and regulation of genetically modified organisms. By supporting initiatives that push for clear labeling of GMO products and strict regulations on their use, we can empower consumers to make informed choices and hold companies accountable for their practices.

In conclusion, escaping the clutches of genetic engineering techniques requires vigilance, education, and conscious decision-making. By staying informed, being mindful of our food choices, supporting non-GMO farming practices, and advocating for transparency, we can navigate through the complexities of genetic engineering and protect ourselves from potential risks. Remember, knowledge is power, and by taking control of our choices, we can escape the influence of genetic engineering techniques.

Chinese answer:

Escaping the clutches of genetic engineering techniques.

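Foreign-Literature Translation: A Fast Multi-objective Genetic Algorithm Based on a Tree Structure

Appendix 4. A Fast Multi-objective Genetic Algorithm Based on a Tree Structure

Introduction: In general, solving multi-objective problems in science and engineering is a very difficult task.

In these multi-objective optimization problems (MOPs), the objectives often conflict in a high-dimensional problem space, and multi-objective optimization also requires more computational resources.

Some classical optimization methods convert the multi-objective problem into a single-objective optimization problem, in which many runs are required to find multiple solutions.

This makes an algorithm that returns a set of candidate solutions preferable to one that returns only a single solution based on a weighting of the objectives.

For this reason, over the past 20 years there has been growing interest in applying evolutionary algorithms (EAs) to multi-objective optimization.

Many multi-objective evolutionary algorithms (MOEAs) have been proposed; they use the concept of Pareto dominance to guide the search and return a set of non-dominated solutions as the result.

Unlike single-objective optimization, where the optimum found is the final solution, multi-objective optimization has two goals: (1) to converge to the Pareto-optimal set, and (2) to maintain the diversity of solutions within the Pareto-optimal set.

Many strategies and methods have been proposed to handle these two sometimes conflicting tasks.

A problem these methods share is that they tend to be intricate.

For both tasks, complex strategies are usually used to obtain better solutions, and many parameters need to be tuned according to experience and to what is already known about the problem.

In addition, many multi-objective evolutionary algorithms have a computational complexity as high as … (G is the number of generations, M is the number of objective functions, and N is the population size.

These symbols keep the same meaning below.)

In this article we propose a fast multi-objective genetic algorithm based on a tree structure.

The data structure is a binary tree that stores the three-valued dominance relation between solutions in multi-objective optimization (i.e., dominating, dominated and non-dominated); we therefore call it the dominance tree (DT).

Owing to some unique properties, the dominance tree implicitly contains density information about the individuals in the population and markedly reduces the number of comparisons between individuals.

Computational-complexity experiments also show that the dominance tree is an efficient tool for handling populations.

The dominance-tree-based evolutionary algorithm (DTEA) unifies within the dominance tree the convergence and diversity strategies, i.e., the two goals of multi-objective evolutionary algorithms, and because it has only a few parameters the algorithm is easy to use.

In addition, DTEA uses a specially designed elimination strategy based on the dominance tree (DT).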
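The three-valued dominance relation that the dominance tree stores can be written down directly. The sketch below is an illustration only, not the DTEA data structure itself: it assumes every objective is minimized, and the names `dominance` and `non_dominated` are hypothetical.

```python
def dominance(a, b):
    """Compare two objective vectors, assuming every objective is minimized.

    Returns "dominates" if a is no worse everywhere and better somewhere,
    "dominated" in the mirrored case, and "non-dominated" otherwise
    (this last case also covers equal vectors).
    """
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    better = any(x < y for x, y in zip(a, b))
    worse = any(x > y for x, y in zip(a, b))
    if better and not worse:
        return "dominates"
    if worse and not better:
        return "dominated"
    return "non-dominated"


def non_dominated(points):
    """Keep the points that no other point dominates (pairwise comparison)."""
    return [p for p in points
            if not any(dominance(q, p) == "dominates" for q in points)]


front = non_dominated([(1, 3), (2, 2), (3, 3), (2, 4)])
print(front)  # [(1, 3), (2, 2)]
```

This filter compares every pair of individuals. According to the text, the dominance tree is meant to reduce exactly this number of comparisons, which the sketch does not attempt.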


Journal of Computer and System Sciences 64, 579–627 (2002)
doi:10.1006/jcss.2001.1809
0022-0000/02 $35.00 © 2002 Elsevier Science (USA). All rights reserved.

Hypertree Decompositions and Tractable Queries [1]

Georg Gottlob
Institut für Informationssysteme, Technische Universität Wien, A-1040 Vienna, Austria
E-mail: gottlob@dbai.tuwien.ac.at

Nicola Leone
Department of Mathematics, University of Calabria, I-87030 Rende, Italy
E-mail: leone@unical.it

and

Francesco Scarcello
D.E.I.S., University of Calabria, I-87030 Rende, Italy
E-mail: scarcello@unical.it

Received November 3, 1999; revised October 8, 2000

[1] A preliminary version of this paper appeared in the "Proceedings of the Eighteenth ACM Symposium on Principles of Database Systems (PODS'99)," pp. 21–32, Philadelphia, May 1999. Research supported by FWF (Austrian Science Funds) under the Project Z29-INF. Part of the work of Francesco Scarcello has been carried out while visiting the Technische Universität Wien. Part of the work of Nicola Leone has been carried out while he was with the Technische Universität Wien.

Several important decision problems on conjunctive queries (CQs) are NP-complete in general but become tractable, and actually highly parallelizable, if restricted to acyclic or nearly acyclic queries. Examples are the evaluation of Boolean CQs and query containment. These problems were shown tractable for conjunctive queries of bounded treewidth (Ch. Chekuri and A. Rajaraman, Theoret. Comput. Sci. 239 (2000), 211–229), and of bounded degree of cyclicity (M. Gyssens et al., Artif. Intell. 66 (1994), 57–89; M. Gyssens and J. Paredaens, in "Advances in Database Theory," Vol. 2, pp. 85–122, Plenum Press, New York, 1984). The so far most general concept of nearly acyclic queries was the notion of queries of bounded query-width introduced by Chekuri and Rajaraman (2000). While CQs of bounded query-width are tractable, it remained unclear whether such queries are efficiently recognizable. Chekuri and Rajaraman (2000) stated as an open problem whether for each constant k it can be determined in polynomial
time if a query has query-width at most k. We give a negative answer by proving the NP-completeness of this problem (specifically, for k = 4). In order to circumvent this difficulty, we introduce the new concept of hypertree decomposition of a query and the corresponding notion of hypertree-width. We prove: (a) for each k, the class of queries with query-width bounded by k is properly contained in the class of queries whose hypertree-width is bounded by k; (b) unlike query-width, constant hypertree-width is efficiently recognizable; and (c) Boolean queries of bounded hypertree-width can be efficiently evaluated. © 2002 Elsevier Science (USA)

1. INTRODUCTION AND OVERVIEW OF RESULTS

1.1. Conjunctive Queries and Join Trees

One of the simplest but also one of the most important classes of database queries is the class of conjunctive queries (CQs). In this paper we adopt the logical representation of a relational database [40, 1], where data tuples are identified with logical ground atoms, and conjunctive queries are represented as datalog rules. We will, in the first place, deal with Boolean conjunctive queries (BCQs) represented by rules whose heads are variable-free, i.e., propositional (see Example 1.1 below). From our results on Boolean queries, we are able to derive complexity results on important database problems concerning general (not necessarily Boolean) conjunctive queries.

Example 1.1. Consider a relational database with the following relation schemas:

enrolled(Pers#, Course#, Reg-Date)
teaches(Pers#, Course#, Assigned)
parent(Pers1, Pers2)

The BCQ $Q_1$ below checks whether some student is enrolled in a course taught by his/her parent.

$Q_1$: $ans \leftarrow enrolled(S, C, R) \wedge teaches(P, C, A) \wedge parent(P, S)$.

The following query $Q_2$ asks: Is there a professor who has a child enrolled in some course?

$Q_2$: $ans \leftarrow teaches(P, C, A) \wedge enrolled(S, C', R) \wedge parent(P, S)$.
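To make Example 1.1 concrete, here is a minimal sketch of a naive backtracking evaluator for Boolean conjunctive queries. It is not an algorithm from the paper: the toy database contents, the encoding of atoms as (relation name, variable tuple) pairs, and the function name `evaluate_bcq` are all assumptions made for illustration, and the variable C' of Q2 is written `C2`.

```python
# Toy database: each relation is a set of tuples (all values are invented).
DB = {
    "enrolled": {("s1", "c1", "2001-09-01"), ("s2", "c2", "2001-09-02")},
    "teaches": {("p1", "c1", "yes"), ("p2", "c3", "yes")},
    "parent": {("p1", "s1"), ("p2", "s2")},
}

# A query body is a list of atoms: (relation name, tuple of variable names).
Q1 = [("enrolled", ("S", "C", "R")),
      ("teaches", ("P", "C", "A")),
      ("parent", ("P", "S"))]
Q2 = [("teaches", ("P", "C", "A")),
      ("enrolled", ("S", "C2", "R")),
      ("parent", ("P", "S"))]


def evaluate_bcq(body, db):
    """Return True iff some substitution maps every body atom into db."""

    def extend(i, subst):
        if i == len(body):
            return True  # every atom has been matched
        rel, args = body[i]
        for tup in db[rel]:
            new = dict(subst)
            # Bind unbound variables; reject tuples that clash with bindings.
            if all(new.setdefault(v, val) == val for v, val in zip(args, tup)):
                if extend(i + 1, new):
                    return True
        return False

    return extend(0, {})


print(evaluate_bcq(Q1, DB))  # True: s1 takes c1, which is taught by parent p1
print(evaluate_bcq(Q2, DB))  # True: p1 teaches and has the enrolled child s1
```

The search may try a number of tuple combinations that grows exponentially with the number of atoms, which is consistent with the NP-completeness of the general problem. The restricted query classes studied in the paper are those where this blow-up can be avoided.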
Decision problems such as the evaluation problem of Boolean CQs, the tuple-of-query problem (i.e., checking whether a given tuple belongs to a CQ), and the containment problem for CQs have been studied intensively. (For recent references, see [29, 9].) These problems, which are all equivalent via simple logspace transformations (see [19]), are NP-complete in the general setting but are polynomially solvable for a number of syntactically restricted subclasses.

Most prominent among the polynomial cases is the class of acyclic queries or tree queries [44, 4, 18, 45, 10, 13, 14, 31]. These queries can be characterized in terms of join trees: A query Q is acyclic iff it has a join tree [4, 3]. A join tree JT(Q) for a conjunctive query Q is a tree whose vertices are the atoms in the body of Q such that whenever the same variable X occurs in two atoms $A_1$ and $A_2$, then $A_1$ and $A_2$ are connected in JT(Q), and X occurs in each atom on the unique path linking $A_1$ and $A_2$. In other words, the set of nodes in which X occurs induces a (connected) subtree of JT(Q). We will refer to this condition as the Connectedness Condition of join trees.

Example 1.2. While query $Q_1$ of Example 1.1 is cyclic and admits no join tree, query $Q_2$ is acyclic. A join tree for $Q_2$ is shown in Fig. 1 (note that predicate names are abbreviated by their first letter in the figure).

FIG. 1. A join tree of $Q_2$.

Acyclic Boolean queries can be efficiently evaluated. Intuitively, this is due to the fact that they can be evaluated by processing the join tree bottom-up by performing upward semijoins, thus keeping small the size of the intermediate relations (that could become exponential if regular joins were performed). This method is the Boolean version of Yannakakis' evaluation algorithm for general conjunctive queries [44]. Actually, this evaluation can be also performed in a highly parallel fashion, independently of the join tree shape [19].

1.2. Queries of Bounded Width

The tremendous speed-up obtainable in the evaluation of
acyclic queries stimulated several research efforts towards the identification of wider classes of queries having the same desirable properties as acyclic queries. These studies identified a number of relevant classes of cyclic queries which are close to acyclic queries, because they can be decomposed via low width decompositions to acyclic queries. Thus, any such method M is characterized by some notion of M-width. We say that a query has bounded width according to M if its M-width is bounded by some fixed constant k. The main classes of polynomially solvable bounded-width queries considered in database theory are:

• The queries of bounded treewidth [9] (see also [29, 19]). These are queries whose variable-atom incidence graph has treewidth bounded by a constant. The treewidth of a graph is a well-known measure of its tree-likeness introduced by Robertson and Seymour in their work on graph minors [34]. This notion plays a central role in algorithmic graph theory as well as in many subdisciplines of Computer Science. We omit a formal definition. It is well-known that checking that a graph has treewidth at most k for a fixed constant k, and in the positive case, computing a k-width tree decomposition is feasible in linear time [6]. In [29], another notion of treewidth of a query has been considered. This notion is equivalent to the treewidth of the Gaifman graph of the query, i.e., the graph linking two variables by an edge if they occur together in a query-atom.

• Queries of bounded degree of cyclicity [26, 25]. This is an interesting class of queries which encompasses the class of acyclic queries. (See [26, 25] for a formal definition.) For each constant k, checking whether a query has degree of cyclicity at most k is feasible in polynomial time [26, 25].

• Queries of bounded query-width [9]. The notion of bounded query-width is based on the concept of query decomposition [9]. Roughly, a query decomposition of a query Q consists of a tree each vertex of which is labeled by a set of atoms and/or
variables. Each variable and atom induces a connected subtree (connectedness condition). Each atom occurs in at least one label. The width of a query decomposition is the maximum of the cardinalities of its vertices. The query-width qw(Q) of Q is the minimum width over all its query decompositions. A formal definition is given in Section 3.1; Fig. 2 shows a 2-width query decomposition for the cyclic query $Q_1$ of Example 1.1. This class is the widest of the three classes: Each query of bounded treewidth or of bounded degree of cyclicity k has also bounded query-width k, but for some queries the converse does not hold [9, 19]. In fact, there are even classes of queries with bounded query-width but with unbounded treewidth and unbounded degree of cyclicity. Note, however, that no polynomial algorithm for checking whether a query has width at most k was known.

FIG. 2. A 2-width query decomposition of query $Q_1$.

Intuitively, a vertex of a k-width query decomposition stands for the natural join of (the relations of) its elements; the size of this join is $O(n^k)$, where n is the size of the input database. Once these joins have been done, the query decomposition can be treated exactly like a join tree of an acyclic query, and permits evaluating the query in time polynomial in $n^k$ [9]. This notion is a true generalization of the basic concept of acyclicity: A query is acyclic iff it has query-width 1.

The problem BCQ (evaluation of Boolean conjunctive queries) and the bounded query-width versions of all mentioned equivalent problems, e.g. query-containment $Q_1 \subseteq Q_2$, where the query-width of $Q_2$ is bounded, can be efficiently solved if a k-width query decomposition of the query is given as (additional) input. Chekuri and
Rajaraman provided a polynomial-time algorithm for this problem [9]; Gottlob et al. [19] later pinpointed the precise complexity of the problem by proving its LOGCFL-completeness.

1.3. A Negative Result

Unfortunately, unlike for acyclicity, for bounded treewidth, or for bounded degree of cyclicity, no efficient method for checking bounded query-width is known, and a k-width query decomposition, which is required for the efficient evaluation of a bounded-width query, is not known to be polynomial-time computable. Chekuri and Rajaraman [9] state this as an open problem. This problem is the first question we address in the present paper.

The fact that treewidth k can be checked in linear time suggests that an analogous algorithm may work for query-width, too. Chekuri and Rajaraman [9] write:

"it would be useful to have an efficient algorithm that produces query decompositions of small width, analogous to the algorithm of Bodlaender [6] for decompositions of small treewidth."

Kolaitis and Vardi [29], who also address this issue, write:

"there is an important advantage of the concept of bounded treewidth over the concept of bounded query-width. Specifically, as seen above, the classes of structures of bounded treewidth are polynomially recognizable, whereas it is not known whether the same holds true for the classes of queries of bounded query-width."

Unfortunately, there is bad news: Determining whether the query-width of a conjunctive query is at most 4 is NP-complete.

The NP-completeness proof is rather involved. We give some intuition in Section 3.3, and defer the technical proof to Section 7. As shown in Section 3.3, NP-hardness is intuitively due to the fact that the definition of query decomposition implicitly requires that certain sets of variables occurring in subtrees of the decomposition be precisely covered by query atoms. This requirement of precise covering is reminiscent of various covering problems known to be
NP-complete. In fact, in our NP-completeness proof (given in Section 7), we succeeded in reducing the problem of EXACT COVER BY 3-SETS to the query-width problem. The proof led us to a better intuition about (i) why the problem is NP-complete, and (ii) how this could be redressed by adopting a different notion of width.

1.4. Hypertree Decompositions: Positive Results

To circumvent the high complexity of query decompositions, we introduce a new concept of decomposition and its associated notion of width, which we call hypertree decomposition and hypertree-width, respectively. The definition of hypertree decomposition (see Section 4) corresponds to a more liberal notion of "covering," which is computationally tractable.

We denote the query-width of a query by qw(Q) and its hypertree-width by hw(Q). We shall prove the following results:

1. For each conjunctive query Q it holds that $hw(Q) \le qw(Q)$.

2. There exist queries Q such that $hw(Q) < qw(Q)$.

3. For each fixed constant k, the problems of determining whether $hw(Q) \le k$ and of computing (in the positive case) a hypertree decomposition of width at most k are feasible in polynomial time.

4. For fixed k, evaluating a Boolean conjunctive query Q with $hw(Q) \le k$ is feasible in polynomial time.

5. The result of a (non-Boolean) conjunctive query Q of bounded hypertree-width can be computed in time polynomial in the combined size of the input instance and of the output relation.

6. Tasks 3 and 4 are not only polynomial, but are highly parallelizable. In particular, for fixed k, checking whether $hw(Q) \le k$ is in the parallel complexity class LOGCFL; computing a hypertree decomposition of width k (if any) is in functional LOGCFL, i.e., is feasible by a logspace transducer that uses an oracle in LOGCFL; evaluating Q where $hw(Q) \le k$ on a database is complete for LOGCFL under logspace reductions.

Similar results hold for the equivalent problem of conjunctive query containment $Q_1 \subseteq Q_2$, where $hw(Q_2) \le k$, and for all other of the aforementioned equivalent problems.

Let us comment on
these results. By statements 1 and 2, the concept of hypertree-width is a proper generalization of the notion of query-width. By statement 3, bounded hypertree-width is efficiently checkable, and by statement 4, queries of bounded hypertree-width can be efficiently evaluated. In summary, this is truly good news. It means that the notion of bounded hypertree-width not only shares the desirable properties of bounded query-width, it also does not share the bad properties of the latter, and, in addition, is a more general concept.

It thus turns out that the high complexity of determining bounded query-width is not, as one would usually expect, the price for the generality of the concept. Rather, it is due to some peculiarity in its definition related to the exact covering paradigm. In the definition of hypertree-width we succeeded in eliminating these problems without paying any additional charge, i.e., hypertree-width comes as a freebie! Furthermore, Statement 6 asserts that the main algorithmic tasks related to bounded hypertree-width are in the very low complexity class LOGCFL, and thus are highly parallelizable. (See Section 2.2.)

The definitions of hypertree decomposition and hypertree-width given below (in Section 4) are quite technical. However, in a recent paper [23], we were able to give extremely natural characterizations of the classes of queries (or hypergraphs) of bounded hypertree-width, both in terms of games and in terms of suitable fragments of first order logic. From the results in [23], it follows that the concept of hypertree decomposition is a natural generalization of the concept of tree decomposition [34] (which is defined for graphs only) to hypergraphs.

1.5. Structure of the Paper

The rest of this paper is structured as follows. In Section 2, we give some basic notions of database and complexity theory. In Section 3, we formally define the query decompositions and provide some intuition on why finding (even small) query decompositions is NP-hard. The new notions of hypertree decomposition and
hypertree-width are formally defined in Section 4, where also some examples are given, and it is shown that queries having bounded hypertree-width are efficiently evaluable. In Section 5, we present the alternating algorithm k-decomp that checks whether a query has hypertree-width at most k, where k is a fixed constant. This algorithm is shown to run on a logspace ATM having polynomially-sized accepting computation-trees, thus the problem is actually in LOGCFL. In Section 6, hypertree decomposition is compared to related notions and, in particular, it is shown that hypertree decomposition properly generalizes the notion of query decomposition. In Section 7 we give the full NP-completeness proof of the problem of deciding bounded query-width.

This paper has two appendices. In Appendix 1 we show how the concepts of hypertree decomposition and hypertree-width can be defined for hypergraphs rather than for conjunctive queries, and we show how the two settings are related. In Appendix 2, we present a deterministic polynomial time algorithm (in form of a Datalog program) for checking whether a query has hypertree-width at most k.
Moreover, we maintain the hypertree decompositions' homepage [36], containing further information and a download section with a program for computing hypertree decompositions and other useful tools.

2. PRELIMINARIES

2.1. Databases and Queries

For a background on databases, conjunctive queries, etc., see [40, 1, 30]. We define only the most relevant concepts here.

A relation schema R consists of a name (name of the relation) r and a finite ordered list of attributes. To each attribute A of the schema, a countable domain Dom(A) of atomic values is associated. A relation instance (or simply, a relation) over schema $R = (A_1, \ldots, A_k)$ is a finite subset of the cartesian product $Dom(A_1) \times \cdots \times Dom(A_k)$. The elements of relations are called tuples. A database schema DS consists of a finite set of relation schemas. A database instance, or simply database, DB over database schema $DS = \{R_1, \ldots, R_m\}$ consists of relation instances $r_1, \ldots, r_m$ for the schemas $R_1, \ldots, R_m$, respectively, and a finite universe $U \subseteq \bigcup_{R_i(A^i_1, \ldots, A^i_{k_i}) \in DS} (Dom(A^i_1) \cup \cdots \cup Dom(A^i_{k_i}))$ such that all data values occurring in DB are from U.

In this paper we will adopt the standard convention [1, 40] of identifying a relational database instance with a logical theory consisting of ground facts. Thus, a tuple $\langle a_1, \ldots, a_k \rangle$, belonging to relation r, will be identified with the ground atom $r(a_1, \ldots, a_k)$. The fact that a tuple $\langle a_1, \ldots, a_k \rangle$ belongs to relation r of a database instance DB is thus simply denoted by $r(a_1, \ldots, a_k) \in DB$.

A (rule based) conjunctive query Q on a database schema $DS = \{R_1, \ldots, R_m\}$ consists of a rule of the form

Q: $ans(u) \leftarrow r_1(u_1) \wedge \cdots \wedge r_n(u_n)$,

where $n \ge 0$; $r_1, \ldots, r_n$ are relation names (not necessarily distinct) of DS; ans is a relation name not in DS; and $u, u_1, \ldots, u_n$ are lists of terms (i.e., variables or constants) of appropriate length. The set of variables occurring in Q is denoted by var(Q). The set of atoms contained in the body of Q is referred to as atoms(Q).
Similarly, for any atom $A \in atoms(Q)$, var(A) denotes the set of variables occurring in A; and for a set of atoms $R \subseteq atoms(Q)$, define $var(R) = \bigcup_{A \in R} var(A)$.

The answer of Q on a database instance DB with associated universe U consists of a relation ans whose arity is equal to the length of u, defined as follows. ans contains all tuples $ans(u)\vartheta$ such that $\vartheta : var(Q) \to U$ is a substitution replacing each variable in var(Q) by a value of U and such that for $1 \le i \le n$, $r_i(u_i)\vartheta \in DB$. (For an atom A, $A\vartheta$ denotes the atom obtained from A by uniformly substituting $\vartheta(X)$ for each variable X occurring in A.)

The conjunctive query Q is a Boolean conjunctive query (BCQ) if its head atom ans(u) does not contain variables and is thus a purely propositional atom. Q evaluates to true if there exists a substitution $\vartheta$ such that for $1 \le i \le n$, $r_i(u_i)\vartheta \in DB$; otherwise the query evaluates to false.

The head literal in Boolean conjunctive queries is actually inessential, therefore we may omit it when specifying a Boolean conjunctive query.

Note that conjunctive queries as defined here correspond to conjunctive queries in the more classical setting of relational calculus, as well as to SELECT-PROJECT-JOIN queries in the setting of relational algebra, or to simple SQL queries of the type

SELECT $R_{i_1}.A_{j_1}, \ldots, R_{i_k}.A_{j_k}$ FROM $R_1, \ldots, R_n$ WHERE cond,

such that cond is a conjunction of conditions of the form $R_i.A = R_j.B$ or $R_i.A = c$, where c is a constant.

A query Q is acyclic [3, 4] if its associated hypergraph H(Q) is acyclic, otherwise Q is cyclic. The vertices of H(Q) are the variables occurring in Q. Denote by atoms(Q) the set of atoms in the body of Q, and by var(A) the variables occurring in any atom $A \in atoms(Q)$. The hyperedges of H(Q) consist of all sets var(A), such that $A \in atoms(Q)$. We refer to the standard notion of cyclicity/acyclicity in hypergraphs used in database theory [30, 40, 1].

A join tree JT(Q) for a conjunctive query Q is a tree whose vertices are the atoms in the body of Q such that whenever the same variable X occurs in two atoms $A_1$ and $A_2$, then
$A_1$ and $A_2$ are connected in JT(Q), and X occurs in each atom on the unique path linking $A_1$ and $A_2$. In other words, the set of nodes in which X occurs induces a (connected) subtree of JT(Q) (connectedness condition). Acyclic queries can be characterized in terms of join trees: A query Q is acyclic iff it has a join tree [4, 3].

Example 2.1. While query $Q_1$ of Example 1.1 is cyclic and admits no join tree, query $Q_2$ is acyclic. A join tree for $Q_2$ is shown in Fig. 1. Consider the following query $Q_3$:

$ans \leftarrow r(Y, Z) \wedge g(X, Y) \wedge s(Y, Z, U) \wedge s(Z, U, W) \wedge t(Y, Z) \wedge t(Z, U)$.

A join tree for $Q_3$ is shown in Fig. 3.

FIG. 3. A join tree of $Q_3$.

Acyclic conjunctive queries have highly desirable computational properties:

1. The problem BCQ of evaluating a Boolean conjunctive query can be efficiently solved if the input query is acyclic. Yannakakis provided a (sequential) polynomial time algorithm solving BCQ on acyclic conjunctive queries [43]. [2] The authors of the present paper have recently shown that BCQ is highly parallelizable on acyclic queries, as it is complete for the low complexity class LOGCFL [19].

2. Acyclicity is efficiently recognizable, and a join tree of an acyclic query is efficiently computable. A linear-time algorithm for computing a join tree is shown in [39]; an $L^{SL}$ method has been provided in [19].

3. The result of a (non-Boolean) acyclic conjunctive query Q can be computed in time polynomial in the combined size of the input instance and of the output relation [44].

[2] Note that, since both the database DB and the query Q are part of an input-instance of BCQ, what we are considering is the combined complexity of the query [43].

Acyclicity is a key-property responsible for the polynomial solvability of problems that are in general NP-hard such as BCQ [8] and other equivalent problems such as Conjunctive Query Containment [33, 9], Clause Subsumption, and Constraint Satisfaction [29, 19]. (For a survey and detailed treatment see [19].)

2.2. The Class LOGCFL

LOGCFL consists of all decision problems
that are logspace reducible to a context-free language. An obvious example of a problem complete for LOGCFL is Greibach's hardest context-free language [24]. There are a number of very interesting natural problems known to be LOGCFL-complete (see, e.g. [19, 38, 37]). The relationship between LOGCFL and other well-known complexity classes is summarized in the following chain of inclusions:

$AC^0 \subseteq NC^1 \subseteq L \subseteq SL \subseteq NL \subseteq LOGCFL \subseteq AC^1 \subseteq NC^2 \subseteq P \subseteq NP$

Here L denotes logspace, $AC^i$ and $NC^i$ are logspace-uniform classes based on the corresponding types of Boolean circuits, SL denotes symmetric logspace, NL denotes nondeterministic logspace, P is polynomial time, and NP is nondeterministic polynomial time. For the definitions of all these classes, and for references concerning their mutual relationships, see [28].

Since, as mentioned in the introduction, $LOGCFL \subseteq AC^1 \subseteq NC^2$, the problems in LOGCFL are all highly parallelizable. In fact, they are solvable in logarithmic time by a concurrent-read-concurrent-write (CRCW) parallel random-access-machine (PRAM) with a polynomial number of processors, or in $\log^2$-time by an exclusive-read-exclusive-write (EREW) PRAM with a polynomial number of processors.

In this paper, we will use an important characterization of LOGCFL by Alternating Turing Machines. We assume that the reader is familiar with the alternating Turing machine (ATM) computational model introduced by Chandra et al. [7].
Here we assume without loss of generality that the states of an ATM are partitioned into existential and universal states.

As in [35], we define a computation tree of an ATM M on an input string w as a tree whose nodes are labeled with configurations of M on w, such that the descendants of any non-leaf labeled by a universal (existential) configuration include all (resp. one) of the successors of that configuration. A computation tree is accepting if the root is labeled with the initial configuration, and all the leaves are accepting configurations.

Thus, an accepting tree yields a certificate that the input is accepted. A complexity measure considered by Ruzzo [35] for the alternating Turing machine is the tree-size, i.e., the minimal size of an accepting computation tree.

Definition 2.2 [35]. A decision problem P is solved by an alternating Turing machine M within simultaneous tree-size and space bounds Z(n) and S(n) if, for every "yes" instance w of P, there is at least one accepting computation tree for M on w of size (number of nodes) $\le Z(n)$, each node of which represents a configuration using space $\le S(n)$, where n is the size of w. (Further, for any "no" instance w of P there is no accepting computation tree for M.)

Ruzzo [35] proved the following important characterization of LOGCFL:

Proposition 2.3 [35]. LOGCFL coincides with the class of all decision problems recognized by ATMs operating simultaneously in tree-size $O(n^{O(1)})$ and space $O(\log n)$.

3. QUERY DECOMPOSITIONS

In this section, we first give the formal definitions of query-width and query decomposition. Then, we provide some intuition of why deciding whether a query has bounded query-width is NP-hard.

3.1. Bounded Query-Width and Bounded Query-Decompositions

The following definition of query decomposition is a slight modification of the original definition given by Chekuri and Rajaraman [9]. Our definition is a bit more liberal because, for any conjunctive query Q, we disregard both the atom head(Q) and the constants possibly occurring
in Q. However, in this paper, we will only deal with Boolean conjunctive queries without constants, for which the two notions coincide.

Definition 3.1. A query decomposition of a conjunctive query Q is a pair $\langle T, \lambda \rangle$, where T = (N, E) is a tree, and $\lambda$ is a labeling function which associates to each vertex $p \in N$ a set $\lambda(p) \subseteq (atoms(Q) \cup var(Q))$, such that the following conditions are satisfied:

1. for each atom A of Q, there exists $p \in N$ such that $A \in \lambda(p)$;
2. for each atom A of Q, the set $\{p \in N \mid A \in \lambda(p)\}$ induces a (connected) subtree of T;
3. for each variable $Y \in var(Q)$, the set $\{p \in N \mid Y \in \lambda(p)\} \cup \{p \in N \mid Y \text{ occurs in some atom } A \in \lambda(p)\}$ induces a (connected) subtree of T.

The width of the query decomposition $\langle T, \lambda \rangle$ is $\max_{p \in N} |\lambda(p)|$. The query-width qw(Q) of Q is the minimum width over all its query decompositions. A query decomposition for Q is pure if, for each vertex $p \in N$, $\lambda(p) \subseteq atoms(Q)$.

Note that Condition 3 above is the analogue of the connectedness condition of join trees and thus we will refer to it as the Connectedness Condition, as well.

Example 3.2. Figure 2 shows a 2-width query decomposition for the cyclic query of Example 1.1. Consider the following query $Q_4$:

$ans \leftarrow s(Y,Z,U) \wedge g(X,Y) \wedge t(Z,X) \wedge s(Z,W,X) \wedge t(Y,Z)$

$Q_4$ is a cyclic query, and its query-width equals 2. A 2-width decomposition of $Q_4$ is shown in Fig. 4. Note that this query decomposition is pure.

The next proposition, which is proved elsewhere [19], shows that we can focus our attention on pure query decompositions.

Proposition 3.3 [19]. Let Q be a conjunctive query and $\langle T, \lambda \rangle$ a c-width query decomposition of Q. Then
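
The three conditions of Definition 3.1 can be checked mechanically for a given candidate decomposition. The following Python sketch is only an illustration and is not taken from the paper. It assumes that atoms are given as a mapping from (hypothetical) atom names to their variable tuples, that atom names and variable names are disjoint, and that the supplied edge list really forms a tree. The decomposition used for the query of Example 3.2 is a hand-made one and need not coincide with the one drawn in Fig. 4.

```python
from collections import deque

def induces_subtree(nodes, edges):
    """True if the vertex set `nodes` is connected in the tree given by `edges`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search restricted to `nodes`
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes

def check_query_decomposition(atoms, tree_edges, labels):
    """Return the width if conditions 1-3 of Definition 3.1 hold, else None.

    atoms:      dict atom name -> tuple of variables (names are illustrative)
    tree_edges: list of (vertex, vertex) pairs, assumed to form a tree
    labels:     dict vertex -> set of atom names and/or variable names
    """
    variables = {v for args in atoms.values() for v in args}
    for a in atoms:
        holders = [p for p, lab in labels.items() if a in lab]
        # Condition 1 (atom is covered) and Condition 2 (connected occurrences).
        if not holders or not induces_subtree(holders, tree_edges):
            return None
    for y in variables:
        # Condition 3: vertices where y occurs directly or inside a labeled atom.
        holders = [p for p, lab in labels.items()
                   if y in lab or any(y in atoms[a] for a in lab if a in atoms)]
        if not induces_subtree(holders, tree_edges):
            return None
    return max(len(lab) for lab in labels.values())

# The query of Example 3.2, with hypothetical names for its five atoms.
Q4_ATOMS = {
    "s_yzu": ("Y", "Z", "U"), "g": ("X", "Y"), "t_zx": ("Z", "X"),
    "s_zwx": ("Z", "W", "X"), "t_yz": ("Y", "Z"),
}
# A hand-made pure decomposition: a root p1 with three children.
EDGES = [("p1", "p2"), ("p1", "p3"), ("p1", "p4")]
LABELS = {"p1": {"g", "t_yz"}, "p2": {"s_yzu"}, "p3": {"t_zx"}, "p4": {"s_zwx"}}
# Moving p3 below p2 disconnects the occurrences of X and violates Condition 3.
BAD_EDGES = [("p1", "p2"), ("p2", "p3"), ("p1", "p4")]

print(check_query_decomposition(Q4_ATOMS, EDGES, LABELS))      # 2
print(check_query_decomposition(Q4_ATOMS, BAD_EDGES, LABELS))  # None
```

The checker only verifies a given decomposition. It does not search for one of minimum width, which is the problem this section describes as NP-hard for bounded query-width.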
