Methods for Constructing Molecular Phylogenetic Trees


MP tree-building workflow
Position     1  2  3
Sequence1    T  G  C
Sequence2    T  A  C
Sequence3    A  G  G
Sequence4    A  A  G
If 1 and 2 are grouped a total of four changes are needed.
Phylogenetic tree terminology
Rooted tree vs. Unrooted tree
Unrooted tree vs. rooted tree
[Figure: example tree with tips A, B, C and D and branch lengths 10, 3, 2, 5]
Two major ways to root trees:
By midpoint or distance
d(A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
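The midpoint calculation can be sketched in a few lines of Python. This is only an illustration: the function name `locate_midpoint` is hypothetical, and it assumes that A and D are the two most distant tips and that the branches along the path from A to D have lengths 10, 3 and 5, as in the arithmetic above.

```python
def locate_midpoint(path_lengths):
    """Find where the midpoint falls along a tip-to-tip path.

    path_lengths: branch lengths in order from the first tip to the second.
    Returns (branch index, distance into that branch, half of the path length).
    """
    half = sum(path_lengths) / 2
    walked = 0.0
    for index, length in enumerate(path_lengths):
        if walked + length >= half:
            # The midpoint lies on this branch.
            return index, half - walked, half
        walked += length
    return None  # only reached for an empty path


# Path A -> D with branch lengths 10, 3 and 5 (total 18).
branch, offset, half = locate_midpoint([10, 3, 5])
print(branch, offset, half)
```

Under these assumptions the root falls on the first branch of the path (the one leading to A), 9 units from A and 1 unit from the nearest internal node.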
Distance
  Uses only pairwise distances
  Minimizes distance between nearest neighbors
  Very fast
  Easily trapped in local optima
  Good for generating tentative tree, or choosing among multiple trees
Maximum parsimony
  Uses only shared derived characters
  Minimizes total distance
Maximum likelihood
  Uses all data
  Maximizes tree likelihood given specific parameter values
  Very slow
  Highly dependent on assumed evolution model
  Good for very small data sets and for testing trees built using other methods
By outgroup (an outlying group or branch)
Rooted tree vs. Unrooted tree
[Figure: the same taxa (plants, animals, fungus, bacterium) drawn as an unrooted tree and as a rooted tree]
Monophyletic group
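Maximum parsimony (MP)
Maximum parsimony (MP) originated in the study of morphological characters and has since been extended to the evolutionary analysis of molecular sequences. Its theoretical basis is Ockham's razor: every possible topology is evaluated, and the topology that requires the fewest substitutions is taken as the optimal tree.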
Find the tree that explains the observed sequences with a minimal number of substitutions
Neighbor-joining (NJ)
Minimum evolution (ME)
Build the tree
Evaluate the tree
Statistical analysis: bootstrap, likelihood ratio test, ...
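Distance methods
Distance methods, also called distance-matrix methods, first compare the sequences pairwise and, under a chosen assumption (an evolutionary distance model), estimate the evolutionary distances between taxa to build an evolutionary distance matrix. The tree is then constructed from the distances in this matrix.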
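Node
Root: the ancestral node, or root of the tree
Internal node (divergence point): the putative ancestor of the branch; a hypothetical taxonomic unit (HTU)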
Phylogenetic tree terminology
A clade is a group of organisms that includes an ancestor and all descendants of that ancestor.
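        Cat  Dog  Rat
Dog      3
Rat      4    5
Cow      6    7    6
Compute the distances between sequences and build the distance matrix
Build the tree from the distance matrix
[Figure: tree of Cat, Dog, Rat and Cow with branch lengths 1, 2, 2, 1 and 4]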
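Step 1. Compute the distances between sequences and build the distance matrix
Align the sequences and remove gaps (choose a substitution model)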
Uncorrected “p” distance (= observed percent sequence difference)
Kimura 2-parameter distance (estimate of the true number of substitutions between taxa)
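As a rough sketch of how the two distances differ, the Python below computes both for a pair of aligned DNA sequences. It assumes the sequences are the same length, gap-free, and contain only A, C, G and T; the function names and the two example sequences are made up for illustration. The Kimura 2-parameter distance uses the standard formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions.

```python
import math

PURINES = {"A", "G"}


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    differences = sum(x != y for x, y in zip(seq_a, seq_b))
    return differences / len(seq_a)


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance from transition/transversion proportions."""
    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq_a)
    q = transversions / len(seq_a)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical pair: one transition (A->G) and one transversion (A->C) in 10 sites.
seq_1 = "AAAAAAAAAA"
seq_2 = "GAAAAAAAAC"
print(p_distance(seq_1, seq_2), k2p_distance(seq_1, seq_2))
```

The corrected distance is larger than the observed proportion because it allows for multiple substitutions at the same site.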
Position 1: (1,2): 1 change; (1,3) or (1,4): 2 changes
Position 2: (1,3): 1 change; (1,2) or (1,4): 2 changes
Position 3: (1,2): 1 change; (1,3) or (1,4): 2 changes
If 1 and 3 are grouped a total of five changes are needed.
If 1 and 4 are grouped a total of six changes are needed.
MP tree-building steps
Total changes by grouping: (1,2): 4 (best); (1,3): 5; (1,4): 6
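The change counts for the three groupings can be reproduced with a small sketch of Fitch's parsimony counting. The alignment is read from the four-sequence, three-position example in this section (Sequence1 TGC, Sequence2 TAC, Sequence3 AGG, Sequence4 AAG, which is an assumption about how the flattened figure should be read); the function names and the nested-tuple tree representation are illustrative choices, not part of the original slides.

```python
def fitch(tree, site):
    """Return (possible ancestral states, minimum changes) for one site.

    tree: a leaf name, or a pair (left_subtree, right_subtree).
    site: dict mapping leaf name -> base at this position.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_states, left_changes = fitch(tree[0], site)
    right_states, right_changes = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one extra substitution is required at this node.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Sum the minimum number of changes over all alignment positions."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {
    "Sequence1": "TGC",
    "Sequence2": "TAC",
    "Sequence3": "AGG",
    "Sequence4": "AAG",
}
topologies = {
    "(1,2)": (("Sequence1", "Sequence2"), ("Sequence3", "Sequence4")),
    "(1,3)": (("Sequence1", "Sequence3"), ("Sequence2", "Sequence4")),
    "(1,4)": (("Sequence1", "Sequence4"), ("Sequence2", "Sequence3")),
}
scores = {name: parsimony_length(t, alignment) for name, t in topologies.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With only four sequences there are just three unrooted topologies, so exhaustive scoring is trivial; for larger data sets the number of topologies grows very quickly and heuristic searches are normally used.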
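Maximum likelihood (ML)
Maximum likelihood (ML) was first applied to the analysis of gene-frequency data. Its principle is to choose a specific substitution model for a given set of sequences, find the maximum likelihood attainable for each candidate topology, and then select the topology with the highest likelihood as the optimal tree.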
ML tree-building workflow
Inferring the maximum likelihood tree
• Pick an evolutionary model
• For each position, generate all possible tree structures
• Based on the evolutionary model, calculate the likelihood of these trees and sum them to get the column likelihood for each OTU cluster
• Calculate the tree likelihood by multiplying the likelihoods for each position
• Choose the tree with the greatest likelihood
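The steps above can be sketched for a toy case. The code below is a minimal illustration, not a full ML search: it assumes the Jukes-Cantor (JC69) model, fixes every branch length at an arbitrary value instead of optimizing it, and scores only the three topologies of four sequences. The alignment (TGC, TAC, AGG, AAG) is the four-sequence example used for parsimony in this section; the helper names (`jc69`, `conditional`, `quartet`) are hypothetical.

```python
import math

BASES = "ACGT"


def jc69(i, j, t):
    """JC69 probability that base i is replaced by base j along a branch
    of length t (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def conditional(node, site):
    """Likelihood of the data below `node` for each possible base at `node`.

    A leaf is a sequence name; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {b: 1.0 if site[node] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        below = conditional(child, site)
        for b in BASES:
            result[b] *= sum(jc69(b, c, t) * below[c] for c in BASES)
    return result


def log_likelihood(tree, alignment):
    """Sum of per-position log-likelihoods (the log of their product)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        cond = conditional(tree, site)
        # Sum over the possible root bases, each with frequency 1/4.
        total += math.log(sum(0.25 * cond[b] for b in BASES))
    return total


def quartet(a, b, c, d, t=0.1):
    """Tree ((a,b),(c,d)) with every branch length fixed at t (an assumption)."""
    return [([(a, t), (b, t)], t / 2), ([(c, t), (d, t)], t / 2)]


alignment = {"S1": "TGC", "S2": "TAC", "S3": "AGG", "S4": "AAG"}
log_likelihoods = {
    "(1,2)": log_likelihood(quartet("S1", "S2", "S3", "S4"), alignment),
    "(1,3)": log_likelihood(quartet("S1", "S3", "S2", "S4"), alignment),
    "(1,4)": log_likelihood(quartet("S1", "S4", "S2", "S3"), alignment),
}
best = max(log_likelihoods, key=log_likelihoods.get)
print(log_likelihoods, best)
```

Working with log-likelihoods and adding them is equivalent to multiplying the per-position likelihoods, and it avoids numerical underflow on longer alignments. A real ML analysis would also optimize branch lengths and model parameters for each topology.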
A newer method for building phylogenetic trees: Bayesian inference (BI)
Holder & Lewis (2003) Nature Reviews Genetics 4, 275-284
Maximum likelihood vs. Bayesian inference
Maximum likelihood: What is the probability of seeing the observed data (D) given a model/theory (T)?
Slow
Assumptions fail when evolution is rapid
Best option when tractable (<30 taxa, homoplasy rare)
Choosing a Method for Phylogenetic Prediction
Molecular Biology and Evolution 2005 22(3):792-802
Easy: only with substitutions
Difficult: also with indels
Phylogenetic tree terminology
Branch
Terminal node: may be a species, a population, or a protein, DNA, or RNA molecule; an operational taxonomic unit (OTU)
Newick format: ((A,(B,C)),(D,E));
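To show how the parenthesized notation maps onto a tree, here is a minimal sketch of a parser for topology-only Newick strings. It is an illustration under simplifying assumptions: no branch lengths, no internal-node labels, no quoted names. The function names are hypothetical.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    s = text.strip().rstrip(";").replace(" ", "")
    pos = 0

    def parse_node():
        nonlocal pos
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a leaf name up to the next delimiter.
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos]

    tree = parse_node()
    if pos != len(s):
        raise ValueError("unexpected trailing text")
    return tree


def leaf_names(tree):
    """List the leaves of a parsed tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


tree = parse_newick("((A,(B,C)),(D,E));")
print(tree, leaf_names(tree))
```

Each pair of parentheses corresponds to one internal node, so B and C share their most recent common ancestor, which in turn groups with A.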
Maximum likelihood: Pr(D|T)
Bayesian inference: What is the probability that the model/theory is correct given the observed data? Pr(T|D)
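The relationship between the two quantities is Bayes' theorem: Pr(T|D) is proportional to Pr(D|T) times the prior Pr(T). The sketch below turns likelihoods for three candidate trees into posterior probabilities. The likelihood values and tree names are invented purely for illustration, and a uniform prior is assumed unless one is supplied; real Bayesian phylogenetics estimates the posterior by sampling (for example with MCMC) because the trees cannot be enumerated.

```python
def posterior(likelihoods, priors=None):
    """Convert Pr(D|T) for each candidate tree into Pr(T|D) via Bayes' theorem."""
    names = list(likelihoods)
    if priors is None:
        # Assumption: every candidate tree is equally probable a priori.
        priors = {name: 1.0 / len(names) for name in names}
    evidence = sum(likelihoods[name] * priors[name] for name in names)
    return {name: likelihoods[name] * priors[name] / evidence for name in names}


# Hypothetical likelihoods Pr(D|T) for three candidate topologies.
example_likelihoods = {"tree_1": 0.006, "tree_2": 0.003, "tree_3": 0.001}
post = posterior(example_likelihoods)
print(post)
```

With a uniform prior the posterior is simply the normalized likelihood, so the three hypothetical trees receive posterior probabilities of about 0.6, 0.3 and 0.1.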
Advantages of BI over ML:
• Speed
• No need for bootstrapping
Comparison of Methods
Phylogenetic reconstruction is a problem of statistical inference. One must assess the reliability of the inferred phylogeny and its component parts.
Questions:
(1) How reliable is the tree?
(2) Which parts of the tree are reliable?
(3) Is this tree significantly better than another one?
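One common way to address question (2) is the bootstrap: alignment columns are resampled with replacement, a tree is rebuilt from each pseudo-alignment, and the fraction of replicates that recover a grouping is reported as its support. The sketch below illustrates the resampling step only. The "tree builder" is a deliberately simple stand-in for four sequences (it picks the partner of the first sequence supported by the most informative sites), and all names, the replicate count and the seed are assumptions for illustration.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def partner_of_first(alignment):
    """Toy tree builder for exactly four sequences: return the sequence that
    groups with the first one at the largest number of sites (ties go to the
    earliest candidate)."""
    names = list(alignment)
    first, others = names[0], names[1:]

    def supporting_sites(partner):
        rest = [n for n in others if n != partner]
        order = (first, partner, *rest)
        return sum(
            1
            for col in zip(*(alignment[n] for n in order))
            if col[0] == col[1] and col[2] == col[3] and col[0] != col[2]
        )

    return max(others, key=supporting_sites)


def bootstrap_support(alignment, replicates=200, seed=1):
    """Fraction of bootstrap replicates in which each grouping is recovered."""
    rng = random.Random(seed)
    counts = Counter(
        partner_of_first(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return {name: counts[name] / replicates for name in list(alignment)[1:]}


alignment = {"Sequence1": "TGC", "Sequence2": "TAC", "Sequence3": "AGG", "Sequence4": "AAG"}
support = bootstrap_support(alignment)
print(support)
```

With only three positions the support values are very noisy; the point is the procedure, not the numbers. In practice the tree builder would be a full MP, distance, or ML analysis run on each replicate.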
Bioinformatics: Sequence and Genome Analysis, 2nd edition, by David W. Mount. p254 /cgi/content/full/2008/5/pdb.ip49
Assessing tree reliability
Step 2. Build the tree from the matrix
There are many methods for building a tree from evolutionary distances. Common ones are:
1. Unweighted Pair Group Method with Arithmetic mean (UPGMA)
2. Neighbor-Joining Method (NJ)
3. Minimum Evolution Method (ME)
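As an illustration of the second method, the sketch below runs a basic neighbor-joining procedure on the Cat/Dog/Rat/Cow distances used as the example matrix in this section. The matrix is read as Cat-Dog 3, Cat-Rat 4, Cat-Cow 6, Dog-Rat 5, Dog-Cow 7 and Rat-Cow 6, which is an assumption about the flattened table. The function names, the edge-list output and the internal node labels (`node0`, `node1`) are hypothetical choices.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Basic neighbor-joining.

    dist maps frozenset({a, b}) -> distance.
    Returns a list of (node, node, branch_length) edges of the unrooted tree.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        length_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        length_j = dij - length_i
        u = f"node{next_id}"
        next_id += 1
        edges += [(u, i, length_i), (u, j, length_j)]
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


def path_length(edges, start, goal):
    """Sum the branch lengths along the path between two nodes of the tree."""
    adjacent = {}
    for a, b, w in edges:
        adjacent.setdefault(a, []).append((b, w))
        adjacent.setdefault(b, []).append((a, w))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        for nxt, w in adjacent[node]:
            if nxt != parent:
                stack.append((nxt, node, total + w))
    raise ValueError("nodes are not connected")


taxa = ["Cat", "Dog", "Rat", "Cow"]
distances = {
    frozenset(("Cat", "Dog")): 3, frozenset(("Cat", "Rat")): 4,
    frozenset(("Cat", "Cow")): 6, frozenset(("Dog", "Rat")): 5,
    frozenset(("Dog", "Cow")): 7, frozenset(("Rat", "Cow")): 6,
}
tree_edges = neighbor_joining(taxa, distances)
print(tree_edges)
```

Because these distances are additive (they fit a tree exactly), the path lengths in the resulting tree reproduce the input matrix, with terminal branches of 1 (Cat), 2 (Dog), 2 (Rat) and 4 (Cow) and an internal branch of 1. Real data are rarely additive, so NJ then gives an approximation.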
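Approaches to studying phylogeny
Classical evolutionary biology:
compares morphology, physiological structures, and fossils
Molecular evolutionary biology:
compares DNA and protein sequences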
An alignment is a hypothesis of positional homology between bases or amino acids.
Residues that are lined up in different sequences are considered to share a common ancestry (i.e., they are derived from a common ancestral residue).
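Choose one or more sequences known to be distantly related to the sequences under analysis as the outgroup. The outgroup helps locate the root of the tree. Outgroup sequences must be homologous to the other sequences in the tree, but they must differ from those sequences more than those sequences differ from one another.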
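Steps in building a phylogenetic tree
Multiple sequence alignment (automatic alignment, manual correction)
Choose a tree-building method (substitution model):
  Maximum parsimony (MP)
  Distance methods (UPGMA)
  Maximum likelihood (ML)
  Bayesian inference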
Bioinformatics
Chapter 5
Lineage (phylogenetic) analysis
2. Phylogenetic analysis: analyzing the evolutionary relationships of genes or proteins
Phylogenetic (evolutionary) tree
A tree showing the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor.
Cladogram: branch lengths have no meaning
Phylogram: branch lengths represent the amount of genetic change
Ultrametric tree: branch lengths represent time
[Figure: Taxa A, B, C and D drawn as a cladogram, a phylogram and an ultrametric tree]
How to root a tree?
Choosing an outgroup
[Figure: tree of archaea and eukaryotes rooted with bacteria as the outgroup]