Phylogenetic Tree Construction
Approximate likelihood-ratio test (aLRT)
used by PhyML
Posterior probability
used by MrBayes
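Posterior probability rests on Bayes' theorem. As a sketch in standard notation (the symbols are ours, not from the slides), the posterior probability of tree $T_i$ given alignment data $D$ is below, where $P(D \mid T_i)$ is the likelihood integrated over branch lengths and model parameters and $P(T_i)$ is the prior. MrBayes approximates this distribution by Markov chain Monte Carlo (MCMC) sampling, so a clade's posterior probability is estimated as the fraction of sampled trees that contain the clade.

```latex
P(T_i \mid D) = \frac{P(D \mid T_i)\, P(T_i)}{\sum_{j} P(D \mid T_j)\, P(T_j)}
\qquad
\hat{P}(\text{clade} \mid D) \approx \frac{\#\{\text{sampled trees containing the clade}\}}{\#\{\text{sampled trees}\}}
```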
Resampling
- Bootstrap (extensively used)
- Jackknife (used by PAUP*)
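Bootstrapping resamples alignment columns (sites) with replacement to create pseudo-replicate datasets of the original length; a tree is built from each replicate, and a clade's support is the percentage of replicate trees that contain it. A minimal Python sketch under those assumptions: the function names and the toy alignment are hypothetical, and the tree-building step itself is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def clade_support(replicate_trees, clade):
    """Fraction of replicate trees (each given as a set of clades) containing `clade`."""
    target = frozenset(clade)
    return sum(target in tree for tree in replicate_trees) / len(replicate_trees)

# Toy alignment (assumed for illustration). Each replicate would then be analysed
# with the same tree-building method that was applied to the original data.
alignment = {"1": "TGCA", "2": "TACC", "3": "AGGT", "4": "AAGT"}
rng = random.Random(1)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```

If 85 of 100 replicate trees contained the clade {1, 2}, `clade_support` would return 0.85, conventionally reported as 85% bootstrap support.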
Model Selection
Ignoring or underestimating multiple substitutions (repeated changes at the same site) can lead to a phenomenon called long-branch attraction.
Tree Building methods
(1) matches: the same nucleotide appears in both sequences.
(2) mismatches: different nucleotides are found in the two sequences.
(3) gaps: a base in one sequence and a null base (gap) in the other.
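The three pair types can be counted column by column in a pairwise alignment. A minimal sketch, assuming gaps are written as `-` and skipping columns that are gapped in both sequences; the function name is hypothetical.

```python
def classify_pairs(seq_a, seq_b, gap="-"):
    """Count matches, mismatches and gaps in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # gapped in both sequences: not a base pair (assumption)
        if a == gap or b == gap:
            counts["gaps"] += 1
        elif a == b:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts

print(classify_pairs("TGC-A", "TACCA"))  # {'matches': 3, 'mismatches': 1, 'gaps': 1}
```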
Data: 1. TGCA  2. TACC  3. AGGT  4. AAGT
Informative sites only: 1. TGC  2. TAC  3. AGG  4. AAG
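[Figure: the three possible unrooted trees for sequences 1-4, requiring 4, 5, and 6 changes respectively]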
Thomas Bayes (1701–1761)
Assessing reliability of trees (measures of clade support)
[Figure: an unrooted tree and the same tree rooted by a bacterial outgroup, with the root and monophyletic groups labeled]
Jackknifing
Drop one or several observations (alignment sites) at a time.
Create subsets of the available data and calculate the estimate from each subset.
Build one tree from each sub-dataset and summarize all these trees in a consensus tree.
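Unlike the bootstrap, jackknifing samples sites without replacement, so each sub-dataset is smaller than the original. A minimal sketch of two variants, delete-one and delete-d; the function names and the toy alignment are hypothetical, and tree building plus the consensus step are left out.

```python
import random

def delete_one_jackknife(alignment):
    """Yield every sub-alignment obtained by dropping exactly one column."""
    n_sites = len(next(iter(alignment.values())))
    for drop in range(n_sites):
        yield {name: seq[:drop] + seq[drop + 1:] for name, seq in alignment.items()}

def delete_d_jackknife(alignment, d, rng):
    """Return one sub-alignment with `d` randomly chosen columns removed."""
    n_sites = len(next(iter(alignment.values())))
    keep = sorted(rng.sample(range(n_sites), n_sites - d))  # preserve column order
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

alignment = {"1": "TGCA", "2": "TACC", "3": "AGGT", "4": "AAGT"}
subsets = list(delete_one_jackknife(alignment))               # 4 subsets of 3 sites
half = delete_d_jackknife(alignment, d=2, rng=random.Random(7))  # one 2-site subset
```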
Rooted/Unrooted Trees and Outgroup
What is an outgroup?
An outgroup is a (monophyletic) group of organisms that serves as a reference for determining the evolutionary relationships among three or more monophyletic groups of organisms.
A monophyletic group is a taxon (group of organisms) that forms a clade, meaning that it consists of an ancestral species and all of its descendants.
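On a rooted tree, a set of taxa is monophyletic exactly when some clade (an internal node plus everything below it) has that set as its leaves. A minimal sketch, assuming a rooted tree written as nested tuples; the taxa A to D are hypothetical, with D playing the outgroup.

```python
def clades(tree):
    """Return (leaf set of `tree`, list of leaf sets of every clade inside it)."""
    if isinstance(tree, str):  # a single taxon
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade rooted at this node
    return leaves, found

def is_monophyletic(tree, group):
    _, all_clades = clades(tree)
    return frozenset(group) in all_clades

tree = ((("A", "B"), "C"), "D")  # rooted by outgroup D (hypothetical)
print(is_monophyletic(tree, {"A", "B"}), is_monophyletic(tree, {"B", "C"}))  # True False
```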
Four-Point Condition
For four taxa on an additive tree in which A and B are neighbors (and so are C and D):
d(A,B) + d(C,D) < d(A,C) + d(B,D) = d(A,D) + d(B,C)
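The condition can be checked directly: compute the three pairwise sums, take the smallest as the neighbor split, and test whether the other two are equal. A minimal sketch with hypothetical helper names; the distances are read off an assumed tree with leaf branches A:1, B:2, C:1, D:2 and an internal branch of length 3.

```python
import math

def symmetric(pairs):
    """Build a symmetric nested-dict distance matrix from {(x, y): distance}."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d

def four_point(d, w, x, y, z, tol=1e-9):
    """Return (split with the smallest sum, whether the four-point condition holds)."""
    sums = {
        ((w, x), (y, z)): d[w][x] + d[y][z],
        ((w, y), (x, z)): d[w][y] + d[x][z],
        ((w, z), (x, y)): d[w][z] + d[x][y],
    }
    (split, s1), (_, s2), (_, s3) = sorted(sums.items(), key=lambda kv: kv[1])
    # Smallest sum strictly below the other two, which must be equal (within tol).
    holds = s1 < s2 - tol and math.isclose(s2, s3, rel_tol=0.0, abs_tol=tol)
    return split, holds

d = symmetric({("A", "B"): 3, ("C", "D"): 3, ("A", "C"): 5,
               ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 7})
print(four_point(d, "A", "B", "C", "D"))  # ((('A', 'B'), ('C', 'D')), True)
```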
Neighbor joining method
An Example of Maximum Parsimony
Step 1. Identify all the informative sites.
Step 2. For each possible tree, calculate the minimum number of substitutions at each informative site.
Step 3. Sum up the number of changes over all the informative sites for each possible tree.
Step 4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.
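The four steps can be sketched for the four-sequence data set used in these slides (TGCA, TACC, AGGT, AAGT). Step 2 is done here with Fitch's algorithm, a standard way to count the minimum changes per site on a binary tree; rooting each unrooted topology arbitrarily does not change the count. A site is treated as informative when at least two character states each occur in at least two sequences. Function names are hypothetical.

```python
from collections import Counter

def informative_sites(seqs):
    """Step 1: sites where at least two states each appear in >= 2 sequences."""
    n = len(next(iter(seqs.values())))
    sites = []
    for i in range(n):
        counts = Counter(s[i] for s in seqs.values())
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites

def fitch_site(tree, states):
    """Step 2 (Fitch): return (state set, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch_site(left, states)
    s2, c2 = fitch_site(right, states)
    common = s1 & s2
    return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)

def parsimony_score(tree, seqs, sites):
    """Step 3: sum the minimum changes over the informative sites."""
    return sum(fitch_site(tree, {n: s[i] for n, s in seqs.items()})[1] for i in sites)

seqs = {"1": "TGCA", "2": "TACC", "3": "AGGT", "4": "AAGT"}
topologies = [(("1", "2"), ("3", "4")), (("1", "3"), ("2", "4")), (("1", "4"), ("2", "3"))]
sites = informative_sites(seqs)
scores = {t: parsimony_score(t, seqs, sites) for t in topologies}
best = min(scores, key=scores.get)  # Step 4: fewest changes
for tree, score in scores.items():
    print(tree, score)
print("MP tree:", best)
```

Only the first three sites are informative, and the three topologies need 4, 5, and 6 changes, so the tree joining (1, 2) and (3, 4) is the maximum parsimony tree.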
Outline
Alignment
Model Selection
Tree Building methods
Assessing reliability of trees
Rooted/Unrooted trees and Outgroup
Alignment
An alignment consists of a series of paired bases, one base from each sequence. There are three types of pairs:
Neighbor joining method
If we combine OTUs A and B into one composite OTU, then the composite OTU (AB) and the simple OTU C become neighbors.
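Neighbor joining repeats this idea: at each step it joins the pair of OTUs that minimises the Saitou-Nei criterion Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other OTUs, and replaces them with a composite OTU. A minimal sketch; the function names, the `node0`, `node1`, ... labels (assumed not to clash with taxon names) and the example matrix are our own. On additive distances like these, the path lengths in the returned tree reproduce the input matrix.

```python
def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a symmetric dict-of-dicts distance matrix.
    Returns the unrooted tree as a list of edges (node, node, branch_length)."""
    dm = {a: {b: dist[a][b] for b in names if b != a} for a in names}
    nodes, edges, next_id = list(names), [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(dm[i][k] for k in nodes if k != i) for i in nodes}
        best = None
        for x, i in enumerate(nodes):
            for j in nodes[x + 1:]:
                q = (n - 2) * dm[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from i and j to the new composite OTU u.
        li = 0.5 * dm[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, li), (j, u, dm[i][j] - li)]
        dm[u] = {}
        for k in nodes:
            if k not in (i, j):
                dm[u][k] = dm[k][u] = 0.5 * (dm[i][k] + dm[j][k] - dm[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, dm[a][b]))
    return edges

def tree_distance(edges, a, b):
    """Sum of branch lengths on the path between nodes a and b."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack = [(a, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == b:
            return total
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, total + w))
    raise ValueError("no path between the two nodes")

names = ["a", "b", "c", "d", "e"]
D = {
    "a": {"b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3},
}
edges = neighbor_joining(names, D)
for u, v, length in edges:
    print(u, v, length)
```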
Alignment
Sequence alignment
1. Pairwise alignment (BLAST, FASTA)
2. Multiple alignment (Clustal W/X, MUSCLE, T-Coffee, etc.)
Methods of alignment:
1. Manual
2. Word match (pairwise alignment: BLAST)
3. Dot matrix (pairwise alignment: FASTA)
4. Distance matrix (multiple alignment)
5. Combined (distance + manual)
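A dot matrix (dot plot) marks every cell (i, j) where position i of one sequence matches position j of the other; runs of marks along a diagonal reveal similar regions. A minimal sketch: the sliding-window filter (`window`, `min_matches`) is an assumed noise-reduction option, and the function names are hypothetical rather than taken from any alignment program.

```python
def dot_matrix(seq_a, seq_b, window=1, min_matches=1):
    """Mark cell (i, j) when the windows starting at seq_a[i] and seq_b[j]
    share at least `min_matches` identical positions."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            hits = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(hits >= min_matches)
        rows.append(row)
    return rows

def render(rows):
    """Draw the matrix with '*' for a mark and '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in rows)

print(render(dot_matrix("TGCATGCA", "TGCA")))
```

With a repeated motif in the first sequence, the diagonal of marks appears twice, once for each copy.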
Model Selection
Substitution models for nucleotide sequences
- Jukes-Cantor (JC, nst=1): equal base frequencies, all substitutions equally likely (Jukes and Cantor 1969)
- Felsenstein 1981 (F81, nst=1): variable base frequencies, all substitutions equally likely (Felsenstein 1981)
- Kimura 2-parameter (K80, nst=2): equal base frequencies, different transition and transversion rates (Kimura 1980)
- Hasegawa-Kishino-Yano (HKY, nst=2): variable base frequencies, different transition and transversion rates (Hasegawa et al. 1985)
- Tamura-Nei (TrN): variable base frequencies, equal transversion rates, variable transition rates (Tamura and Nei 1993)
- Kimura 3-parameter (K3P): variable base frequencies, equal transition rates, variable transversion rates (Kimura 1981)
- Transition model (TIM): variable base frequencies, variable transition rates, transversion rates equal
- Transversion model (TVM): variable base frequencies, variable transversion rates, transition rates equal
- Symmetrical model (SYM): equal base frequencies, symmetrical substitution matrix (A to T = T to A)
- General time-reversible (GTR, nst=6): variable base frequencies, symmetrical substitution matrix (Lanave et al. 1984, Tavare 1986, Rodriguez et al. 1990)
Model Selection
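Observed sequence difference grows roughly linearly with time only shortly after a divergence event; later, multiple substitutions at the same site hide earlier changes, so the observed difference underestimates the true divergence. To estimate sequence divergence soundly (though not necessarily accurately), we need to correct the observed differences with a substitution model.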
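Two standard corrections illustrate the point. Jukes-Cantor gives d = -3/4 ln(1 - 4p/3) from the proportion p of differing sites; Kimura 2-parameter gives d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q) from the transition proportion P and transversion proportion Q. A minimal sketch; skipping gaps and ambiguous bases pairwise is our assumption, and the function names are hypothetical.

```python
import math

PURINES, PYRIMIDINES = set("AG"), set("CT")

def _compared_sites(a, b):
    """Aligned position pairs where both bases are A/C/G/T (gaps skipped)."""
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]

def p_distance(a, b):
    """Uncorrected proportion of differing sites."""
    sites = _compared_sites(a, b)
    return sum(x != y for x, y in sites) / len(sites)

def jukes_cantor(a, b):
    """JC69 corrected distance: d = -3/4 ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: JC69 correction is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p(a, b):
    """K80 distance from transition (P) and transversion (Q) proportions."""
    sites = _compared_sites(a, b)
    diffs = [(x, y) for x, y in sites if x != y]
    transitions = sum(1 for x, y in diffs if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    P = transitions / len(sites)
    Q = (len(diffs) - transitions) / len(sites)
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("K80 correction is undefined for these proportions (saturation)")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

print(p_distance("AAAA", "AAAC"), jukes_cantor("AAAA", "AAAC"))
```

For one difference in four sites, p = 0.25 while the JC distance is about 0.304: the correction inflates the raw difference to account for substitutions that are no longer visible.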
Distance methods
- Neighbor joining
- Minimum evolution
- UPGMA
- Fitch-Margoliash
Maximum parsimony (model-free)
Maximum likelihood (model-dependent)
Bayesian inference (model-dependent)
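Among the distance methods, UPGMA is the simplest: it repeatedly merges the two closest clusters, places the new node at half their distance, and averages distances weighted by cluster size. Because every leaf ends at the same height, it assumes a constant rate of evolution (molecular clock) and yields a rooted tree. A minimal sketch with a hypothetical function name and an assumed ultrametric example matrix; clusters are represented as nested tuples.

```python
def upgma(names, dist):
    """UPGMA on a symmetric dict-of-dicts matrix. Returns (root, heights)."""
    size = {name: 1 for name in names}      # active cluster -> number of leaves
    height = {name: 0.0 for name in names}  # cluster -> node height (leaves at 0)
    dm = {a: {b: dist[a][b] for b in names if b != a} for a in names}
    while len(size) > 1:
        active = list(size)
        pairs = [(a, b) for x, a in enumerate(active) for b in active[x + 1:]]
        i, j = min(pairs, key=lambda p: dm[p[0]][p[1]])  # closest pair
        new = (i, j)
        height[new] = dm[i][j] / 2
        size[new] = size[i] + size[j]
        dm[new] = {}
        for k in active:
            if k not in (i, j):
                # Size-weighted average distance to the merged cluster.
                avg = (size[i] * dm[i][k] + size[j] * dm[j][k]) / size[new]
                dm[new][k] = dm[k][new] = avg
        del size[i], size[j]
    (root,) = size
    return root, height

dist = {
    "A": {"B": 2, "C": 6, "D": 10},
    "B": {"A": 2, "C": 6, "D": 10},
    "C": {"A": 6, "B": 6, "D": 10},
    "D": {"A": 10, "B": 10, "C": 10},
}
root, height = upgma(["A", "B", "C", "D"], dist)
print(root, height[root])
```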