16S Metagenomic Analysis

Various categories explain different proportions of microbial variation (but still limited)
* Additive model (linear vs non-linear)
Falony*, Joossens*, Vieira-Silva*, Wang* et al., Science 2016
• gam: 0.06  0.73  27.29  15.52  47.50  47.50
• dss: 0.07  0.53  17.94  3.72  31.22  78.72
• BEST analysis / BIOENV (finds the combination of variables that correlates best with the distance matrix; non-linear) vs. additive (linear) models in RDA/CCA
• IMPORTANT: standardize sequencing depth before clustering and building the compositional matrix, not after, because standardization changes the distribution and the diversity estimates
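One common way to standardize depth is rarefying: drawing the same number of reads from every sample, without replacement, before any downstream step. The sketch below is a minimal illustration of that idea; the function name `rarefy`, the taxon counts, and the target depth of 500 are all hypothetical and not taken from the slides.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon counts to `depth` reads, without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the target depth")
    # Expand the counts into one label per read, then draw `depth` reads.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(reads, depth)
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in picked:
        rarefied[taxon] += 1
    return rarefied

# Hypothetical sample with 1000 reads, rarefied to 500 reads.
sample = {"Bacteroides": 600, "Prevotella": 300, "Faecalibacterium": 90, "Akkermansia": 10}
rarefied = rarefy(sample, depth=500)
print(rarefied)
```

Rare taxa may drop to zero after subsampling, which is one reason the order of operations affects diversity estimates.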
Alpha-diversity: observed vs. hidden
Sampling procedures are rarely exhaustive/comprehensive
Observed number of taxa: dependent on sampling depth
Richness/Diversity: extrapolate on the rarefaction curve
Evenness: relative proportions of the taxa to each other
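The dependence of observed taxa on sampling depth can be made concrete with a rarefaction curve. The sketch below uses the classical expected-richness formula for subsampling without replacement (a taxon with c reads is missed with probability C(N - c, n) / C(N, n)); the function name and the counts are illustrative assumptions.

```python
from math import comb

def expected_richness(counts, n):
    """Expected number of taxa seen in a random subsample of n reads."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total read count")
    # Sum, over taxa, the probability that at least one read of the taxon is drawn.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)

# Hypothetical community of 4 taxa and 1000 reads.
counts = [600, 300, 90, 10]
curve = [(n, expected_richness(counts, n)) for n in (1, 10, 100, 1000)]
for n, s in curve:
    print(n, round(s, 2))
```

The curve rises quickly for abundant taxa and flattens as only rare taxa remain undetected, which is why richness comparisons need a common depth.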
“Effect sizes” from ordination are similar to distance-based, but not entirely the same (some information is lost)
Honorable mentions
• SIMPER analysis (much older than LEfSe)
Case study: Dissecting genetic basis of microbiome variations
Biobank of >2000 individuals
Collaboration between Bonn, Kiel, Oslo, Ploen
Again, diet is an important confounder and must be considered in genetic studies
Wang*, Thingholm*, Skiecevičienė* et al., Nature Genetics, accepted
[Figure: histograms of permutation results (x-axis: perm_result[i, ], 0.000 to 0.006; y-axis: frequency); panels MAF20, MAF40, MAF50]
Real life examples (+ Sample size vs. power)
Case study: Data mining and environment association in Flemish Gut
1106 individuals, rich metadata
Methodologically challenging
Clustering: lowering dimensions to 1
[Figure: dendrogram, group-average linkage; resemblance: S17 Bray-Curtis similarity; axes: similarity (20 to 100) and samples]
Lowers the dimensions of a complex dissimilarity index to 1 (be cautious when drawing conclusions from clusters)
H0: things occur randomly (and the separation will be so…)
H1: no they don’t (if the separation is big enough)
Fitting continuous variables: linear way (“envfit”) and surface way (“ordisurf”)
Clarke & Warwick 1999, “taxonomic distinctness”
* Of course, now the fancy name is “UniFrac”
Rarefaction result
Sequencing depth = 5000
Sequencing depth = 1000
Methodology: taxonomical and functional profiling
Morgan & Huttenhower, 2012
--16S rRNA profiling of gut microbial communities (for this workshop)
--Shotgun metagenomics for functional profiling (Illumina HiSeq)
“Biodiversity on one dimension”
Margalef index, Shannon-Wiener, Simpson
Community comparison: Hill’s indices
N0 = S, N1 = exp(H’), N2 = 1/D, N∞ = 1/p1
N10, N10’, N21, N21’
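The Hill indices listed above can be computed directly from taxon counts. This is a minimal sketch that assumes H’ is the Shannon index with natural logarithm, D is Simpson's concentration (the sum of squared proportions), and p1 is the proportion of the most abundant taxon; the function name and counts are hypothetical.

```python
from math import exp, log

def hill_numbers(counts):
    """Return Hill's N0, N1, N2 and N-infinity for one community."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    p = [c / total for c in counts]
    shannon = -sum(pi * log(pi) for pi in p)  # H'
    simpson = sum(pi * pi for pi in p)        # D
    return {
        "N0": len(p),           # S, the number of taxa
        "N1": exp(shannon),     # exp(H')
        "N2": 1 / simpson,      # 1/D
        "Ninf": 1 / max(p),     # 1/p1
    }

even = hill_numbers([25, 25, 25, 25])
uneven = hill_numbers([600, 300, 90, 10])
print(even)
print(uneven)
```

For a perfectly even community all four values equal the number of taxa; the more uneven the community, the faster the higher-order numbers fall below N0.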
Ordination: PCA/PCoA (lowering dimensions to 2)
[Figure: original axes X1, X2, X3 with principal components PC 1 and PC 2]
Principal component analysis (PCA): components are derived from the compositional matrix; similarly for CCA/RDA
Principal coordinates analysis (PCoA): based on dissimilarity (samples end up as coordinates)
Defining genome-wide significant loci for beta-diversity
Can permute 10e8 times for each locus, but that would take a few lifetimes
Or: pre-compute the significance threshold for different MAF and then just check the effect size
[Figure: association of age (20 to 80) with PC1 (summary(genera_cap)$sites[, 1]) and with PC2 (summary(genera_cap)$sites[, 2])]
Falony*, Joossens*, Vieira-Silva*, Wang* et al., Science 2016; Zhernakova et al., Science 2016
Microbiome in larger populations: transitioning from clusters to a landscape
Taxa contributing to similarity within group/dissimilarity between groups
Between Group sed & Group water
• Species Av.Abund Av.Abund Av.Diss Diss/SD Contrib% Cum.%
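The SIMPER columns above can be illustrated with a small calculation. The sketch below assumes the usual decomposition in which taxon i contributes |x_i - y_i| / sum(x + y) to the Bray-Curtis dissimilarity of one sample pair, averaged over all between-group pairs; the taxon names and abundances are invented for illustration, and the Diss/SD column is omitted.

```python
from statistics import mean

def simper(group_a, group_b, taxa):
    """Rank taxa by their average contribution to between-group Bray-Curtis dissimilarity."""
    contrib = {t: [] for t in taxa}
    for x in group_a:
        for y in group_b:
            denom = sum(x[t] + y[t] for t in taxa)
            for t in taxa:
                # Per-taxon share of this pair's dissimilarity, on a 0-100 scale.
                contrib[t].append(100 * abs(x[t] - y[t]) / denom)
    av_diss = {t: mean(v) for t, v in contrib.items()}
    total = sum(av_diss.values())
    rows, cum = [], 0.0
    for t in sorted(taxa, key=av_diss.get, reverse=True):
        pct = 100 * av_diss[t] / total
        cum += pct
        rows.append((t, av_diss[t], pct, cum))  # Species, Av.Diss, Contrib%, Cum.%
    return rows

# Hypothetical abundances for two groups of two samples each.
taxa = ["taxonA", "taxonB", "taxonC"]
sed = [{"taxonA": 50, "taxonB": 10, "taxonC": 0}, {"taxonA": 40, "taxonB": 20, "taxonC": 0}]
water = [{"taxonA": 5, "taxonB": 15, "taxonC": 30}, {"taxonA": 0, "taxonB": 10, "taxonC": 40}]
rows = simper(sed, water, taxa)
for row in rows:
    print(row[0], round(row[1], 2), round(row[2], 2), round(row[3], 2))
```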
--Alpha-diversity: the richness of bacterial taxa within a community (i.e. how complex a community is)
--Beta-diversity: the shared and unique bacterial taxa between communities (i.e. how different communities are)
Easy to define, hard to compare
Beta-diversity: pairwise community comparison
• Bray-Curtis
• Or Jaccard (based on presence/absence)
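Both measures are short enough to write out by hand. The sketch below computes them as dissimilarities (0 means identical communities) for two hypothetical abundance vectors over the same four taxa; the function names and data are illustrative.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

def jaccard(x, y):
    """Jaccard dissimilarity, using presence/absence only."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    shared = present_x & present_y
    union = present_x | present_y
    return 1 - len(shared) / len(union)

# Hypothetical counts for the same four taxa in two communities.
a = [10, 0, 5, 5]
b = [5, 5, 0, 10]
print(bray_curtis(a, b))  # abundance-weighted
print(jaccard(a, b))      # presence/absence only
```

Bray-Curtis responds to changes in abundance, while Jaccard only changes when a taxon appears or disappears.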
The previous measures ignore the phylogeny of the taxa in question (i.e. equal weight for close and remote taxa)
Some measures weight pairs differently depending on their phylogenetic distance
[Figure: sample labels shore0, water1m, sed1sw, shoreanox, shore10, sed1wsw]
Linkage algorithm: the initial stage of clustering (it accounts for most of the differences between trees, because it defines which branch counts as the next nearest)
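To see why the linkage choice matters, the sketch below measures the distance between clusters under single, complete, and group-average linkage and reports which cluster is nearest in each case. The sample names s1 to s5 and their dissimilarities are hypothetical values chosen for illustration.

```python
from itertools import product
from statistics import mean

def cluster_distance(d, cluster_a, cluster_b, linkage="average"):
    """Distance between two clusters given pairwise dissimilarities `d`."""
    pair_d = [d[frozenset((i, j))] for i, j in product(cluster_a, cluster_b)]
    if linkage == "single":    # nearest pair of members
        return min(pair_d)
    if linkage == "complete":  # farthest pair of members
        return max(pair_d)
    if linkage == "average":   # group average over all pairs
        return mean(pair_d)
    raise ValueError(f"unknown linkage: {linkage}")

# Hypothetical pairwise dissimilarities between samples.
d = {
    frozenset(("s1", "s3")): 0.2, frozenset(("s1", "s4")): 0.8,
    frozenset(("s2", "s3")): 0.4, frozenset(("s2", "s4")): 0.6,
    frozenset(("s1", "s5")): 0.3, frozenset(("s2", "s5")): 0.35,
}
A, B, C = ["s1", "s2"], ["s3", "s4"], ["s5"]

nearest = {}
for linkage in ("single", "complete", "average"):
    to_b = cluster_distance(d, A, B, linkage)
    to_c = cluster_distance(d, A, C, linkage)
    nearest[linkage] = "B" if to_b < to_c else "C"
print(nearest)
```

With these values, single linkage joins cluster A to B first, while complete and average linkage join it to C, so the same dissimilarities yield different trees.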
Falony*, Joossens*, Vieira-Silva*, Wang* et al., Science 2016
Identified major contributing factors of microbiome variation and replicated them in a nearby cohort (>90% replication rate)
Genetics and environment each account for 10% of total variation
Large number of loci, each with a relatively small effect size
[Figure: histograms of permutation results (x-axis: perm_result[i, ]; y-axis: frequency); panels MAF5, MAF10, MAF30]
Hypothesis testing: MANOVA and permutation based significance
It’s really an F-test (MANOVA = “multivariate” ANOVA): if the between-group differences are larger than the within-group differences, the grouping likely has an effect
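The F-test logic can be sketched on a distance matrix with a permutation-based p-value. The example below assumes a PERMANOVA-style pseudo-F (sums of squared distances partitioned into between-group and within-group parts); the data, labels, and function names are hypothetical, and this is an illustration rather than the method used in the cited studies.

```python
import random
from itertools import combinations

def pseudo_f(dist, labels):
    """Pseudo-F: between-group vs. within-group sums of squared distances."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permutation_p(dist, labels, n_perm=999, seed=0):
    """Shuffle group labels to build the null distribution of pseudo-F."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that relabelings equivalent to the original count as ties.
        if pseudo_f(dist, shuffled) >= observed * (1 - 1e-12):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical one-dimensional data: two well-separated groups of four samples.
values = [0.0, 0.1, 0.2, 0.15, 1.0, 1.1, 0.9, 1.05]
labels = ["a"] * 4 + ["b"] * 4
dist = [[abs(u - v) for v in values] for u in values]
f_obs, p_value = permutation_p(dist, labels)
print(round(f_obs, 1), p_value)
```

With only eight samples there are few distinct relabelings, so the smallest achievable p-value is bounded from below; larger cohorts allow much stricter thresholds.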