Differential Gene Expression

Therefore: Invest money in repeated experiments!
Standard Deviation and Standard Error
Standard Deviation (SD): Variability of the measurement
Standard Error (SE): Variability of the mean of several measurements
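A minimal sketch of the two quantities, assuming the usual sample standard deviation and the relation SE = SD / sqrt(n). The replicate values are hypothetical and only serve as an illustration.

```python
import math
import statistics


def sd_and_se(values):
    """Return (SD, SE) for a list of repeated measurements of one gene."""
    sd = statistics.stdev(values)       # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(len(values))    # standard error of the mean
    return sd, se


# Hypothetical log-scale expression values of one gene in four replicates.
replicates = [7.9, 8.3, 8.1, 7.7]
sd, se = sd_and_se(replicates)
print(f"SD = {sd:.3f}, SE = {se:.3f}")
```

With four replicates the SE is half the SD: the mean is measured more precisely than any single value.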
Actually, this again depends on the within-class variability of the two genes; it can be the other way round.
Is a three-fold induced gene more trustworthy than a two-fold induced gene?
T-Test PROBLEMS
• There are many genes (-> many tests) but only few repetitions
• Is s a good estimate?
• If the measured variance is small, T easily becomes very large
higher expression in profile A than in profile B.
Is two-fold trustworthy?
Well, by how much can this gene change within group A and within group B?
If by no more than 10%, then the answer is yes.
You need to score differential gene expression.
Different scores lead to different rankings.
What scores are there?
Fold Change & Log Ratios
You have transformed your data to the additive scale! Factors become differences: log(A/B) = log(A) - log(B)
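As a small illustration of why the log scale is convenient, the sketch below checks that a ratio of raw intensities equals a difference of logs. The intensities are hypothetical, and base-2 logs are an assumption made here because they turn a two-fold change into exactly one unit.

```python
import math

# Hypothetical raw intensities of one gene in profile A and profile B.
a, b = 400.0, 100.0

log_ratio = math.log2(a / b)                 # log of the fold change
difference = math.log2(a) - math.log2(b)     # difference on the additive scale

# On the log2 scale, induction and repression are symmetric around zero.
two_fold_up = math.log2(2.0)
two_fold_down = math.log2(0.5)

print(log_ratio, difference, two_fold_up, two_fold_down)
```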
Differential gene expression:
Which genes are differentially expressed in the two tissue type populations?
A cost-efficient (cheap) experiment:
We observe a gene with a two-fold
Ranking means finding the right genes … drawing our attention to them
In many applications it is the most important step
Ranking is not Testing
Ranking: Finding the right genes
Testing: Deciding whether genes are significant
There is more than one way to rank
There is more than one way to test
The criteria for which ranking is best differ from the criteria for which test is best … power is often no argument
-> Statistical Analysis
Ranking:
Problem: Produce an ordered list of differentially expressed genes, starting with the most up-regulated gene and ending with the most down-regulated gene.
Gene, candidate 1
Gene, candidate 2
Gene, candidate 3
Gene, candidate 4
Gene, candidate 5
Gene, candidate 6
Gene, candidate 7
Gene, candidate 8
Gene, candidate 9
Gene, ....
Ranking: Order genes by fold change or score -> the list may contain some genes that are not differential in reality (false positives)
-> Limma
-> SAM
-> Twilight
More Scores:
- Wilcoxon score (robust)
- pAUC score (separation)
- Paired t-score (paired data)
- F-score (more than 2 conditions)
- Correlation to a reference gene
- etc.
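To illustrate why a rank-based score is called robust, here is a sketch of a Wilcoxon rank-sum score: the sum of the ranks of group A in the pooled sample. The function name and the example values are hypothetical, and packaged tools may scale or center the statistic differently.

```python
def rank_sum_score(group_a, group_b):
    """Sum of the ranks of group A in the pooled sample (ties get average ranks)."""
    pooled = sorted(group_a + group_b)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        # Advance j past all values tied with pooled[i].
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    return sum(rank_of[x] for x in group_a)


# Hypothetical log-scale values for one gene.
group_b = [5.0, 5.2, 5.1]
clean_a = [7.1, 7.3, 7.2]
outlier_a = [7.1, 7.3, 70.0]   # one extreme measurement

score_clean = rank_sum_score(clean_a, group_b)
score_outlier = rank_sum_score(outlier_a, group_b)
print(score_clean, score_outlier)
```

The outlier changes the group mean drastically but leaves the rank-sum score unchanged, because only the ordering of the values enters the score.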
n replications, normally distributed data:
Conclusion: Repetitions lead to a more precise measurement of gene expression. Single expression measurements are very noisy; the average expression across several repetitions is much less noisy.
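The effect of repetitions can be checked with a small simulation. This is a sketch under the assumption of normally distributed measurement noise with standard deviation 1; the function name, the number of trials, and the seed are arbitrary choices made for illustration.

```python
import random
import statistics


def spread_of_means(n, trials=2000, sd=1.0, seed=0):
    """Simulate `trials` experiments with n replicates each and return
    the standard deviation of the resulting averages."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


for n in (1, 4, 16):
    # The spread shrinks roughly like sd / sqrt(n).
    print(n, round(spread_of_means(n), 2))
```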
Conclusion: Transform your data to the additive scale.
- Simple way: take logs
Reminder:
Questions:
Which genes are differentially expressed?
-> Ranking
Are these results "significant"?
The denominator in T becomes really small
Constantly expressed genes show up on top of the list
Correction: Add a constant fudge factor s0
Regularized T-score
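A sketch of the idea, assuming the regularized score has the form T = (mean_A - mean_B) / (s + s0), where s is the pooled standard error of the difference. The exact definition of s, the value s0 = 0.1, and the example data are assumptions made here for illustration; tools such as SAM choose s0 from the data.

```python
import math
import statistics


def regularized_t(group_a, group_b, s0=0.0):
    """Two-sample T-score with a constant fudge factor s0 added to the denominator.
    With s0 = 0 this is the ordinary pooled-variance T-score (undefined if s is 0)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    s = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return diff / (s + s0)


# Hypothetical log-scale data: an almost constant gene and a clearly changed gene.
constant_a, constant_b = [5.000, 5.001, 5.002], [4.990, 4.991, 4.992]
changed_a, changed_b = [7.0, 7.4, 6.8], [5.1, 4.7, 5.2]

for s0 in (0.0, 0.1):
    print(s0,
          round(regularized_t(constant_a, constant_b, s0), 2),
          round(regularized_t(changed_a, changed_b, s0), 2))
```

Without the fudge factor the nearly constant gene scores higher than the changed gene; with s0 = 0.1 the order is reversed.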
[Figure: product intensity (log scale) for groups A and B]
Conclusion: In addition to the differences in gene expression you also have a vital interest in its variability ... This information is needed to obtain meaningful lists of genes
Differential Gene Expression
[Figure: expression matrix, Genes by Patients / Samples / Timepoints ...]
Two cell/tissue/disease types:
- wild-type / mutant
- control / treated
- disease A / disease B
- responding / non-responding
- etc.
For every sample (cell line/patient) we have the expression levels of thousands of genes and the information whether it is A or B.
Change: high Variance: low
Change: high Variance: high
Change: HIGH Variance: SMALL -> T huge
Change: SMALL Variance: HIGH -> T~0
Change: HIGH Variance: HIGH
Change: SMALL Variance: SMALL
Therefore: for microarray data it is reasonable to use a modified version of the t-test
Fudge Factors:
You need to estimate the variance from data
You might underestimate an already small variance (constantly expressed genes)
The additive scale:
You will want to use the wealth of statistical theory to analyze your data
- Most statistics works on an additive scale (significance of differences etc. ...)
- Gene expression works on a multiplicative scale (fold changes ...)
If by up to 500%, then the answer is no.
A cost-efficient (cheap) experiment II:
Is a three-fold induced gene more trustworthy than a two-fold induced gene?
Order by some score; intuitively: fold change
1st: most differential, 2nd: second most diff ...
Testing: Select the genes whose fold change or score is significant, such that there are fewer than 5% false positives -> you may miss some genes (false negatives)
Different scores give different rankings
ALL vs AML (Golub et al.)
Which Score is the best one?
Order by some score; frequently: T-score
1st: most differential, 2nd: second most diff ...
Which gene is more differentially expressed?
Ranking is Scoring
If you want to rank by fold change, you compute the average expression in both groups and subtract them.
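The ranking by fold change can be sketched as follows, assuming the expression values are already on the log scale so that the difference of group averages is the log fold change. The function name, gene names, and values are hypothetical.

```python
import statistics


def rank_by_fold_change(expr_a, expr_b):
    """expr_a, expr_b: dict mapping gene name -> list of log-scale values per group.
    Returns (gene, score) pairs, most up-regulated in A first, most down-regulated last."""
    scores = {
        gene: statistics.fmean(expr_a[gene]) - statistics.fmean(expr_b[gene])
        for gene in expr_a
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: two replicates per group for three genes.
expr_a = {"gene1": [8.0, 8.2], "gene2": [5.0, 5.1], "gene3": [6.0, 6.2]}
expr_b = {"gene1": [6.0, 6.1], "gene2": [7.0, 7.2], "gene3": [6.1, 6.0]}

ranking = rank_by_fold_change(expr_a, expr_b)
for gene, score in ranking:
    print(gene, round(score, 2))
```

The score ignores the within-group variability, which is exactly the weakness the T-score addresses.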
T-Score
Idea: Take variances into account
Change: low Variance: high
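How the T-score combines change and variance can be sketched with two hypothetical genes. The pooled-variance two-sample form used here is an assumption, since the slides do not spell out the formula, and the data are invented for illustration.

```python
import math
import statistics


def t_score(group_a, group_b):
    """Two-sample T-score: difference of group means divided by its pooled standard error."""
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return diff / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))


# Change: high, variance: small.
t_high = t_score([8.0, 8.1, 7.9], [6.0, 6.1, 5.9])
# Change: small, variance: high.
t_low = t_score([6.0, 8.0, 7.0], [7.5, 5.5, 7.7])

print(round(t_high, 1), round(t_low, 2))
```

The first gene gets a very large T, the second a T close to zero, matching the intuition that a change only counts relative to the within-group variability.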