Bioinformatics: Gene Expression Analysis
Gene expression data analysis
• Microarray data analysis procedure
Goal: make multiple arrays comparable
• Sources of variation between multiple high-density oligonucleotide arrays:
– Biological: disease vs. control
– Non-biological: total RNA preparation and amplification, sample labeling differences, hybridization, scanner differences, image analysis
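One common way to make arrays comparable is quantile normalization, which forces every array to share the same intensity distribution. The slides do not say which normalization method was used, so the sketch below is only an illustration under that assumption. The function name and the toy intensities are hypothetical, and ties are broken by probe order rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize intensity lists, one list per array.

    Every list must contain the same probes in the same order.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must measure the same probes")

    # Mean intensity at each rank, taken across all arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # order[rank] is the index of the probe with the rank-th smallest value.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: two arrays, three probes (made-up values).
print(quantile_normalize([[2.0, 4.0, 6.0], [30.0, 10.0, 20.0]]))
```

After normalization both arrays contain the same set of values, so remaining differences reflect probe ranks rather than overall array brightness.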
Gene expression data analysis
• Microarray data analysis procedure
Intensity → Expression profile → Quality control → Normalization → Differential gene expression analysis
• Experimental design principle
• Replication
– Biological replicates: Sample1, Sample2 and Sample3, each on its own array (Microarray1, Microarray2, Microarray3)
– Technical replicates: Sample1 on several arrays
• Normalization
• Hypothesis testing
• Multiple hypothesis testing
• False positive control
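When thousands of probes are tested at once, some small p-values appear by chance alone, which is why false positive control follows hypothesis testing. The slides do not name a correction method; the sketch below assumes the Benjamini-Hochberg procedure, a widely used way to control the false discovery rate. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four probes.
p = [0.01, 0.04, 0.03, 0.5]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(q, significant)
```

A probe is then called differentially expressed when its adjusted value falls below the chosen false discovery rate, here 0.05 as an arbitrary example.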
Background
• Human Genome
– Publication of Initial Working Draft Sequence [February 12, 2001]
Experiment design
Gene expression data analysis
• Experimental design
“To consult a statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.”
lncRNA microarray
LncRNA classification
[Pie chart: lncRNA classes (intergenic, divergent, intronic, sense, antisense); slice labels 58%, 19%, 11%, 8%, 4%. Rinn and Chang, 2012; Ruscio et al. 2013]
lncRNA microarray
Data sources of lncRNA microarray
Source                      V3
GENCODE/ENSEMBL             22444
Human LincRNA Catalog       14353
RefSeq                      4814
UCSC                        5596
NRED                        13701
H-InvDB                     1038
Enhancer-like lncRNA        3019
RNAdb                       1599
Antisense ncRNA pipeline    1053
UCRs                        962
CombinedLit                 529
Hox ncRNAs                  407
snoRNA                      389
lncRNAdb                    104
ncRNAs from Chen lab        848
Total                       70856
Unique lncRNAs              37,491

V1 and V2 list fewer counts than there are sources; their counts in the order given, with the uncovered sources not marked:
V1: 4765, 13521, 1289, 17203, 2975, 1053, 481, 529, 389, 78, 848; Total 42283; Unique lncRNAs 30,622
V2: 12754, 8195, 4765, 13521, 1289, 17203, 2975, 1053, 481, 529, 407, 389, 78, 848; Total 63639; Unique lncRNAs 35,024
Ronald A. Fisher: Indian Statistical Congress, 1938, vol. 4, p. 17
Carefully design your experiments before doing them!!!!!
Gene expression data analysis
[Figure panels: not randomized vs. randomized]
Gene expression data analysis
• Experimental design principle
• Blocking
[Figure: two layouts of RNA extraction for Control, T1 and T2 samples (Exp.1, Exp.2, Exp.3) across Day1, Day2 and Day3]
Bioinformatics
Gene Expression Analysis
Chen Xiaowei (陈小伟) chenxiaowei@moon.ibp.ac.cn Institute of Biophysics, Chinese Academy of Sciences 2014.10.08
Gene Expression Analysis
• Background
• Experimental techniques used to measure gene expression
• Blocking
• The process of identifying or building groups of experimental units (EUs) that are expected to respond similarly in the absence of any treatment effect
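A blocked layout can be sketched as a randomized complete block design: every block receives each treatment once, and the run order is shuffled separately inside each block. The sketch assumes that extraction days serve as blocks and that the treatments are Control, T1 and T2, labels borrowed from the blocking figure. The function name is hypothetical.

```python
import random


def randomized_block_design(blocks, treatments, seed=None):
    """Assign every treatment once per block, in a random order per block."""
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # independent shuffle inside each block
        design[block] = order
    return design


# Assumed blocks (extraction days) and treatments.
design = randomized_block_design(
    ["Day1", "Day2", "Day3"], ["Control", "T1", "T2"], seed=2014
)
for day, order in design.items():
    print(day, order)
```

Because each day contains all three treatments, a day-to-day difference in RNA extraction affects the treatments equally instead of being confounded with one of them.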
Gene expression data analysis
Rinn and Chang, 2012
lncRNA microarray
LncRNA dataset
LncRNA dataset                          lncRNA count
NONCODE                                 95,135
GENCODE                                 26,414
Human lincRNA Catalog                   14,353
lncRNAdb                                118
RefSeq                                  4,814
UCSC Genes                              5,596
H-InvDB                                 1,038
lncRNAs from HOX loci                   962
lncRNAs from ultraconserved regions     407
• RNA-seq (Illumina)
Long non-coding RNA microarray
lncRNA microarray
Systematic identification of lncRNAs
• High-throughput sequencing
– ChIP-Seq
– CAGE-seq
– 3P-seq
– RNA-Seq
Gene expression data analysis
• Experimental design principle
• Randomization
• Each gene is spotted in quadruplicate
Isolate SAGE tags → Link tags together → Sequencing → Quantitate tags and determine patterns of gene expression
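The quantitation step of SAGE amounts to counting how often each tag was sequenced. The sketch below assumes the tags have already been extracted from the sequenced concatemers. The tag sequences are made up, and the per-million scaling is one possible way to compare libraries of different sizes, not something stated in the slides.

```python
from collections import Counter


def count_tags(tags):
    """Count how many times each SAGE tag was observed."""
    return Counter(tags)


def tags_per_million(counts):
    """Scale raw tag counts to tags per million sequenced tags."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


# Made-up 10 bp tags from one library.
observed = ["AAACCCGGGT", "AAACCCGGGT", "TTTGGGCCCA", "AAACCCGGGT"]
counts = count_tags(observed)
print(counts.most_common())
print(tags_per_million(counts))
```

A tag seen more often corresponds to a more abundant transcript, so the counts form the expression profile of the sample.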
Experimental techniques
• DNA microarray
lncRNA microarray
LncRNA sequences are redundant
UCSC Genes
GENCODE
RefSeq Genes
• Combine lncRNAs with high similarity from different datasets
• Contain isoforms of lncRNAs
• Obtain a comprehensive dataset of lncRNAs, including 37,491 lncRNAs
• ENCODE (Encyclopedia of DNA Elements)
– 74.7% of human genome covered by primary transcripts
– 62.1% of human genome covered by processed transcripts
– 2.94% of human genome covered by exons of protein-coding genes
• Experimental design principle
• Replication
• The process of applying each treatment to more than one experimental unit (EU)
• Randomization
• Randomly allocating treatments to EUs to ensure a fair assessment of the treatments
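Random allocation can be sketched in a few lines. The example assumes a balanced design in which every treatment receives the same number of experimental units. The unit and treatment labels are hypothetical.

```python
import random


def randomize_treatments(units, treatments, seed=None):
    """Randomly allocate treatments to experimental units in equal numbers."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    # One label per unit, balanced across treatments, then shuffled.
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)
    return dict(zip(units, labels))


# Hypothetical experimental units and treatments.
units = ["EU1", "EU2", "EU3", "EU4", "EU5", "EU6"]
allocation = randomize_treatments(units, ["Control", "Tumor"], seed=42)
for unit, treatment in allocation.items():
    print(unit, treatment)
```

Fixing the seed makes the allocation reproducible, which helps when the design has to be documented alongside the data.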
ProbeName   Control     Tumor
RNA53314    3610.6355   7735.4663
RNA53313    330.27353   230.98158
RNA53312    2991.578    3540.922
RNA53311    46.673733   19.396254
RNA53310    58.98197    16.519632
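A simple first look at such an expression profile is the log2 fold change between the two conditions. The sketch below uses the five probes listed above. With a single Control and a single Tumor value per probe it only ranks probes and cannot support a significance test, and the function name is an assumption.

```python
import math

# Probe intensities from the expression profile: (Control, Tumor).
expression = {
    "RNA53314": (3610.6355, 7735.4663),
    "RNA53313": (330.27353, 230.98158),
    "RNA53312": (2991.578, 3540.922),
    "RNA53311": (46.673733, 19.396254),
    "RNA53310": (58.98197, 16.519632),
}


def log2_fold_change(control, tumor):
    """Return log2(tumor / control); positive means higher in tumor."""
    return math.log2(tumor / control)


for probe, (control, tumor) in expression.items():
    print(f"{probe}\t{log2_fold_change(control, tumor):+.2f}")
```

On the log2 scale, a value of +1 means a doubling in the tumor sample and -1 means a halving, so up- and down-regulation are treated symmetrically.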
Transcription
Experimental techniques used to measure gene expression
Experimental techniques
• SAGE (Serial Analysis of Gene Expression)
– Victor Velculescu
– 1995, Johns Hopkins University
lncRNA microarray
Remove redundant lncRNA sequences
GENCODE, RefSeq, UCSC, H-InvDB, ……
→ merged by Xref, sequence similarity and genome loci
→ 37,491 lncRNAs (V3)
One specific probe for each lncRNA or its isoform
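Of the three merging criteria named above, the genome-locus criterion is the simplest to illustrate. The sketch below collapses records from different sources that share exactly the same coordinates. The record fields, identifiers and coordinates are all hypothetical, and merging by cross-reference or sequence similarity would need rules that the slides do not give.

```python
def merge_by_locus(records):
    """Collapse records that share chromosome, strand, start and end."""
    merged = {}
    for rec in records:
        key = (rec["chrom"], rec["strand"], rec["start"], rec["end"])
        entry = merged.setdefault(key, {"locus": key, "ids": []})
        # Keep every source identifier as a cross-reference.
        entry["ids"].append((rec["source"], rec["id"]))
    return list(merged.values())


# Hypothetical records: the first two describe the same locus.
records = [
    {"source": "GENCODE", "id": "lnc-A", "chrom": "chr1", "strand": "+", "start": 100, "end": 500},
    {"source": "RefSeq", "id": "NR_A", "chrom": "chr1", "strand": "+", "start": 100, "end": 500},
    {"source": "UCSC", "id": "uc_B", "chrom": "chr2", "strand": "-", "start": 50, "end": 90},
]
for entry in merge_by_locus(records):
    print(entry["locus"], entry["ids"])
```

Each merged entry would then receive one probe, while isoforms with different coordinates stay separate.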
– SAGE
– DNA microarray
– RNA-Seq
• Long non-coding RNA microarray
• Gene expression data analysis
– Experiment design
– Microarray data analysis procedure