Introduction to High-Throughput Sequencing Technology (31 pages)
Cycle 2-n: Add sequencing reagents and repeat
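1. In each sequencing cycle, four fluorescently labeled dNTPs are added; each carries a removable blocking group at its 3' end.
2. Only one nucleotide can be incorporated per cycle, and the instrument reads the corresponding fluorescent signal.
3. After the signal is read, the blocking group is removed chemically and the next cycle begins.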
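The three steps above can be sketched as a loop over cycles. This is a minimal illustration assuming perfect chemistry (no phasing, no failed deblocking); the dye names are hypothetical placeholders, and the template string is written in the order the polymerase reads it.

```python
# Minimal sketch of reversible-terminator sequencing-by-synthesis for one
# template strand, assuming perfect chemistry. Dye names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
DYE = {"A": "dye_A", "C": "dye_C", "G": "dye_G", "T": "dye_T"}  # hypothetical labels


def run_cycles(template: str, cycles: int) -> list:
    """Return the dye imaged in each cycle."""
    imaged = []
    for position in range(min(cycles, len(template))):
        # Step 1: all four labeled, 3'-blocked dNTPs are offered together;
        # only the one complementary to the next template base pairs.
        incorporated = COMPLEMENT[template[position]]
        # Step 2: the 3' block stops further extension, so exactly one
        # nucleotide (one dye) is added and imaged per cycle.
        imaged.append(DYE[incorporated])
        # Step 3: chemical cleavage removes the block and the dye, leaving a
        # free 3'-OH for the next cycle (modeled by moving to position + 1).
    return imaged


def dyes_to_read(imaged: list) -> str:
    """Translate the imaged dyes back into the synthesized read."""
    base_for_dye = {dye: base for base, dye in DYE.items()}
    return "".join(base_for_dye[dye] for dye in imaged)


print(dyes_to_read(run_cycles("ACGTAC", cycles=6)))  # TGCATG
```

Because the block allows one incorporation per cycle, the read length equals the number of cycles (capped by the template length).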
Base calling from the raw data: T G C T A C G A T …
Sequenced short reads (typically 25–50 bp) from ChIP-Seq experiments are first mapped to the reference genome. The mapped reads are then used to estimate statistical parameters, including the average length F of the sequenced DNA fragments.
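One widely used way to estimate F, not necessarily the method used in the study summarized here, is strand cross-correlation: plus-strand reads pile up upstream of a binding site and minus-strand reads downstream, so shifting the plus-strand profile by d and correlating it with the minus-strand profile peaks near d ≈ F. The sketch below uses toy 5'-end positions (hypothetical data) and a brute-force Pearson correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def estimate_fragment_length(plus_5p, minus_5p, genome_length, max_shift):
    """Return the shift d that best aligns plus- with minus-strand 5' ends."""
    plus = [0] * genome_length
    minus = [0] * genome_length
    for pos in plus_5p:
        plus[pos] += 1
    for pos in minus_5p:
        minus[pos] += 1
    best_shift, best_r = 0, float("-inf")
    for d in range(1, max_shift + 1):
        # Compare plus[i] with minus[i + d] over the overlapping range.
        r = pearson(plus[: genome_length - d], minus[d:])
        if r > best_r:
            best_shift, best_r = d, r
    return best_shift


# Toy data (hypothetical): three binding sites and fragments of length 60.
sites, F = [100, 250, 480], 60
plus_reads = [s - F // 2 for s in sites]
minus_reads = [s + F // 2 for s in sites]
print(estimate_fragment_length(plus_reads, minus_reads, genome_length=600, max_shift=150))  # 60
```

Real data are noisy and genome-scale, so practical tools bin the profiles and smooth the correlation curve; the brute-force loop here only shows the idea.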
Reversible Terminator Chemistry
• All 4 labelled nucleotides in 1 reaction
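[Figure: chemical structure of a fluorescently labeled reversible-terminator nucleotide, showing the triphosphate (PPP), the fluorophore attached via a cleavage site, and the 3' block on the growing DNA strand]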
Incorporation; detection; deblocking and fluorophore removal
[Figure: one cluster imaged in cycles 1-9, read as TTTTTTTGT…]
The identity of each base in a cluster is read off from sequential images: the series of fluorescent signals recorded for each spot in each cycle is converted into the corresponding DNA sequence.
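Reading a base from each image amounts to picking the brightest of the four channels in each cycle. The sketch below assumes intensities arrive as (A, C, G, T) tuples, an illustrative channel order, and reports a chastity-style ratio (brightest / (brightest + second brightest)) as a rough confidence. Real base callers also correct for channel cross-talk and phasing.

```python
CHANNELS = "ACGT"  # assumed channel order of each intensity tuple


def call_cluster(intensities):
    """Call one base per cycle plus a chastity-style confidence.

    intensities: one (A, C, G, T) tuple per cycle for a single cluster.
    Confidence = brightest / (brightest + second brightest), so it lies
    between 0.5 (ambiguous) and 1.0 (clean signal).
    """
    read, confidence = [], []
    for cycle in intensities:
        order = sorted(range(4), key=lambda i: cycle[i], reverse=True)
        best, second = cycle[order[0]], cycle[order[1]]
        read.append(CHANNELS[order[0]])
        total = best + second
        confidence.append(best / total if total > 0 else 0.5)
    return "".join(read), confidence


# Hypothetical intensities for four cycles of one cluster.
cycles = [(10, 5, 3, 200), (8, 12, 180, 6), (4, 150, 9, 7), (6, 5, 90, 80)]
read, conf = call_cluster(cycles)
print(read)  # TGCG
```

The last cycle is called G, but with a confidence just above 0.5, the kind of call a quality filter would flag.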
Solexa Sequencing Workflow
ABI SOLiD Sequencing by Ligation
RPL29: Ribosomal protein L29; PRDX2: Peroxiredoxin 2; PFN2: Profilin 2; STMN4: Stathmin-like 4; COX5B: Cytochrome c oxidase subunit Vb; GABRA1: Gamma-aminobutyric acid (GABA) A receptor, alpha 1
Expression differences of MPM and ADCA samples compared with a lung tissue control
Percentage of reads containing known coding-region SNVs in the six tissue samples.
454 sequencing: Emulsion PCR (emPCR)
• Mix adapter-carrying library DNA with capture beads at limiting dilution
• Add PCR reagents and emulsion oil
• Create a "water-in-oil" emulsion
SOLiD sequencing
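• Each base is read twice, giving very high accuracy, especially for SNP detection
• Flexible system: a well-developed bead-encoding system allows samples to be pooled and the sequencing area to be partitioned
• Read length is limited by the number of ligation cycles, which makes de novo assembly difficult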
454 sequencing: Deposition of DNA beads into the PicoTiter™Plate
• Load DNA beads into the PicoTiter™Plate
• Load enzyme beads
• Centrifugation step
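Illumina Solexa Sequencing by Synthesis
Sequencing by Synthesis: Basic Principle
Sequencing by Ligation: Basic Principle
Library preparation: single-molecule clonal amplification on beads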
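SOLiD reads the template DNA sequence through probe ligation reactions
1,024 8-base probes; 4 fluorescent colors, each color encoding 4 dinucleotides, so each color has 256 probes (4 × 4^3 = 4^4)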
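The probe arithmetic can be checked by enumeration. This sketch assumes the commonly described SOLiD probe layout (positions 1-2 interrogated, positions 3-5 degenerate, positions 6-8 universal bases carrying the dye) and the two-base color code A=0, C=1, G=2, T=3 with color = XOR of the two codes. These are assumptions, not details given on the slide.

```python
from collections import Counter
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed two-base color code


def probe_color(probe):
    """Color index (0-3) of a probe, from its two interrogated bases."""
    return CODE[probe[0]] ^ CODE[probe[1]]


# Positions 1-2: interrogated; positions 3-5: degenerate (any base).
# Positions 6-8 (universal bases carrying the dye) are not enumerated.
probes = ["".join(bases) for bases in product("ACGT", repeat=5)]
per_color = Counter(probe_color(p) for p in probes)
dinucleotides_per_color = Counter(probe_color(a + b) for a, b in product("ACGT", repeat=2))

print(len(probes))                               # 1024
print(sorted(per_color.values()))                # [256, 256, 256, 256]
print(sorted(dinucleotides_per_color.values()))  # [4, 4, 4, 4]
```

Under this layout, each color covers 4 dinucleotides × 4^3 degenerate combinations = 256 probes, and 4 colors × 256 = 1,024.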
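Sequencing by Ligation (1)
The sequencing primer anneals to the adapter
Probe ligation, fluorescence detection, cleavage of the fluorophore
Second round of probe ligation, fluorescence detection, cleavage of the fluorophore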
MPM: malignant pleural mesothelioma; ADCA: pulmonary adenocarcinoma
Transcriptome characteristics
Solid line: at least one read; dashed line: at least 20 reads
Roche 454 Pyrosequencing (Pyrophosphate Sequencing)
Illumina Solexa Sequencing by Synthesis
ABI SOLiD Sequencing by Ligation
Roche 454 Pyrosequencing
Pyrophosphate Sequencing: Basic Principle
Methy-seq(1): methylation differences in the BRCA1 promoter region between tumor samples and the MCF7 cell line
Methy-seq(2):
Some highlights:
Correlation between ChIP-Seq and the author's earlier SAGE-like method (GMAT): r = 0.906
Clonal Single Molecule Arrays
• Prepare DNA fragments
• Ligate adapters
• Attach single molecules to the surface
• Amplify to form clusters
• Sequence
~1,000 molecules per ~1 µm cluster; ~1,000 clusters per 100 µm square; ~40 million clusters per experiment
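The cluster count translates directly into run yield. As a rough upper bound, assuming every cluster produces a full-length read at the 25-50 bp read lengths mentioned for ChIP-Seq reads, one experiment yields about 1-2 Gb:

```python
def run_yield_gb(clusters, read_length_bp):
    """Upper-bound yield in gigabases if every cluster gives a full-length read."""
    return clusters * read_length_bp / 1e9


CLUSTERS = 40_000_000  # ~40 million clusters per experiment
for read_length in (25, 50):  # read-length range quoted for ChIP-Seq reads
    print(f"{read_length} bp: {run_yield_gb(CLUSTERS, read_length):.1f} Gb")
# 25 bp: 1.0 Gb
# 50 bp: 2.0 Gb
```

Actual yield is lower once clusters failing quality filters are discarded.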
[Figure: template strand TGCTACGATACCCGATCGAT with the growing complementary strand; after deblocking, a free 3'-OH end is available for the next cycle]
Sequencing-by-Synthesis (SBS)
Cycle 1: add sequencing reagents; the first base is incorporated; remove unincorporated bases; detect the signal
Applications of High-Throughput Sequencing
• De novo sequencing
• Genome re-sequencing
• Transcriptome re-sequencing
• Digital expression profiling
• ChIP-seq
• Methy-seq
Transcriptome resequencing:
Part 5: Other Types of Chips
Lab-on-a-chip; drug-delivery chips; chips that assist physiological functions; nanochips
Part 6: Introduction to High-Throughput Sequencing (Next Generation Sequencing)
• Sample fragmentation
• Library preparation
• Sequencing reaction
• Data analysis
SNV: Single Nucleotide Substitution Variant
Digital expression profiling(1): expression differences between human brain tissue and UHR (Universal Human Reference)
Tag Sequence GATCAAACCAAGGCCCAGGC GATCACTGTTAATGATTTGC GATCAGTGTCTTTTCAGCAC GATCATCATGACCAATGAAA GATCATGCTGGCTGCAAAGA GATCCAAACCCAAGTCTTGA GATCCAAGATAAAGAAGGCA GATCCCAGACTGGTTCTTGA GATCCCCAATTGACTCAGAG GATCCGGGGCTGCAGGCTTG GATCCTACAGAAGTGGAGCT GATCCTAGTAATTGCCTAGA GATCCTGCGGGAGTCTCCCG GATCCTGTGAAGGCCTGGAA GATCGAGACACGTGATGGGA GATCGAGGACAGTGCAACCA GATCTCAATGCCAATCCTCC GATCTGCACGCCGCTGACCC GATCTGTGCCCAGAGATGGG GATCTGTGGAGAATGTACAC
UBB: Ubiquitin B; RING1: Ring finger protein 1; DIRAS2: DIRAS family, GTP-binding RAS-like 2; PHF20: PHD finger protein 20
IGJ: Immunoglobulin J polypeptide; FN1: Fibronectin 1; NFIX: Nuclear factor I/X (CCAAT-binding transcription factor); TOP2A: Topoisomerase (DNA) II alpha 170kDa; KRT8: Keratin 8; ITGB7: Integrin, beta 7; BASP1: Brain abundant, membrane attached signal protein 1; DRD1IP: Dopamine receptor D1 interacting protein; GFAP: Glial fibrillary acidic protein; APLP2: Amyloid beta (A4) precursor-like protein 2
Digital expression profiling & microRNA re-sequencing:
hESC: human embryonic stem cells; EB: embryoid bodies
ChIP-seq(1): DNA-protein interactions on human chromosome 1
ChIP-seq(2):
454 sequencing
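• Long reads, about 400 bp
• Unknown genomes can be sequenced from scratch (de novo sequencing)
• For homopolymers such as AAAAAA, the light signal intensity is not linearly proportional to the number of bases, so the length of the run is difficult to determine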
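The homopolymer problem follows from how 454 reads a flowgram. Nucleotides are flowed one at a time, and a whole run of identical bases is incorporated in a single flow, so the base count must be inferred from signal strength. In this sketch, the flow order TACG and the low reading of 5.4 are illustrative assumptions. The ideal signal is taken as exactly proportional to run length, and a single low reading stands in for the non-linear response of long runs.

```python
FLOW_ORDER = "TACG"  # nucleotide flow order; assumed here for illustration


def flowgram(read, n_cycles):
    """Ideal signal for each flow: the number of bases incorporated.

    `read` is the strand being synthesized, so a flow of nucleotide X
    extends it through the whole run of X at the current position.
    """
    signals, i = [], 0
    for _ in range(n_cycles):
        for nucleotide in FLOW_ORDER:
            run = 0
            while i < len(read) and read[i] == nucleotide:
                run += 1
                i += 1
            signals.append((nucleotide, run))
    return signals


def decode(signals):
    """Naive base calling: round each flow signal to a whole number of bases."""
    return "".join(nucleotide * round(value) for nucleotide, value in signals)


read = "TTAAAAAAGC"
ideal = flowgram(read, n_cycles=2)
print(decode(ideal) == read)  # True

# A homopolymer signal measured a little low (hypothetical value) loses a base.
noisy = [(n, 5.4 if n == "A" and v == 6 else v) for n, v in ideal]
print(decode(noisy))  # TTAAAAAGC
```

A substitution error cannot arise this way, but an insertion or deletion inside a homopolymer can, which is why 454 errors are dominated by indels.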
Solexa sequencing
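• Highly automated system
• Produces a large number of reads, well suited to sequencing many short fragments, e.g. microRNA profiling
• Because it relies on reversible-terminator reactions, efficiency drops and the signal decays as the number of cycles increases, so reads are short, which makes de novo assembly difficult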
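The signal decay can be illustrated with a simplified phasing model (an assumption, not taken from the slide). If each strand in a cluster fails to extend with a small probability p per cycle, the in-phase fraction after n cycles is (1 - p)^n, which bounds the usable read length. The rate p = 0.01 and the 0.5 threshold below are hypothetical.

```python
def in_phase_fraction(lag_rate, cycles):
    """Fraction of a cluster's strands still in phase after `cycles` cycles,
    if each strand independently fails to extend with probability `lag_rate`
    per cycle (a simplified model that ignores prephasing and dye decay)."""
    return (1.0 - lag_rate) ** cycles


def usable_cycles(lag_rate, threshold):
    """Largest number of cycles for which the in-phase fraction stays >= threshold."""
    if not (0 < lag_rate < 1 and 0 < threshold <= 1):
        raise ValueError("need 0 < lag_rate < 1 and 0 < threshold <= 1")
    n = 0
    while in_phase_fraction(lag_rate, n + 1) >= threshold:
        n += 1
    return n


LAG = 0.01  # hypothetical per-cycle failure rate
for n in (36, 100):
    print(n, round(in_phase_fraction(LAG, n), 3))
print(usable_cycles(LAG, threshold=0.5))  # 68
```

Even a 1% per-cycle loss leaves only about a third of the strands in phase after 100 cycles, which is why early SBS reads stayed short.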
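• Each droplet of the emulsion acts as a micro-reactor; the capture beads carry the adapter complement
• Perform emulsion PCR
• "Break the micro-reactors" and isolate the DNA-carrying beads
• Enrich for template-carrying beads
• Anneal the sequencing primer
Millions of clonally amplified templates are generated on each bead; no cloning or colony picking is needed.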
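Limiting dilution is usually reasoned about with a Poisson model (an assumption here, not stated on the slide). If template molecules are distributed at a mean of λ per droplet, a fraction λe^(-λ) of droplets receives exactly one template. After enrichment removes empty beads, the clonal share among the remaining beads is λe^(-λ) / (1 - e^(-λ)). The mean values tried below are hypothetical.

```python
import math


def occupancy(mean_templates):
    """Poisson fractions of bead-containing droplets with 0, 1 and >1 templates."""
    p0 = math.exp(-mean_templates)
    p1 = mean_templates * p0
    return p0, p1, 1.0 - p0 - p1


def clonal_fraction(mean_templates):
    """Among beads that received at least one template, the share that are clonal."""
    p0, p1, _ = occupancy(mean_templates)
    return p1 / (1.0 - p0)


for lam in (0.3, 1.0):
    print(f"mean {lam}: {clonal_fraction(lam):.3f} of template-bearing beads are clonal")
```

Diluting further raises the clonal share but leaves more beads empty, which is the trade-off the dilution step has to balance.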
Brain: 118 162 85 73 91 48 266 217 91 10 0 46 41 0 8 0 240 51 716 132
UHR: 861 163 35 0 96 0 271 1538 0 10 113 799 12 42 346 59 81 0 0 69
Symbol Gene Description
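Digital expression profiling counts how often each tag occurs, then scales by library size so that libraries such as Brain and UHR can be compared. The listed 20-nt tags all begin with GATC, which suggests (assumed here) that each tag is anchored at a GATC site. The reads are hypothetical, and the pseudocount is an illustrative choice.

```python
import math
from collections import Counter

ANCHOR, TAG_LENGTH = "GATC", 20  # assumed from the 20-nt GATC-initial tags listed


def extract_tag(read):
    """Return the first GATC-anchored 20-nt tag in a read, or None."""
    start = read.find(ANCHOR)
    if start == -1 or start + TAG_LENGTH > len(read):
        return None
    return read[start:start + TAG_LENGTH]


def tags_per_million(reads):
    """Count tags and scale by library size so libraries can be compared."""
    counts = Counter(tag for tag in map(extract_tag, reads) if tag is not None)
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def log2_ratio(tpm_a, tpm_b, tag, pseudocount=1.0):
    """log2 fold change of one tag between two libraries (pseudocount avoids log(0))."""
    return math.log2((tpm_a.get(tag, 0.0) + pseudocount) / (tpm_b.get(tag, 0.0) + pseudocount))


# Hypothetical reads built from two of the listed tag sequences.
tag1, tag2 = "GATCAAACCAAGGCCCAGGC", "GATCTGTGGAGAATGTACAC"
brain = ["CC" + tag1] * 3 + ["CC" + tag2]
uhr = ["CC" + tag1] + ["CC" + tag2] * 3
brain_tpm, uhr_tpm = tags_per_million(brain), tags_per_million(uhr)
print(brain_tpm[tag1])                           # 750000.0
print(log2_ratio(brain_tpm, uhr_tpm, tag1) > 0)  # True
```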
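The two bases interrogated by each probe are followed by three further bases that pair with the template but are not read, so a single sequencing primer reads only part of the sequence.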
Sequencing by Ligation (2)
The sequencing primer is repositioned along the adapter in five rounds so that every position is interrogated.
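Why five primer rounds are needed can be checked by counting. Assume each round's primer is shifted one base further back than the previous one, and each ligation reads two bases before the next probe starts five bases later. Under these assumptions, one round leaves gaps, while five rounds interrogate every template position exactly twice, consistent with each base being read twice. This geometry is a simplified model, and adapter-side positions are ignored.

```python
from collections import Counter


def interrogated(offset, cycles):
    """Template positions read by one primer round.

    Assumed geometry: the primer for round `offset` (0-4) ends `offset` bases
    further back than the first primer; each ligation reads 2 bases and the
    next probe starts 5 bases later.
    """
    positions = []
    for k in range(cycles):
        first = 1 - offset + 5 * k
        positions += [first, first + 1]
    return positions


def coverage(read_length, rounds=5):
    """How many times each template position 1..read_length is interrogated."""
    cycles = read_length // 5 + 3  # enough ligation cycles to span the read
    counts = Counter()
    for offset in range(rounds):
        for pos in interrogated(offset, cycles):
            if 1 <= pos <= read_length:
                counts[pos] += 1
    return [counts[pos] for pos in range(1, read_length + 1)]


print(coverage(10, rounds=1))  # [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
print(set(coverage(35)))       # {2}
```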
Sequencing by Ligation (3)
Position 0 is the last base of the adapter, so it is detected only once; this known base is required for decoding.
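Decoding shows why position 0 matters. Each color depends on two adjacent bases, so colors alone cannot name bases. Starting from the known last adapter base, each color determines the next base. This sketch assumes the standard two-base code (A=0, C=1, G=2, T=3, color = XOR of the two codes). With a wrong anchor base, every decoded template base is wrong.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed two-base color code
BASES = "ACGT"


def encode(seq):
    """Two-base color encoding: one color per overlapping dinucleotide."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(known_first_base, colors):
    """Rebuild bases from colors, starting from the known adapter base."""
    bases = [known_first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode("TACGGT")   # position 0 = "T" (adapter), template = "ACGGT"
print(colors)               # [3, 1, 3, 0, 1]
print(decode("T", colors))  # TACGGT
print(decode("A", colors))  # ATGCCA (wrong anchor base, every base wrong)
```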
Advantages and disadvantages