Next-Generation Sequencing Experiments and Sequencing Principles
Figure: 454 library fragment layout (Primer A, Key, MID, Library fragment, Primer B) and the Sequencing primer
454 AND SOLEXA SEQUENCING MODES
454: Single; Pair end
Solexa: Single (or unspecified); Pair end; Mate pair
Semiconductor sequencing
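Appendix: before Solexa 1.3, quality was computed as $Q_{\text{solexa}} = -10 \log_{10} \frac{p}{1-p}$, where p is the probability that the base call is wrong.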
Q values and their ASCII codes
Marker  Variant        Encoding   ASCII range        Typical raw-read Q range
S       Sanger         Phred+33   33-73  (! to I)    0 to 40
X       Solexa         Solexa+64  59-104 (; to h)    -5 to 40
I       Illumina 1.3+  Phred+64   64-104 (@ to h)    0 to 40
J       Illumina 1.5+  Phred+64   67-104 (C to h)    3 to 40
L       Illumina 1.8+  Phred+33   33-74  (! to J)    0 to 41
In Illumina 1.5+, 0 and 1 are unused and 2 is the Read Segment Quality Control Indicator.
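The offsets above can be applied mechanically. The following is a minimal sketch, not a reference implementation: the dictionary keys and function names are illustrative labels chosen here, and the Solexa-to-Phred conversion is derived from the two score definitions (Solexa uses the odds p/(1-p), Phred uses p).

```python
import math

# Offsets taken from the list above; the key names are illustrative labels.
OFFSETS = {
    "sanger": 33,
    "solexa": 64,
    "illumina1.3": 64,
    "illumina1.5": 64,
    "illumina1.8": 33,
}

def decode_quality(qual, variant):
    """Turn a quality string into a list of integer scores."""
    offset = OFFSETS[variant]
    return [ord(ch) - offset for ch in qual]

def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the equivalent Phred score (as a float).

    From Q_solexa = -10*log10(p/(1-p)) it follows that
    p = 1 / (10**(Q_solexa/10) + 1), hence the expression below.
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)
```

The two scales agree closely at high quality and diverge only for low scores, which is why the Solexa range can go below zero.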
BASE CALLING
FLOWCELL LAYOUT ON GAII
• A flow cell contains 8 lanes (Lane 1 to Lane 8)
• Each lane contains 2 columns (Column 1 and Column 2)
• Each column contains 60 tiles
• Each tile is imaged 4 times per cycle
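The layout figures multiply out to the number of tiles and images the instrument has to handle. A small sketch of that arithmetic follows; the constant and function names are made up for illustration, and the cycle count passed in is a hypothetical run length, not a value from the slides.

```python
# Layout numbers as stated for the GAII flow cell.
LANES_PER_FLOWCELL = 8
COLUMNS_PER_LANE = 2
TILES_PER_COLUMN = 60
IMAGES_PER_TILE_PER_CYCLE = 4

def tiles_per_lane():
    return COLUMNS_PER_LANE * TILES_PER_COLUMN

def tiles_per_flowcell():
    return LANES_PER_FLOWCELL * tiles_per_lane()

def images_per_run(cycles):
    """Total images for a run of `cycles` cycles (hypothetical run length)."""
    return tiles_per_flowcell() * IMAGES_PER_TILE_PER_CYCLE * cycles
```

With these numbers a lane has 120 tiles, a flow cell has 960 tiles, and every cycle produces 3840 images.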
Figure: cluster generation, 2nd cycle: denaturation, annealing, extension (n=25)
TEMPLATE PREPARATION-BRIDGE PCR
Adaptor ligation
Surface attachment
Index
ADVANTAGES OF PAIR-END SEQUENCING
Known Distance
Read 1
Read 2
Repetitive DNA
Paired read maps uniquely
Single read maps to multiple positions
MATE-PAIR LIBRARY CONSTRUCTION AND SEQUENCING
Known Distance
Initial extension
Denaturation
Cluster Generation, Bridge PCR
Figure: bridge PCR, 1st cycle: denaturation, annealing, extension
@HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG
CGACAATTTTTTTTGATATTAATAAAGATAGAACTTTCTTCCTATGAGTTTTCTCTC
+
CCCFFDFFHHHHGJJGHIIJGIIJJJJIIJJHJJJJJIJJIIIGIIIJGGIHJDIJIGAHEHFFGHGHE
SEQUENCE MULTIPLE SAMPLES IN THE SAME LANES
Multiplexing – multiple samples in the same lanes
• Reads: Read 1, Index Read, Read 2
• Sequencing primers: Rd1 SP, Index SP, Rd2 SP
• Template: DNA insert
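Once the index read has been recorded, reads from one lane can be sorted back into samples. The sketch below assumes the index sequence is carried as the last colon-separated field of the header description, as in the Casava 1.8 identifier `@HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG` used in these slides; the function names are illustrative.

```python
from collections import defaultdict

def index_of(header):
    """Return the index sequence from a Casava 1.8 style header line."""
    # The description follows the first space, e.g. "1:N:0:GAGTGG".
    description = header.strip().split(" ", 1)[1]
    return description.split(":")[3]

def group_by_index(headers):
    """Group header lines by their index sequence (one group per sample)."""
    groups = defaultdict(list)
    for header in headers:
        groups[index_of(header)].append(header)
    return dict(groups)
```

This is only the bookkeeping step; a real demultiplexer would also map each index to a sample name and decide how to treat indexes that contain sequencing errors.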
Bridge amplification
Denaturation Trends in Genet
SEQUENCING BY SYNTHESIS OVERVIEW
• Cycle 1: add sequencing reagents; first base incorporated
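Library Construction and Sequencing Principles of Next-Generation Sequencing
He Youyu (何有裕) yyhe@sibs.ac.cn yyhe@biosino.com.cn
Shanghai Center for Bioinformation Technology (上海生物信息技术研究中心)
Shanghai Zhongxin Biotechnology Co., Ltd. (上海众信生物技术有限公司); Suzhou Zhongxin Biotechnology Co., Ltd. (苏州众信生物技术有限公司)
Contents
Introduction to sample processing and sequencing principles
Roche 454; Illumina Solexa
Raw data quality control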
TRUSEQ RNA AND DNA SAMPLE PREPARATION
Trends in Genet 24:133(2008)
PYROSEQUENCING
• A single dNTP type flows per cycle
• Inorganic pyrophosphate (PPi) released on incorporation is converted to visible light through a series of enzymatic reactions
• Unincorporated nucleotide is removed before the next flow
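Because only one dNTP type is flowed at a time, the light signal of a flow is proportional to how many identical bases are incorporated in a row. The sketch below computes that ideal, noise-free signal for a given read. The flow order `TACG` and the function name are assumptions made for illustration; real signals are noisy, which is why long homopolymer runs are hard to count exactly.

```python
def flowgram(read, flow_order="TACG"):
    """Return the ideal (base, count) signal for each nucleotide flow.

    `read` is the strand being synthesized, 5' to 3'. The flow order is an
    assumed default, not a value taken from the slides.
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError("bases not covered by the flow order: %s" % sorted(unknown))
    signals = []
    pos = 0
    flow = 0
    while pos < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Every consecutive matching base is incorporated in the same flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals
```

For the read `TTCAG` the first flow (T) yields a signal of 2, the A flow yields 0, and so on; the homopolymer `TT` is seen as one double-height peak rather than two separate events.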
CLUSTER GENERATION OVERVIEW
~ 1000-6000 molecules per cluster
Cluster Generation, Template Hybridization
Figure: template hybridization to the surface oligos (P5, P7) on the flowcell
ILLUMINA SEQUENCE IDENTIFIERS
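Sequence identifier in Casava 1.8: @HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG
HWI-ST507    unique instrument name
211          run ID
C18E6ACXX    flowcell ID
2            flowcell lane
1101         tile number within the flowcell lane
1688         x-coordinate of the cluster within the tile
1992         y-coordinate of the cluster within the tile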
ILLUMINA SEQUENCE IDENTIFIERS
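Sequence identifier before Casava 1.8: @HWI-EAS364_0004:4:1:995:9044#0/1
HWI-EAS364_0004   unique instrument name
4                 flowcell lane
1                 tile number within the flowcell lane
995               x-coordinate of the cluster within the tile
9044              y-coordinate of the cluster within the tile
#0                index number in a multiplexed sample (0 means no index)
/1                member of the pair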
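The pre-1.8 identifier can be split with a single regular expression. This is a sketch under the assumption that identifiers follow exactly the layout of the example `@HWI-EAS364_0004:4:1:995:9044#0/1`; the pattern and the field names in the returned dictionary are illustrative choices.

```python
import re

# instrument:lane:tile:x:y#index/member, as in the example identifier.
OLD_ID = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"#(?P<index>[^/]+)/(?P<member>[12])$"
)

def parse_old_identifier(line):
    """Split a pre-Casava-1.8 identifier line into named fields."""
    match = OLD_ID.match(line.strip())
    if match is None:
        raise ValueError("not a pre-1.8 identifier: %r" % line)
    fields = match.groupdict()
    for key in ("lane", "tile", "x", "y", "member"):
        fields[key] = int(fields[key])
    return fields
```

The index is kept as a string because it may be either a number such as `0` or a barcode sequence.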
Read 1
Read 2
Molecular Ecology Resources (2011)
TEMPLATE PREPARATION- EMULSION PCR
Fragmentation
Ligation
Water-in-oil emulsion
Micro-reactor
emPCR
PicoTiter Plate loading
Trends in Genet 24:133(2008)
BASE CALLING
• Homopolymer error
GV6330
Flexible multi-sample tagging
MID   Sequence
A     ACGAGTGCGT
B     ACGCTCGACA
C     AGACGCACTC
D     AGCACTGTAG
E     ATCAGACACG
F     CGTGTCTCTA
G     CTCGCGTGTC
H     TAGTATCAGC
I     TCTCTATGCG
J     TGATACGTCT
K     TACTGAGCTA
L     ATATCGCGAG
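A read can be assigned to its sample by comparing its first 10 bases with the MID sequences. The sketch below pairs the labels A to L with the sequences in the order they are listed, assumes the key sequence has already been trimmed so the read starts with the MID, and uses a Hamming-distance threshold that is a hypothetical parameter rather than anything stated in the slides.

```python
# Labels paired with sequences in the order listed (an assumption).
MIDS = {
    "A": "ACGAGTGCGT", "B": "ACGCTCGACA", "C": "AGACGCACTC", "D": "AGCACTGTAG",
    "E": "ATCAGACACG", "F": "CGTGTCTCTA", "G": "CTCGCGTGTC", "H": "TAGTATCAGC",
    "I": "TCTCTATGCG", "J": "TGATACGTCT", "K": "TACTGAGCTA", "L": "ATATCGCGAG",
}
MID_LENGTH = 10

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_mid(read, max_mismatches=0):
    """Return the MID label for a read, or None if there is no clear match."""
    prefix = read[:MID_LENGTH]
    if len(prefix) < MID_LENGTH:
        return None
    ranked = sorted((hamming(prefix, seq), label) for label, seq in MIDS.items())
    (best_dist, best_label), (second_dist, _) = ranked[0], ranked[1]
    # Accept only a match within the threshold that is strictly the closest.
    if best_dist <= max_mismatches and best_dist < second_dist:
        return best_label
    return None
```

Allowing one mismatch recovers reads with a single sequencing error in the tag, at the cost of a slightly higher risk of assigning a read to the wrong sample.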
Intensity file
Cluster Generation, Sequencing Primer Hybridization (processing steps for single-read sequencing)
• Linearization
• Blocking with ddNTP
• Denaturation and hybridization of SBS3
• Detect signal
• Cleave terminator and dye
• Cycle 2-n: add sequencing reagents and repeat
CYCLIC REVERSIBLE TERMINATION
PRIMARY DATA ANALYSIS BY FIRECREST AND BUSTARD IN RTA/OLB
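• Firecrest: reads the tiff image files and reports each cluster's position (X, Y) and its intensity in the A, C, G and T channels for every cycle (Cycle 1, Cycle 2, ...)
• Bustard: converts the per-cycle intensities into the sequence and writes the sequence file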
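The step from intensities to sequence can be illustrated with the simplest possible rule: in each cycle, call the base whose channel is brightest. This is a deliberately naive sketch, not Bustard's algorithm; it skips the corrections a production base caller applies, and the data layout (one dictionary of channel intensities per cycle) is an assumption made for the example.

```python
BASES = "ACGT"

def call_bases(intensities):
    """Call one base per cycle by picking the brightest channel.

    `intensities` is a list with one dict per cycle, mapping each of
    A, C, G, T to that cluster's intensity (an assumed layout).
    """
    read = []
    for cycle in intensities:
        read.append(max(BASES, key=lambda base: cycle[base]))
    return "".join(read)
```

For a cluster whose brightest channels over three cycles are C, G and T, the function returns `CGT`.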
• Detects H+ released as a voltage change: fast
• Common microchip design standards: low-cost manufacturing
• Sequencing volume is increasing
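FASTQ sequence format
A FASTQ file records each sequence in 4 lines:
• Line 1 starts with the @ character, followed by the sequence identifier and description
• Line 2 is the sequence itself
• Line 3 starts with the + character; the rest may be empty or repeat line 1
• Line 4 encodes the quality of the sequence in line 2 and must be the same length as line 2
Example (line 1): @HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG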
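The four-line rule translates directly into a small reader. The following is a minimal sketch that assumes each record occupies exactly four lines (no wrapped sequence lines); the function name is illustrative, and the record in the usage note is a made-up toy example.

```python
def read_fastq(lines):
    """Yield (identifier, sequence, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None:
            raise ValueError("truncated record: %r" % header)
        seq, plus, qual = (s.rstrip("\r\n") for s in (seq, plus, qual))
        if not header.startswith("@"):
            raise ValueError("line 1 must start with @: %r" % header)
        if not plus.startswith("+"):
            raise ValueError("line 3 must start with +: %r" % plus)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: %r" % header)
        yield header[1:], seq, qual
```

Passing an open file works because a file object iterates over its lines; for instance, `list(read_fastq(["@r1 toy\n", "ACGT\n", "+\n", "IIII\n"]))` gives one tuple with identifier `r1 toy`.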
Fluorophore cleavage
Terminating group
Trends in Genet 24:133(2008); Nat Rev Genet 11:31(2010)
• All four labeled reversible terminators are added per cycle
• Remove unincorporated bases and detect signal
• Remove the terminating group and the fluorescent dye
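The consequence of the terminating group is that exactly one base is added per cycle, so the read length equals the number of cycles, even across a homopolymer. The toy simulation below illustrates this; it is an idealized sketch with no errors or signal decay, and the function name and the convention that the template is given in the order the polymerase reads it are assumptions for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(template, cycles):
    """Simulate idealized cyclic reversible termination.

    `template` lists the template bases in the order the polymerase
    encounters them; the returned string is the synthesized read.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # All four terminators are offered; only the complementary one is
        # incorporated, and the terminating group blocks further extension.
        incorporated = COMPLEMENT[template[cycle]]
        # Unincorporated bases are washed away and the dye is detected.
        read.append(incorporated)
        # Cleaving the terminating group and dye allows the next cycle.
    return "".join(read)
```

Three cycles on the template `AAAC` give `TTT`: each base of the homopolymer takes its own cycle, in contrast to pyrosequencing, where a run of identical bases is incorporated within a single flow.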
1        member of the pair (1 or 2)
N        whether the read failed filtering (Y: the read is bad; N: the read is good)
0        control bits; 0 means no control bits are set
GAGTGG   index sequence
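Putting the Casava 1.8 fields together, the identifier can be split on the single space and then on colons. This is a sketch that assumes identifiers have exactly the eleven fields of the example `@HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG`; the dictionary keys are names chosen here for readability.

```python
def parse_casava18(line):
    """Split a Casava 1.8 identifier line into named fields."""
    location, description = line.strip().lstrip("@").split(" ", 1)
    instrument, run_id, flowcell_id, lane, tile, x, y = location.split(":")
    member, filtered, control_bits, index = description.split(":")
    return {
        "instrument": instrument,
        "run_id": int(run_id),
        "flowcell_id": flowcell_id,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "member": int(member),
        "failed_filter": filtered == "Y",  # Y: bad read, N: good read
        "control_bits": int(control_bits),
        "index": index,
    }
```

A line with a different number of fields raises `ValueError` from the tuple unpacking, which is a reasonable signal that the identifier is in another format.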
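Sequence quality
Quality calculation: Q is computed as a Phred quality score, $Q = -10 \log_{10} p$, where p is the probability that the corresponding base call is wrong.
The resulting Q value is an integer; it is converted to an ASCII character after adding 33 or 64.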
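The calculation and the ASCII conversion can be written out in a few lines. This is a minimal sketch: the function names are illustrative, and rounding to the nearest integer is an assumption, since the slides only say that Q is an integer.

```python
import math

def phred_quality(p):
    """Phred score for an error probability p, rounded to an integer (assumed)."""
    return int(round(-10 * math.log10(p)))

def encode(q, offset=33):
    """Convert an integer Q value to its ASCII character (offset 33 or 64)."""
    return chr(q + offset)

def error_probability(ch, offset=33):
    """Recover the error probability implied by one quality character."""
    return 10 ** (-(ord(ch) - offset) / 10)
```

An error probability of 0.001 gives Q = 30, which is written as `?` with offset 33 and as `^` with offset 64; going the other way, the character `I` with offset 33 means Q = 40, an error probability of 1 in 10,000.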