Next-Generation Sequencing Technology
Read length: 2 x 100/150; # of single reads: 3/0.6B
Instrument price: $690k/$740k, $590k/$640k, ~$300k; Run price: ~$23k, ~$11k, ~$17k
Life Technologies – Ion Torrent
comparative
http://bowtie-bio.sourceforge.net/tutorial.shtml
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
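To make the FM Index idea above concrete, here is a minimal Python sketch of exact matching by backward search over a Burrows-Wheeler transform. It is a toy illustration, not Bowtie 2's implementation: the function names `build_fm_index` and `find_exact` are made up for this example, the suffix array is built naively, and nothing is compressed or sampled as a real index would be.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM index: suffix array, first-column offsets, occurrence table."""
    text += "$"  # sentinel, assumed absent from the text and smaller than A/C/G/T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)

    # first[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return suffix_array, first, occ


def find_exact(pattern, index):
    """Return sorted start positions of exact matches, using backward search."""
    suffix_array, first, occ = index
    lo, hi = 0, len(suffix_array)  # half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = build_fm_index("ACGTACGTTACG")
print(find_exact("ACG", index))
```

Each pattern character narrows the suffix range with two table lookups, so the search cost depends on the read length rather than on the genome length.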
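PGM 314: # of sensors 1.2M
PGM 316: # of sensors 6.1M; total output up to 1Gb; run time 3-5 hrs; avg read length up to 400b; # of single reads up to 3M; chip price $299; reagent price $250; instrument price $50k
PGM 318: # of sensors 11M; total output up to 2Gb; run time 4-7 hrs; avg read length up to 400b; # of single reads up to 5.5M; chip price $499; reagent price $250; instrument price $50k
PI: # of sensors 165M; total output ~10Gb; run time 2-4 hrs; avg read length up to 200b; # of single reads up to 82M; chip price $699; reagent price $300; instrument price $149k
PII (est. mid-2013): # of sensors 660M; total output ~32Gb (at launch); run time 2-4 hrs; avg read length 100b; # of single reads up to 330M; chip price ~$699; reagent price ~$300; instrument price $149k
PIII (est. mid-2014): # of sensors 1.2B; total output ~64Gb (at launch); run time 2-4 hrs; avg read length 100b; # of single reads up to 660M; chip price ?; reagent price ?; instrument price $149k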
Data analysis with a reference genome
Single-end Bowtie Paired-end
data preparation
Reference Genome
short reads mapping
mutation detection
Cufflinks
digital gene expression profiling
methylation
Illumina HiSeq 2000
454 (Roche)
Total output/run: 700 Mb; Run time: 23 hrs; Output/day: 700 Mb; Read length: up to 1kb; # of single reads: 1M; Instrument price: ~$500k; Run price: ~$6k
[Figure: reads' mappings at chromosome and transcript level; labels: transcript, Exon A, Exon C, Exon D]
FPKM (RPKM): Expression Values
Fragments (Reads) Per Kilobase of exon model per Million mapped fragments
Examples of RNA-seq visualization
[Figure: coverage plot for gene ERBB2 in breast cancer; y-axis: normalized coverage, 0.00 to 130.71]
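Total output/run: 35 Mb; Run time: 10 hrs; Output/day: 35 Mb; Read length: ~400b; # of single reads: 0.1M; Instrument price: $125k; Run price: ~$1k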
Life Technologies – SOLiD
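SOLiD 5500xl: total output/run 95 Gb; run time 6 days; output/day 16 Gb; read length 2 x 60; # of single reads 800M; instrument price $595k; run price ~$10k
SOLiD 5500xl Wildfire: total output/run 240 Gb; run time 10 days; output/day 24 Gb; read length 2 x 50; # of single reads 2.4B; instrument price $70k upgrade; run price ~$5k
SOLiD 5500: total output/run 48 Gb; run time 6 days; output/day 8 Gb; read length 2 x 60; # of single reads 400M; instrument price $349k; run price ~$5k
SOLiD 5500 Wildfire: total output/run 120 Gb; run time 10 days; output/day 12 Gb; read length 2 x 50; # of single reads 1.2B; instrument price $70k upgrade; run price ~$2.5k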
[Figure: coverage plot for gene ERBB2 in normal breast; y-axis: normalized coverage, 0.00 to 4.41]
study of gene structure
alternative splicing
fusion gene detection
How to map billions of short reads onto genomes
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
-x <bt2-idx>: The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable.
-1 <m1>: Comma-separated list of files containing mate 1s (filename usually includes _1), e.g. -1 flyA_1.fq,flyB_1.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 1s from the "standard in" or "stdin" filehandle.
-2 <m2>: Comma-separated list of files containing mate 2s (filename usually includes _2), e.g. -2 flyA_2.fq,flyB_2.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 2s from the "standard in" or "stdin" filehandle.
-U <r>: Comma-separated list of files containing unpaired reads to be aligned, e.g. lane1.fq,lane2.fq,lane3.fq,lane4.fq. Reads may be a mix of different lengths. If - is specified, bowtie2 gets the reads from the "standard in" or "stdin" filehandle.
-S <hit>: File to write SAM alignments to. By default, alignments are written to the "standard out" or "stdout" filehandle (i.e. the console).
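As a rough illustration of how the options above fit together, the following Python sketch assembles a bowtie2 argument list following the synopsis (either -1/-2 or -U). The helper name `build_bowtie2_command`, the index basename `fly_index` and the output file `fly.sam` are hypothetical; only the flags -x, -1, -2, -U and -S come from the usage text.

```python
import shlex


def build_bowtie2_command(index_basename, sam_path, mate1=None, mate2=None, unpaired=None):
    """Return a bowtie2 argument list for paired-end or unpaired input."""
    paired = bool(mate1) or bool(mate2)
    if paired and unpaired:
        raise ValueError("choose either paired (-1/-2) or unpaired (-U) input")
    if paired:
        # Mate files must correspond file-for-file, as the option text requires.
        if not mate1 or not mate2 or len(mate1) != len(mate2):
            raise ValueError("mate 1 and mate 2 files must correspond file-for-file")
        reads = ["-1", ",".join(mate1), "-2", ",".join(mate2)]
    elif unpaired:
        reads = ["-U", ",".join(unpaired)]
    else:
        raise ValueError("no read files given")
    return ["bowtie2", "-x", index_basename, *reads, "-S", sam_path]


cmd = build_bowtie2_command(
    "fly_index",  # hypothetical index basename
    "fly.sam",    # hypothetical output file
    mate1=["flyA_1.fq", "flyB_1.fq"],
    mate2=["flyA_2.fq", "flyB_2.fq"],
)
print(shlex.join(cmd))
# To actually run it (requires bowtie2 on PATH): subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.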
Comparison
Spaced seeds (MAQ)
• Requires ~50Gb of memory.
• Runs 30-fold slower.
Burrows-Wheeler (Bowtie)
• Requires <2Gb of memory.
• Runs 30-fold faster.
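For readers unfamiliar with the spaced-seed side of this comparison, here is a small Python sketch of the general idea: a hash index keyed only on the "must match" positions of a seed template, so a read with a mismatch at a "don't care" position still finds its candidate location. This is a generic illustration under assumed names (`seed_key`, `build_seed_index`, `candidate_positions`) and a made-up template; it is not MAQ's actual seeding scheme.

```python
from collections import defaultdict


def seed_key(sequence, start, template):
    """Concatenate the bases at the '1' positions of the template."""
    return "".join(sequence[start + i] for i, bit in enumerate(template) if bit == "1")


def build_seed_index(reference, template):
    """Hash every reference window by its spaced-seed key."""
    index = defaultdict(list)
    for start in range(len(reference) - len(template) + 1):
        index[seed_key(reference, start, template)].append(start)
    return index


def candidate_positions(read, index, template):
    """Look up reference positions whose seed key equals the read's key."""
    return index.get(seed_key(read, 0, template), [])


template = "110011"          # 1 = must match, 0 = don't care (illustrative choice)
reference = "ACGTACGGTCA"    # toy reference
seed_index = build_seed_index(reference, template)

# The read differs from reference[2:8] ("GTACGG") only at a don't-care position.
print(candidate_positions("GTTCGG", seed_index, template))
```

A table like this needs an entry for every window, which hints at why hash-based approaches tend to use much more memory than a compressed Burrows-Wheeler index.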
Billions of Reads, Mass Data
SRA: Sequence Read Archive
• SRP: project
• SRX: experiment
• SRS: sample
• SRR: run
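The prefix scheme above can be expressed as a small lookup. The Python sketch below is only an illustration: the function name `sra_level` and the sample accessions are made up for the example, and it handles just the four SR* prefixes listed on the slide.

```python
import re

# Prefix-to-level mapping taken from the slide.
SRA_LEVELS = {"SRP": "project", "SRX": "experiment", "SRS": "sample", "SRR": "run"}


def sra_level(accession):
    """Return the SRA level (project, experiment, sample, run) of an accession."""
    match = re.fullmatch(r"(SR[PXSR])(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not an SRP/SRX/SRS/SRR accession: {accession!r}")
    return SRA_LEVELS[match.group(1)]


# Illustrative accession strings, not references to specific datasets.
for acc in ["SRP000001", "SRX000002", "SRS000003", "SRR000004"]:
    print(acc, "->", sra_level(acc))
```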
Transcriptome sequencing (RNA-seq) data processing
Reads in RNA-seq
[Figure: RNA-seq reads shown against Exon A, Exon B, Exon C and Exon D on the chromosome]
Illumina
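HiSeq 2000/2500: total output 600/120 Gb; run time 11 days/27 hrs; output/day 55 Gb
HiSeq 1000/1500: total output 300/60 Gb; run time 8.5 days/27 hrs; output/day 35 Gb; read length 2 x 100/150; # of single reads 1.5/0.3B
GAIIx: total output 95 Gb; run time 14 days; output/day ~7 Gb; read length 2 x 150; # of single reads 320M
HiScan SQ: total output 150 Gb; run time 8.5 days; output/day 18 Gb; read length 2 x 100; # of single reads 750M; instrument price ~$400k; run price ~$11k
MiSeq: total output ~8 Gb; run time ~24 hrs; output/day ~8 Gb; read length 2 x 250; # of single reads 15M; instrument price $125k; run price ~$1k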
• screening
– low light density = fewer samples
Bridge PCR & Sequencing
SOLiD System
Single Molecule sequencing
454 GS FLX+ System
\[ \mathrm{FPKM} = \frac{10^{9} \times C}{N \times L} \]
C = the number of reads mapped onto the gene's exons
N = the total number of mapped reads in the experiment
L = the total length of the gene's exons in base pairs
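A short worked example may help with the FPKM definition above. The Python sketch below applies the formula directly; the function name `fpkm` and the input numbers are invented for illustration and do not come from any dataset in these slides.

```python
def fpkm(reads_on_exons, total_mapped_reads, exon_length_bp):
    """FPKM = 10^9 * C / (N * L), with C, N and L as defined above."""
    if total_mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("N and L must be positive")
    return 1e9 * reads_on_exons / (total_mapped_reads * exon_length_bp)


# Hypothetical gene: 500 reads on 2,000 bp of exons, 20 million mapped reads in total.
print(fpkm(500, 20_000_000, 2_000))  # 12.5
```

The factor 10^9 combines the two normalizations in the name: per kilobase of exon (10^3) and per million mapped reads (10^6).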
TFBE: Transcription Factor Binding Events
CisGenome
isoform identification & isoform profiling
TopHat
non-coding
gene fusion
other applications
de novo
Run time
Total output: ~230Mb; Mean read length: ~4300bp; # of reads: ~55k; Instrument price: ~$700k; Run price: ~$400
Instrument price: ~$700k ($20k for upgrade), ~$700k ($20k for upgrade); Run price: ~$400, ~$400
Total output: up to 100Mb; Run time: 2-4 hrs; Avg read length: up to 400b; # of single reads: up to 0.6M; Chip price: $99; Reagent price**: $250; Instrument price: $50k
Next-Generation Sequencing Technology
zhangshaojun@ems.hrbmu.edu.cn
Next-generation sequencing workflow
• sequencing
– sequencing by synthesis
– single molecule sequencing (tSMS/SMRT)
• amplifying
– bridge PCR
Pacific Biosciences
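Optimized for:
RS: Throughput (XL/C2): run time 2 x 55 min
RS: Read Length (XL/C2): run time 1 x 120 min; total output ~100Mb; mean read length ~4500bp; # of reads ~25k; instrument price ~$700k; run price ~$400
RS: Ultra Read Length (XL/XL)*: run time 1 x 120 min; total output ~100Mb; mean read length ~5000bp; # of reads ~25k; instrument price ~$700k; run price ~$400
RS II (XL/C2): run time 1 x 120 min; total output ~230Mb; mean read length ~4500bp; # of reads ~50k
RS II (XL/XL)*: run time 1 x 120 min; total output ~230Mb; mean read length ~5000bp; # of reads ~50k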