Frontier Technology Trends in Bioinformatics
Related publications
• SOAP:
Ruiqiang Li, Yingrui Li, Karsten Kristiansen, Jun Wang. SOAP: short oligonucleotide alignment program. Bioinformatics. 2008 24: 713-714
Outline
• Introduction
• Sequencing technologies
• Bioinformatic analysis
• Applications
  - Tree of Life (de novo)
  - Population evolution and Breeding (Resequencing)
  - Disease (Resequencing)
  - Epigenomics
  - Transcriptomics
  - Metagenomics
  - Proteomics
(2) SOAP2
- An improved version
Improvements:
• Uses a Burrows-Wheeler transform (BWT) compressed index instead of the seed algorithm
• No read length limitation
• Allows more mismatches and longer gaps for long reads
• Supports various input and output file formats
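To illustrate how a BWT index supports read lookup, here is a minimal Python sketch of exact matching by backward search. It is not SOAP2's implementation: the function names `bwt_index` and `exact_match` are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored uncompressed, which is only practical for tiny references.

```python
def bwt_index(reference):
    """Build a toy BWT index (suffix array, first-column offsets, occurrence table)."""
    text = reference + "$"  # '$' is a sentinel that sorts before A, C, G, T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] is '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in text that sort strictly before c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ


def exact_match(read, index):
    """Return sorted 0-based reference positions where read occurs exactly."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for ch in reversed(read):  # backward search: extend the match leftwards
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("ACGTACGTTACG")
print(exact_match("ACG", index))
```

Each read character narrows the suffix-array interval with two table lookups, so search time depends on the read length rather than the reference length. Mismatches and gaps would need a search over alternative characters on top of this, which the sketch omits.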
• SOAPsnp:
Ruiqiang Li, Yingrui Li, Xiaodong Fang, Huanming Yang, Jian Wang, Karsten Kristiansen, Jun Wang. SNP detection for massively parallel whole genome resequencing. Genome Research. 2009
Algorithm:
Sequencing reads
SOAP: map reads onto the reference genome
Use Bayes’ theorem to infer the genotype given the observed allele types and quality scores on each chromosomal site.
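The Bayesian step can be sketched in Python as follows. This is a simplified illustration, not SOAPsnp's actual model: the prior values in `genotype_prior`, the error model in `base_likelihood` (errors spread evenly over the three other bases), and all function names are assumptions made for the example, and quality recalibration is left out.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The ten unordered diploid genotypes: AA, AC, AG, ..., TT.
GENOTYPES = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]


def genotype_prior(genotype, ref_base, het_rate=0.001):
    """Illustrative prior (assumed values): homozygous reference is most likely."""
    if genotype == ref_base * 2:
        return 1.0 - 1.5 * het_rate
    if ref_base in genotype:          # three heterozygotes carrying the reference allele
        return het_rate / 3.0
    return het_rate / 12.0            # six genotypes without the reference allele


def base_likelihood(observed, allele, error_prob):
    """P(observed base | true allele), assuming errors hit the other bases equally."""
    return 1.0 - error_prob if observed == allele else error_prob / 3.0


def genotype_posteriors(observations, ref_base, het_rate=0.001):
    """observations: list of (base, phred_quality) pairs covering one site."""
    log_post = {}
    for genotype in GENOTYPES:
        total = math.log(genotype_prior(genotype, ref_base, het_rate))
        for base, qual in observations:
            # Phred score -> error probability; capped at 0.75 (no information).
            error_prob = min(10.0 ** (-qual / 10.0), 0.75)
            # Each read is drawn from either allele with probability 1/2.
            p = 0.5 * base_likelihood(base, genotype[0], error_prob) \
                + 0.5 * base_likelihood(base, genotype[1], error_prob)
            total += math.log(p)
        log_post[genotype] = total
    # Normalise in log space to avoid underflow at high depth.
    peak = max(log_post.values())
    unnorm = {g: math.exp(v - peak) for g, v in log_post.items()}
    norm = sum(unnorm.values())
    return {g: v / norm for g, v in unnorm.items()}


site = [("A", 30)] * 5 + [("G", 30)] * 5
posteriors = genotype_posteriors(site, ref_base="A")
print(max(posteriors, key=posteriors.get))
```

With five A and five G observations at Q30, the heterozygous genotype dominates even though its prior is small, because the homozygous genotypes would have to explain half the reads as sequencing errors.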
Faster, with less RAM usage
Uses half the memory of SOAP
43 and 30 times faster than SOAP for single-end and paired-end reads, respectively
(3) SOAPsnp
- SNP detection for short-read resequencing
Frontier Technology Trends in Genomics and Bioinformatics
• SOAPindel, SOAPsv:
coming soon …
• SOAPdenovo:
Ruiqiang Li, Hongmei Zhu, Jue Ruan, et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research. 2009
Input: FASTA, FASTQ, gzipped
Output: SOAP tab-delimited table, SAM (sequence alignment/map), binary equivalent (BAM), consed
SOAP2: Benchmark on human data
[Figure: Growth of NCBI. x-axis: year, 1982-2010; y-axis: gigabases (Gb), 0-60,000]
The rapid revolution of DNA sequencing technology
Planned for commercialization this year:
• PacBio (real-time single-molecule sequencing, long reads of 1-10 kb)
• VisiGen, AB (real-time single-molecule sequencing)
• Ion Torrent (semiconductor chip that measures pH changes; quite low price)
Paired-end read alignment
Aligns both reads of a pair simultaneously
A pair is reported as aligned when both reads map with the correct relative orientation and a proper distance between them
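The pairing rule can be sketched in Python as below. This is an illustration under stated assumptions, not SOAP's code: the `Hit` record, the function names, and the insert-size bounds are hypothetical, and the check assumes a forward-reverse library in which the '+' strand read lies upstream of the '-' strand read.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pos: int      # leftmost 0-based coordinate on the reference
    length: int   # aligned read length
    strand: str   # '+' or '-'


def is_proper_pair(hit1, hit2, min_insert, max_insert):
    """True if the two hits face each other and span an acceptable insert size."""
    if hit1.strand == hit2.strand:
        return False                       # wrong orientation: same strand
    fwd, rev = (hit1, hit2) if hit1.strand == "+" else (hit2, hit1)
    if fwd.pos > rev.pos:
        return False                       # reads point away from each other
    insert_size = rev.pos + rev.length - fwd.pos
    return min_insert <= insert_size <= max_insert


def pair_hits(hits1, hits2, min_insert, max_insert):
    """Return every combination of candidate hits that forms a proper pair."""
    return [(a, b) for a in hits1 for b in hits2
            if is_proper_pair(a, b, min_insert, max_insert)]


read1_hits = [Hit(100, 32, "+"), Hit(5000, 32, "+")]
read2_hits = [Hit(368, 32, "-")]
print(pair_hits(read1_hits, read2_hits, min_insert=200, max_insert=400))
```

Evaluating candidate hits of both reads together lets the distance constraint resolve reads that map to several places on their own.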
Published Bioinformatics Tools
SOAP
- Short Oligonucleotide Alignment Program
• Software package developed by BGI
• Website: http://soap.genomics.org.cn
• >10,000 users
BGI is building a supercomputing platform to meet these requirements
Solution for research labs: Cloud computing
From genotype to phenotype!
[Diagram. Stages: genotype, intermediate phenotype, molecular phenotype, phenotype. Layers: genome, epigenetics, mRNA, ncRNA, protein, metabolome]
The rapid revolution of DNA sequencing technology
List of available technologies:
• 3730, AB (Sanger method)
• 454, Roche (sequencing-by-synthesis)
• Genome Analyzer, HiSeq 2000, Illumina (sequencing-by-synthesis)
• SOLiD, AB (sequencing-by-ligation)
• Helicos (the first single-molecule sequencing)
Output unpaired hits for structural variation (SV) detection
Benchmark
10M single-end Illumina/Solexa reads of length 32 bp aligned against a 5 Mb human genome region (see the SOAP paper for details)
Prior probability of each genotype
Recalibrate sequencing quality score
Calculate likelihood of each genotype
Infer genotype via Bayes’ theorem
(4) SOAPdenovo
(1) SOAP
Single-end read alignment
• 25~60 bp read length
• Ungapped and gapped alignment
• Allows at most two mismatches by default
• One continuous gap of 1 to 3 bp is accepted
• Ungapped hits take precedence over gapped hits
• Since the 3' end of reads shows a much higher number of sequencing errors, SOAP can iteratively trim the low-quality read end and redo the alignment until hits are detected or the remaining sequence is too short
• For multiple equal-best hits, the user can instruct the program to report none, a random one, or all of them
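The trim-and-retry behaviour can be sketched in Python as below. This is a naive illustration rather than SOAP's seed-based search: it scans every reference position, handles ungapped hits only, and the names `ungapped_hits` and `align_with_trimming`, the `min_length` default, and the one-base `trim_step` are assumptions made for the example.

```python
def ungapped_hits(read, reference, max_mismatches=2):
    """Return the equal-best ungapped hits as (position, mismatches) tuples."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, reference[pos:pos + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break                      # too many mismatches at this position
        else:
            hits.append((pos, mismatches))     # loop finished without exceeding the limit
    if not hits:
        return []
    best = min(m for _, m in hits)
    return [h for h in hits if h[1] == best]   # keep only equal-best hits


def align_with_trimming(read, reference, max_mismatches=2, min_length=25, trim_step=1):
    """Trim the 3' end until hits appear or the read becomes too short.

    Returns (aligned_length, hits); aligned_length is 0 when nothing aligned.
    """
    current = read
    while len(current) >= min_length:
        hits = ungapped_hits(current, reference, max_mismatches)
        if hits:
            return len(current), hits
        current = current[:-trim_step]         # drop error-prone 3' bases and retry
    return 0, []


reference = "ACGTTGCAAGCTTAGGCTAACGGATCCGTA"
read = "GCAAGCTTAGGCTTT"                       # last two bases differ from the reference
print(align_with_trimming(read, reference, max_mismatches=0, min_length=10))
```

In the usage example the full 15-base read has no exact match, so two bases are trimmed from the 3' end before the remaining 13 bases align at position 5. Returning all equal-best hits leaves the choice of reporting none, a random one, or all of them to the caller.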
• SOAP2:
Ruiqiang Li, Chang Yu, Yingrui Li, Tak-Wah Lam, Siu-Ming Yiu, Karsten Kristiansen, Jun Wang. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009
Technology in early development (nanopore, <$100/genome)
• Several companies, including Illumina, IBM, etc.
a. Sanger sequencing method
b. Next-generation sequencing method