Genome Data Analysis
* Linux, 64-bit CPU, 4 GB to 256 GB memory
5.3 Solexa data
• *.contig: contigs file
• *.scafSeq: scaffolds file
5.4 SOLiD data
• Read correction – SOLiD Accuracy Enhancement Tool (SAET) http://solidsoftwaretools.com/gf/project/saet/
4.1 Standard analysis workflow
• Read correction
• Assembly
• runAssembly -o outputdir (-large) 1.sff
• Result files
– 454AllContigs.fna
– 454LargeContigs.fna
– 454ReadStatus.txt (Assembled/Singleton/Repeat)
– 454Contigs.ace
– bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
4.3 Solexa data: SAM format
http://genome.sph.umich.edu/wiki/SAM
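As a rough illustration of the SAM format, the sketch below splits one alignment line into its 11 mandatory tab-separated fields and tests two FLAG bits. The names `SamRecord`, `parse_sam_line`, `is_unmapped`, and `is_reverse` are hypothetical helpers, and the sample line is invented for the example.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse the 11 mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)    # FLAG bit 0x4: segment unmapped


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)   # FLAG bit 0x10: reverse-complemented


# Invented example line: a 4-base read on the reverse strand of chr1.
line = "read1\t16\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII"
rec = parse_sam_line(line)
print(rec.rname, rec.pos, is_reverse(rec), is_unmapped(rec))
```

Optional tags after the eleventh field are ignored here; a real pipeline would normally rely on SAMtools rather than hand-written parsing.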
4.3 Solexa data: SOAP2
4.4 SOLiD data: BioScope
4.5 454 data: Newbler
• runMapping -o outputdir ref.fa 1.sff …
• 454ReadStatus.txt
4.6 SNP/INDEL Calling
• SAMtools
- http://samtools.sourceforge.net/
- $ samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
- $ bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
- Output: VCF (Variant Call Format)
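The `-D100` option of `varFilter` sets a maximum read depth of 100. A minimal sketch of that single cutoff, applied to VCF text, might look like the following. It assumes the depth is stored as `DP` in the INFO column; `parse_info` and `filter_by_max_depth` are hypothetical names, the sample records are invented, and the other filters that `varFilter` applies are not reproduced.

```python
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column such as 'DP=30;AF1=1' into a dict."""
    result = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        result[key] = value  # flag-style keys without '=' get an empty value
    return result


def filter_by_max_depth(lines: Iterable[str], max_depth: int = 100) -> Iterator[str]:
    """Keep header lines and variants whose INFO/DP is at most max_depth."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])         # INFO is the 8th VCF column
        depth = int(info.get("DP", "0"))     # assumption: a missing DP is kept
        if depth <= max_depth:
            yield line


# Invented records: one below and one above the depth cutoff.
vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30;AF1=1",
    "chr1\t200\t.\tC\tT\t60\t.\tDP=250;AF1=0.5",
]
kept = list(filter_by_max_depth(vcf_lines))
print(len(kept))
```

Sites with very high depth are often collapsed repeats, which is why a maximum-depth cutoff is a common first filter.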
– bwa index -a algorithm: is for references < 2 Gb, bwtsw for references > 2 Gb
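The size rule for choosing between `is` and `bwtsw` can be sketched as a small helper that totals the bases in a FASTA reference and picks the algorithm name. The threshold is taken from the slide and interpreted here as 2 × 10^9 bases, which is an assumption; `reference_length` and `choose_bwa_index_algorithm` are hypothetical names, and the FASTA lines are invented.

```python
from typing import Iterable

# Assumption: the slide's "2Gb" cutoff is read as 2e9 bases.
TWO_GB = 2_000_000_000


def reference_length(fasta_lines: Iterable[str]) -> int:
    """Total number of sequence characters, skipping '>' header lines."""
    return sum(len(line.strip()) for line in fasta_lines
               if not line.startswith(">"))


def choose_bwa_index_algorithm(total_bases: int) -> str:
    """Return the value to pass to 'bwa index -a' for a reference size."""
    return "is" if total_bases < TWO_GB else "bwtsw"


# Invented toy reference with two sequences.
fasta = [">chr1", "ACGTACGT", "ACGT", ">chr2", "GGCC"]
n = reference_length(fasta)
print(n, choose_bwa_index_algorithm(n))
```

For a real reference the lines would come from an open file handle, for example `reference_length(open("ref.fa"))`.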
• Mapping
– bwa aln ref.fa short_read.fq > aln_sa.sai
• Output alignments in the SAM format
– bwa samse ref.fa aln_sa.sai short_read.fq > aln.sam
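To show how the mapping steps chain together, here is a small sketch that assembles the `bwa aln` and `bwa samse` command strings, plus the paired-end `bwa sampe` form used elsewhere in this deck. The function `bwa_commands` and the intermediate file names are illustrative assumptions; the sketch only builds strings and does not run BWA.

```python
import shlex
from typing import List, Optional


def bwa_commands(ref: str, reads1: str, reads2: Optional[str] = None,
                 out_sam: str = "aln.sam") -> List[str]:
    """Build shell command strings for single-end or paired-end mapping."""
    q = shlex.quote  # protect paths that contain spaces or shell characters
    cmds = [f"bwa aln {q(ref)} {q(reads1)} > aln_sa1.sai"]
    if reads2 is None:
        # Single-end: convert the one .sai file to SAM.
        cmds.append(f"bwa samse {q(ref)} aln_sa1.sai {q(reads1)} > {q(out_sam)}")
    else:
        # Paired-end: align the second read file, then pair both .sai files.
        cmds.append(f"bwa aln {q(ref)} {q(reads2)} > aln_sa2.sai")
        cmds.append(
            f"bwa sampe {q(ref)} aln_sa1.sai aln_sa2.sai "
            f"{q(reads1)} {q(reads2)} > {q(out_sam)}"
        )
    return cmds


for cmd in bwa_commands("ref.fa", "read1.fq", "read2.fq"):
    print(cmd)
```

The reference must already be indexed with `bwa index` before any of these commands will work.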
• Index reference sequences
– 2bwt-builder ref.fa
• Mapping
– single end: soap -a <reads.fq> -D <ref.fa.index> -o <output>
– paired end: soap -a <reads1.fq> -b <reads2.fq> -D <ref.fa.index> -o <PE_output> -2 <SE_output> -m <min_insert_size> -x <max_insert_size>
• Assembly
– 1. SOLiD de novo Accessory Tools http://solidsoftwaretools.com/gf/project/denovo/
– 2. Velvet http://www.ebi.ac.uk/~zerbino/velvet/
5.5 454 data
Small RNA sequencing
2 Second-generation sequencing analysis tools
• More than 1,000 analysis tools
– http://seqanswers.com/wiki/Software/list
• Standard analysis – calling, quality control, alignment/assembly, SNP/Indel discovery, SNP annotation
• Advanced analysis – functional polymorphism, disease/phenotype, genomic coordinate
• Scaffolding
• Gap filling
• Gene and genome annotation
5.1 Standard analysis workflow
5.2 De novo analysis tools
5.3 Solexa data
• Correction tool for SOAPdenovo
http://soap.genomics.org.cn/
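Data analysis in second-generation sequencing (genome)
1 Types of second-generation sequencing analysis
[Figure: sequencing strategies and the analyses they support]
• Genome: whole-genome / exome sequencing, targeted-region deep sequencing, de novo sequencing
• Transcriptome: mRNA sequencing
• Analyses: SNP, small InDel, SNP annotation, genome assembly, gene expression, annotation and target prediction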
– short reads: Solexa
– long reads: 3730, 454 reads
– hybrid reads: short + long reads
• SNP/INDEL Calling
4.2 Standard analysis tools
4.3 Solexa data
• BWA
http://bio-bwa.sourceforge.net/
5.6 Gene and Genome Annotation
• De novo prediction
– GENSCAN
– Augustus
• Homology-based prediction
• Reference gene set
Thank you!
3 Second-generation sequencing platform data
• Illumina HiSeq 2500 (Solexa)
– Read length: 250 nt
– Format: fastq
• ABI SOLiD
– Read length: 50 nt
– Format: csfasta
• Roche GS FLX (454)
– Read length: 800~1000 nt
– Format: sff/fasta
3.1 Solexa – fastq format
http://en.wikipedia.org/wiki/FASTQ_format
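To make the fastq layout concrete, the sketch below reads four-line records (header, sequence, `+` separator, quality string) and converts the quality characters to Phred scores. It assumes unwrapped records and an ASCII offset of 33; older Solexa/Illumina pipelines used other offsets, so the `offset` parameter would need adjusting for such data. `FastqRecord`, `parse_fastq`, and `phred_scores` are hypothetical names, and the record is invented.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no wrapped sequences)."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at entry {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert an ASCII quality string to Phred scores (offset 33 assumed)."""
    return [ord(ch) - offset for ch in quality]


# Invented single-record example.
example = ["@read1", "ACGT", "+", "II5!"]
record = next(parse_fastq(example))
print(record.name, record.sequence, phred_scores(record.quality))
```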
3.2 SOLiD – csfasta format
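A csfasta read stores a known primer base followed by color calls 0 to 3, where each color encodes the transition between two adjacent bases. As a hedged sketch of that encoding, the helper below decodes a read into bases using the standard two-bit trick (A=0, C=1, G=2, T=3, next base = previous base XOR color). The name `decode_colorspace` and the sample reads are invented for illustration.

```python
BASES = "ACGT"  # index of each base is its two-bit code


def decode_colorspace(read: str) -> str:
    """Translate a csfasta read (primer base + colors 0-3) into bases."""
    prev = BASES.index(read[0].upper())  # primer base, not part of the output
    decoded = []
    for pos, color in enumerate(read[1:]):
        if color not in "0123":
            # A missing color call ('.') makes every later base ambiguous.
            decoded.append("N" * (len(read) - 1 - pos))
            break
        prev ^= int(color)               # color = XOR of adjacent base codes
        decoded.append(BASES[prev])
    return "".join(decoded)


print(decode_colorspace("T0120"))
print(decode_colorspace("T01.2"))
```

Because every base depends on all earlier colors, a single color error corrupts the rest of a naively decoded read, which is why SOLiD data is normally aligned in color space rather than translated first.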
3.3 fasta format
4 Standard genome analysis
• SAMtools
http://samtools.sourceforge.net/
• SOAP2
http://soap.genomics.org.cn/
• SOAPsnp
http://soap.genomics.org.cn/soapsnp.html
4.3 Solexa data: BWA
• Index reference sequences – bwa index -a is/bwtsw ref.fa
• SOAPdenovo
http://soap.genomics.org.cn/soapdenovo.html
• Velvet
http://www.ebi.ac.uk/~zerbino/velvet/
• ABySS
http://www.bcgsc.ca/platform/bioinfo/software/abyss
4.6 SNP/INDEL Calling
• GATK: Genome Analysis Toolkit
– http://www.broadinstitute.org/gatk/
5 Standard de novo analysis
5.1 Standard analysis workflow
• Read correction
• Assembly
– short reads: Solexa
– long reads: 3730, 454 reads
– hybrid reads: short + long reads