EST or Transcriptome Analysis

including its non-coding regions;
• βN, which estimates the diversity at non-synonymous sites;
• βS, which estimates the diversity at synonymous sites.
cDNA that has already been spliced, with its introns removed
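Library types
• Non-normalized cDNA library
• Normalized cDNA library
• Subtractive cDNA library
• Suppression subtractive cDNA library
Library sequencing
• One-directional (single-end) sequencing
• Bidirectional (both-end) sequencing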
俞鸿 yuhong19790308@hotmail.com
EST sequencing
Traditional Sanger sequencing
3730
Next-generation sequencing
Roche / 454 Genome Sequencer FLX
Database similarity search
EST translation and ORF prediction
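ORF prediction on an EST can be illustrated with a simple six-frame scan for the longest ATG...stop open reading frame. This is a minimal sketch, not the method of any particular pipeline: the function names are hypothetical, only complete ORFs are reported, and EST fragments that carry partial ORFs (no start or no stop) are ignored.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the TCAG ordering of the classic codon table
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Longest ATG...stop ORF over six frames (hypothetical helper).

    Returns (strand, frame, start, protein) with a 0-based start on that
    strand, or None when no complete ORF exists.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    # translate up to, but not including, the stop codon
                    protein = "".join(
                        CODON_TABLE.get(s[j:j + 3], "X") for j in range(start, i, 3)
                    )
                    if best is None or len(protein) > len(best[3]):
                        best = (strand, frame, start, protein)
                    start = None
    return best


print(longest_orf("CCATGAAATTTGGGTAACC"))
```

Codons containing ambiguous bases translate to `X` rather than raising an error, which keeps the scan usable on raw single-pass reads.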
Functional annotation and functional classification
Sequence similarity alignment
BLAST, BLAT; NR, UniRef100, genome sequences, etc.
Read summary statistics
5 Libraries
Normalized libraries: 1-2; native (non-normalized) libraries: 3-5
Contig summary
Assembled with PCAP, not the Newbler assembler
Sequence preprocessing
Vector sequence masking
Non-redundant vector sequence databases
UniVec, EMVEC
Tools
BLAST, Cross_Match
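Once vector hits are known, masking and trimming are simple interval operations. The sketch below assumes the hits were already found (for example by BLAST against UniVec) and are supplied as 1-based, inclusive query coordinates; the function names are hypothetical. Matched bases are replaced with `X`, and the longest clean stretch is kept as the trimmed read. Cross_Match can also produce screened sequences directly.

```python
def mask_vector_hits(seq, hits, mask_char="X"):
    """Replace every interval that matched a vector database with mask_char.

    hits: (start, end) query coordinates, 1-based and inclusive; the order of
    start and end does not matter.
    """
    chars = list(seq)
    for start, end in hits:
        lo, hi = min(start, end), max(start, end)
        # convert to 0-based indices and clip to the sequence bounds
        for i in range(max(lo, 1) - 1, min(hi, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def longest_unmasked_stretch(seq, mask_char="X"):
    """Return the longest run free of mask_char (a simple way to trim vector ends)."""
    return max(seq.split(mask_char), key=len)


masked = mask_vector_hits("AAAACCCCGGGGTT", [(1, 4), (13, 14)])
print(masked, longest_unmasked_stretch(masked))
```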
Finding and masking low-complexity regions
DUST
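The idea behind DUST can be sketched with its triplet-based window score: count each triplet in a window and sum c(c-1)/2 over the counts, divided by (l - 1), where l is the number of triplets in the window. In this sketch the window length (64) and threshold (20) are assumed values, and masking every base of a high-scoring window is a simplification; real DUST implementations refine the boundaries of the low-complexity interval.

```python
from collections import Counter


def dust_score(window):
    """Triplet-repetitiveness score of one window (higher = lower complexity)."""
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    l = len(triplets)
    if l < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (l - 1)


def dust_mask(seq, window=64, threshold=20.0):
    """Hard-mask (with N) every base covered by a window scoring above threshold."""
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(max(len(seq) - window + 1, 0)):
        if dust_score(seq[start:start + window]) > threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("N" if m else base for base, m in zip(seq, masked))


print(dust_score("A" * 64), dust_score("ACGT" * 16))
```

A poly-A window scores far above the threshold, whereas even a short-period repeat such as `ACGT` spreads its counts over four triplets and stays below it.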
Repeat elements
Types
• LINEs (long interspersed elements)
• SINEs (short interspersed elements)
• LTRs (long terminal repeats)
• SSRs (short simple repeats)
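Uses of ESTs
• Gene discovery
• Complementing genome data
• Comparative expression analysis
• Assisting gene structure identification
• Alternative splicing analysis
• SNP analysis
• Database searching for proteomic mass spectrometry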
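EST analysis steps
• cDNA library construction
• Library sequencing
• Sequence preprocessing
• Clustering and assembly
• Database matching
• Functional annotation
• Other analyses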
Transcript assembly and quantification
Differential expression test
Pathway mapping
RNA-seq data analysis
Illumina
Analysis of transcriptome results from traditional sequencing
EST data quality
Phred scores
Q = 20 corresponds to 99% base-call accuracy
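The Phred scale is defined by Q = -10 · log10(P), where P is the probability that the base call is wrong, so Q = 20 means P = 0.01 (99% accuracy) and Q = 30 means P = 0.001. The sketch below converts between the two and summarizes a read; it assumes qualities arrive as a FASTQ string with a Phred+33 offset (Sanger-era EST projects often ship numeric `.qual` files instead), and the helper names are hypothetical.

```python
import math


def phred_to_error(q):
    """Error probability P for a Phred quality Q, using Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    return -10 * math.log10(p)


def decode_fastq_quality(qual, offset=33):
    """Decode a FASTQ quality string; offset=33 (Phred+33) is an assumption."""
    return [ord(ch) - offset for ch in qual]


def fraction_at_least(quals, threshold=20):
    """Share of bases whose quality reaches the threshold (Q20 by default)."""
    return sum(q >= threshold for q in quals) / len(quals)


quals = decode_fastq_quality("5+I")
print(quals, fraction_at_least(quals), phred_to_error(20))
```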
Gene Ontologies
BLAST2GO
SNP detection
877 candidate SNPs
~1 SNP per 460 bp, versus one in every 192 bp in eucalyptus
Indel-type errors
Classification statistics
Domain and motif search
InterProScan, Pfam
GO functional classification and enrichment analysis
BLAST2GO, etc.
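Enrichment of a GO term in a gene set is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2×2 table. The sketch below is a generic illustration, not Blast2GO's own implementation; the function name and argument order are assumptions. When many terms are tested, the p-values still need a multiple-testing correction such as Benjamini-Hochberg.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for GO term enrichment.

    N: annotated genes in the background (e.g. all unigenes with GO terms)
    K: background genes annotated with the GO term being tested
    n: genes in the test set (e.g. differentially expressed unigenes)
    k: test-set genes annotated with the term
    """
    denominator = comb(N, n)
    upper = min(n, K)
    # sum the probabilities of observing k or more annotated genes by chance
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / denominator


print(enrichment_p_value(k=5, n=5, K=5, N=10))
```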
EST ANALYSIS PIPELINES
SNP analysis
Basic statistics
Tools
RepeatMasker, MaskerAid
Removing contaminating sequences
BLAST
Library             Lib 1   Lib 2   Lib 3   Lib 4   Lib 5   Lib 6   Lib 7   Lib 8   Lib 9   Mean    STDEV   STDEV/Mean
rRNA                0.25%   0.66%   1.99%   0.09%   0.64%   0.40%   0.20%   0.18%   0.35%   0.53%   0.58%
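The Mean, STDEV and STDEV/Mean columns appear to be the per-row mean, the sample standard deviation (n - 1 denominator) and the coefficient of variation across the nine libraries. That reading is an assumption, but under it the sketch below reproduces the rRNA row's 0.53% and 0.58%; a low STDEV/Mean indicates a transcript whose abundance is stable across libraries.

```python
import statistics


def row_summary(percentages):
    """Return (mean, sample standard deviation, stdev/mean) for one table row."""
    mean = statistics.mean(percentages)
    stdev = statistics.stdev(percentages)  # n - 1 denominator
    return mean, stdev, stdev / mean


rrna = [0.25, 0.66, 1.99, 0.09, 0.64, 0.40, 0.20, 0.18, 0.35]
mean, stdev, cv = row_summary(rrna)
print(f"rRNA mean={mean:.2f}% stdev={stdev:.2f}% stdev/mean={cv:.2f}")
```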
EST/cDNA Library Data Analysis
Openness and win-win cooperation; focus on innovation
俞鸿
Deputy General Manager
Mobile: 15900766827  E-mail: hyu@biorefer.com
12628609@qq.com
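What are ESTs?
ESTs (expressed sequence tags) are short sequences obtained by single-pass sequencing of clones picked at random from a cDNA library. They provide an inexpensive alternative to whole-genome sequencing.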
every 5.2 reads (on average) resulted in a different significant BLAST hit.
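A rarefaction curve shows how gene discovery saturates with sequencing effort: reads are subsampled at increasing depths and the distinct significant BLAST hits are counted. This is a minimal sketch under assumed inputs (one hit ID per read, or `None` when a read has no significant hit); the function name, replicate count and seed are hypothetical choices.

```python
import random


def rarefaction_curve(read_hits, depths, replicates=10, seed=0):
    """Mean number of distinct BLAST hits found when subsampling reads.

    read_hits: one entry per read, the ID of its significant BLAST hit, or None.
    depths: subsample sizes (number of reads) to evaluate.
    """
    rng = random.Random(seed)  # fixed seed for reproducible curves
    curve = []
    for depth in depths:
        total = 0
        for _ in range(replicates):
            sample = rng.sample(read_hits, depth)
            total += len({hit for hit in sample if hit is not None})
        curve.append((depth, total / replicates))
    return curve


hits = ["g1", "g1", "g2", None, "g3", "g2", "g4", None]
print(rarefaction_curve(hits, [2, 4, 8]))
```

A curve that is still rising steeply at full depth suggests that deeper sequencing would keep uncovering new genes.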
Workflow
Data format conversion
Map reads onto the genome (8-10h/sample)
Nat Biotechnol. 2009,27(5):455
Bioinformatics 2009,25(9):1105
Measure expression abundance
FPKM/RPKM
Fragments/Reads Per Kilobase of exon per Million mapped fragments/reads
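• Contains different transcript forms of the same gene, such as alternative splice variants
• Each cluster may contain transcripts of paralogous expressed genes
• Low sequence fidelity
UniGene's clustering approach lies between the two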
stackPACK
Clustering and assembly software
Result statistics
• Number of assemblies/contigs and singletons
• Total length
• Length distribution
• Contig depth statistics
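The result statistics listed above can be gathered in a single pass over the assembly. This sketch assumes a hypothetical input format: contigs as `(length_bp, reads_in_contig)` tuples and singletons as a list of read lengths. The length-bin edges are illustrative choices rather than values from the source.

```python
import bisect
import statistics


def assembly_summary(contigs, singleton_lengths, bin_edges=(500, 1000, 2000)):
    """Summarize an EST assembly.

    contigs: list of (length_bp, reads_in_contig) tuples (hypothetical format).
    singleton_lengths: lengths of reads that joined no contig.
    bin_edges: length-distribution cut-offs in bp (illustrative values).
    """
    lengths = [length for length, _ in contigs]
    depths = [reads for _, reads in contigs]
    # with the default edges: [<500, 500-999, 1000-1999, >=2000]
    distribution = [0] * (len(bin_edges) + 1)
    for length in lengths:
        distribution[bisect.bisect_right(bin_edges, length)] += 1
    return {
        "contigs": len(contigs),
        "singletons": len(singleton_lengths),
        "contig_total_bp": sum(lengths),
        "singleton_total_bp": sum(singleton_lengths),
        "length_distribution": distribution,
        "depth_min": min(depths) if depths else 0,
        "depth_mean": statistics.mean(depths) if depths else 0,
        "depth_max": max(depths) if depths else 0,
    }


print(assembly_summary([(400, 2), (1200, 10), (2500, 30)], [300, 350]))
```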
Xenobiotics
A best BLAST hit with an e-value ≤ 1 × 10^-3 and a bit score > 40 was considered a significant match
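The significance rule above can be applied directly to BLAST tabular output. This sketch assumes the default 12 columns of BLAST+ `-outfmt 6` (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore). For each query it keeps the highest-scoring hit that passes both cut-offs; the function name is hypothetical.

```python
def best_significant_hits(lines, max_evalue=1e-3, min_bitscore=40.0):
    """Keep, per query, the highest-bit-score hit passing both cut-offs.

    lines: BLAST tabular (-outfmt 6) lines with the default 12 columns.
    Returns {query: (subject, evalue, bitscore)}.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue and bitscore > min_bitscore:
            if query not in best or bitscore > best[query][2]:
                best[query] = (subject, evalue, bitscore)
    return best


rows = [
    "q1\ts1\t90\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150",
    "q1\ts2\t80\t100\t20\t0\t1\t100\t1\t100\t1e-10\t90",
    "q2\ts3\t50\t40\t20\t0\t1\t40\t1\t40\t0.01\t35",
]
print(best_significant_hits(rows))
```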
SNP number; SNP frequency
Non-synonymous and synonymous SNPs; other statistics
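Whether a coding SNP is synonymous or non-synonymous depends on whether the substitution changes the encoded amino acid. This minimal sketch assumes the SNP position is given 0-based within a coding sequence that starts in frame at position 0 (for example an ORF from the prediction step) and uses the standard genetic code. Stop-gain changes are counted as non-synonymous, and the function name is hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the TCAG ordering of the classic codon table
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds, pos, alt):
    """Return 'synonymous' or 'non-synonymous' for a SNP at 0-based pos in cds."""
    cds = cds.upper()
    offset = pos % 3                  # position of the SNP inside its codon
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


print(classify_snp("ATGGCC", 5, "T"), classify_snp("ATGGCC", 3, "A"))
```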
Nucleotide diversity analysis
S is the number of SNPs detected in the contig, L is the contig sequence length and D is the sequencing depth. β is useful as a relative measure for comparing nucleotide diversity between contigs generated within this project. Only coding sequences longer than 200 bp with an average sequencing depth of at least 10 reads/nt were considered. Three β parameters were calculated for each contig:
• βT, which estimates the diversity across the entire contig,
18 SNPs were unique to one sex
Males 16, females 2
Ts/Tv ratio
Ts: transitions; Tv: transversions
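Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) substitutions, and every other change is a transversion. The ratio can therefore be computed from a list of SNP alleles. The sketch assumes single-base `(ref, alt)` pairs, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternative alleles are identical")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """snps: iterable of (ref, alt) single-base alleles."""
    ts = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
```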
and one in 214 bp in maize
Rarefaction and normalization
Mitochondria mRNA   4.90%   0.78%   0.18%   0.31%   0.65%   0.22%   0.30%   0.31%   0.31%   0.88%   1.52%
G3PD                0.56%   0.71%   0.50%   0.78%   0.76%   0.44%   0.55%   0.92%   0.78%   0.67%   0.16%   0.24
Actin               0.29%   0.20%   0.36%   0.76%   0.50%   0.66%   0.59%   0.62%   0.17%   0.46%   0.21%   0.46
Tubulin             0.09%   0.20%   0.19%   0.83%   1.10%   1.04%   1.31%   2.25%   0.20%   0.80%   0.72%
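$$\mathrm{FPKM}_t = \frac{X_t}{L_t\,M} \times 10^{9}$$
where X_t is the number of fragments mapped to transcript t, L_t is the length of t in bases and M is the total number of mapped fragments.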
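The 10^9 factor in the FPKM formula combines 10^3 (per kilobase of transcript) and 10^6 (per million mapped fragments). A direct translation of the formula is shown below. The function name is illustrative, and L_t is taken as the plain transcript length in bases; some tools use an effective length instead.

```python
def fpkm(fragments_mapped, transcript_length_bp, total_mapped_fragments):
    """FPKM_t = X_t / (L_t * M) * 1e9.

    The 1e9 factor is 1e3 (per kilobase) times 1e6 (per million mapped fragments).
    """
    return fragments_mapped * 1e9 / (transcript_length_bp * total_mapped_fragments)


# 500 fragments on a 2 kb transcript in a library of 10 million mapped fragments
print(fpkm(500, 2000, 10_000_000))  # 25.0
```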
Map reads onto genomes (Bowtie)
Nat Biotechnol. 2009,27(5):455
Genome Biology 2009, 10:R25
Map reads onto junctions (TopHat)
Expression profiling; alternative splicing analysis; SSR analysis
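SSR (microsatellite) detection in EST contigs amounts to finding tandem copies of a 1-6 bp motif. This sketch uses a backreference regular expression for each motif length. The minimum copy numbers are hypothetical thresholds, because published SSR surveys use a variety of cut-offs. Motifs that are themselves repeats of a shorter unit (such as `ATAT`) are skipped so that each SSR is reported once, by its shortest unit.

```python
import re

# Hypothetical minimum copy numbers per motif length (1-6 bp)
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_compound(motif):
    """True if motif is itself a repeat of a shorter unit, e.g. 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples, with 0-based starts."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # a k-mer followed by at least (min_copies - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_compound(motif):
                continue
            hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AT" * 7 + "CC" + "CAG" * 5 + "TT"))
```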
EST data acquisition
Public resource databases
cDNA library sequencing: 454, Illumina HiSeq 2000, SOLiD, 3730, …
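cDNA library construction
cDNA library
A collection of clones formed by ligating into a vector the cDNA fragments reverse-transcribed from all of the mRNA transcribed by an organism at a given developmental stage. It is tissue- and cell-specific and much smaller than a genomic library.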
Tubulin STDEV/Mean: 0.89
MADS                0.06%   0.00%   0.06%   0.34%   0.00%   0.13%   0.10%   0.40%   0.10%   0.13%   0.14%   1.08
Gene identification and expression analysis of 86,136 Expressed Sequence Tags (EST) from the rice genome PMID: 15626331
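Clustering and assembly
Stringent clustering
• Produces shorter consensus sequences
• Lower coverage of the expressed genes in the EST data
• Therefore contains fewer alternative transcript forms of the same gene
• High sequence fidelity; example: TIGR Gene Indices
Loose clustering
• Produces longer consensus sequences
• Higher coverage of the expressed genes in the EST data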
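The trade-off between the two approaches can be illustrated with a toy version of loose clustering, in which any two ESTs that share a k-mer join the same cluster by single linkage (union-find). This is not the algorithm used by TIGR Gene Indices, UniGene or stackPACK. k = 8 is a toy value chosen for illustration, and canonical k-mers (the smaller of a k-mer and its reverse complement) are used because ESTs may come from either strand. Real tools use longer words and verify overlaps by alignment, which is exactly what makes them more stringent.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(kmer):
    """The lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def cluster_by_shared_kmers(seqs, k=8):
    """Single-linkage clusters of sequence indices sharing any canonical k-mer."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner = {}  # canonical k-mer -> first sequence index containing it
    for idx, seq in enumerate(seqs):
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            km = canonical(seq[i:i + k])
            if km in first_owner:
                parent[find(idx)] = find(first_owner[km])  # merge clusters
            else:
                first_owner[km] = idx
    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


print(cluster_by_shared_kmers(["AAAACCCCGGGG", "CCCCGGGGTTTT", "ACGTACGTACGT"]))
```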
SNP analysis software
GS Reference Mapper (454 Life Sciences); PyroBayes
Application example: Roche 454 transcriptome data analysis
Next-generation pyrosequencing of gonad transcriptomes in the polyploid lake sturgeon (Acipenser fulvescens): the relative merits of normalization and rarefaction in gene discovery. Hale MC, McCormick CR, Jackson JR, Dewoody JA. BMC Genomics. 2009 Apr 29;10:203. PMID: 19402907