Transcriptomics Analysis Pipelines and an Introduction to Commonly Used Software.pdf

Butterfly
1. Simplify the graph
2. Resolve the graph to determine the transcript sequences

2016-1-11
Trinity parameters
• --jaccard_clip
– for gene-dense, compact genomes, such as fungal genomes, where transcripts often overlap in UTR regions.
– (--min_glue) minimum number of reads needed to glue two Inchworm contigs together
Trinity example
ln -s /BJPROJ/RNA/rna_test/TR_bioinfomatics1/prepare/sunfuming/lession5/trinitydata/reads.*.fq .
perl /PUBLIC/software/public/Assembly/trinityrnaseq_r20140413p1/Trinity \
    --seqType fq \
    --JM 2G \
    --left reads.left.fq \
    --right reads.right.fq \
    --SS_lib_type RF \
    --CPU 4 \
    --min_kmer_cov 2 \
    --min_glue 2 \
    --full_cleanup    # delete intermediate files
Trinity example
• Output
– Trinity.fasta file
– unigene.fasta file
The longest transcript of each gene in Trinity.fasta is selected as the unigene, as recommended by the Trinity authors.
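The unigene selection described above can be sketched in Python. This is only an illustration, not the script used in the course: it assumes sequence IDs follow the `c<N>_g<N>_i<N>` pattern seen in Trinity headers and that everything before the `_i` suffix identifies the gene. The helper names `read_fasta` and `pick_unigenes` and the example records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pick_unigenes(records):
    """Keep the longest sequence per gene; ties go to the first one seen."""
    best = {}
    for header, seq in records:
        seq_id = header.split()[0]          # e.g. "c1_g1_i1"
        gene = seq_id.rsplit("_i", 1)[0]    # assumed gene key, e.g. "c1_g1"
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Hypothetical miniature Trinity.fasta content.
example = """>c1_g1_i1 len=4
ACGT
>c1_g1_i2 len=6
ACGTAC
>c2_g1_i1 len=3
TTT
""".splitlines()

unigenes = pick_unigenes(read_fasta(example))
for gene, (header, seq) in unigenes.items():
    print(f">{header}\n{seq}")
```

For a real file, the lines of Trinity.fasta could be passed in place of `example`, and the printed records redirected to unigene.fasta.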
• --SS_lib_type (strand-specific library type)
– Paired reads:
• RF: the first read (/1) of the fragment pair is sequenced as antisense (reverse, R) and the second read (/2) is in the sense strand (forward, F); typical of the dUTP/UDG sequencing method.
• FR: the reverse of RF.
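Step 1: Inchworm
1. Decompose the sequencing reads and build a k-mer dictionary
2. Remove error-containing k-mers from the dictionary
3. Select a seed k-mer
4. Extend the seed k-mer to form a contig
5. Repeat seed selection and bidirectional k-mer extension until the k-mer dictionary is exhausted
6. Filter the contigs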
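The greedy procedure in the six steps above can be illustrated with a toy Python sketch. It is not Trinity's implementation. It assumes that "error-containing" k-mers are simply those seen fewer than `min_kmer_cov` times, that the most abundant remaining k-mer is taken as the seed, and it ignores reverse complements. The function names and example reads are hypothetical, and `k` must be at least 2.

```python
from collections import Counter


def count_kmers(reads, k):
    """Step 1: break reads into k-mers and count them."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend(contig, kmers, k, forward):
    """Step 4: greedily extend in one direction, consuming k-mers as they are used."""
    while True:
        if forward:
            stem = contig[-(k - 1):]
            candidates = [stem + base for base in "ACGT"]
        else:
            stem = contig[:k - 1]
            candidates = [base + stem for base in "ACGT"]
        candidates = [c for c in candidates if c in kmers]
        if not candidates:
            return contig
        best = max(candidates, key=lambda c: kmers[c])  # most abundant extension
        del kmers[best]
        contig = contig + best[-1] if forward else best[0] + contig


def inchworm_sketch(reads, k=5, min_kmer_cov=1, min_contig_length=0):
    # Step 2 (assumed rule): drop k-mers seen fewer than min_kmer_cov times.
    kmers = {km: n for km, n in count_kmers(reads, k).items() if n >= min_kmer_cov}
    contigs = []
    while kmers:                                  # step 5: until the dictionary is exhausted
        seed = max(kmers, key=kmers.get)          # step 3: most abundant k-mer as seed
        del kmers[seed]
        contig = extend(seed, kmers, k, forward=True)
        contig = extend(contig, kmers, k, forward=False)
        contigs.append(contig)
    return [c for c in contigs if len(c) >= min_contig_length]  # step 6


# Two hypothetical overlapping reads.
contigs = inchworm_sketch(["ATGGCGTGC", "CGTGCAATC"], k=4)
print(contigs)
```

Because every k-mer is deleted once it has been used, the loop is guaranteed to terminate when the dictionary is empty.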
Step 2: Chrysalis
1. Group the contigs into connected components
2. Build a de Bruijn graph for each component
3. Map the reads back
4. Filter
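The first two Chrysalis steps can be sketched as follows. This is a simplified illustration only: it assumes two contigs belong to the same component whenever they share a (k-1)-mer, which is a stand-in for Chrysalis's actual linking criteria, and it omits read mapping and filtering. All function names and the example contigs are hypothetical.

```python
from collections import defaultdict


def kmers_of(seq, k):
    """Return the set of k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def connected_components(contigs, k):
    """Group contig indices that share at least one (k-1)-mer (assumed linking rule)."""
    parent = list(range(len(contigs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # (k-1)-mer -> index of the first contig that contained it
    for idx, contig in enumerate(contigs):
        for sub in kmers_of(contig, k - 1):
            if sub in owner:
                parent[find(idx)] = find(owner[sub])
            else:
                owner[sub] = idx

    groups = defaultdict(list)
    for idx in range(len(contigs)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


def de_bruijn_graph(seqs, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Hypothetical contigs: the first two overlap, the third is unrelated.
contigs = ["ATGGCGT", "GCGTGCA", "TTTTCCCC"]
components = connected_components(contigs, k=4)
graphs = [de_bruijn_graph([contigs[i] for i in comp], k=4) for comp in components]
print(components)
```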
Step 3: Butterfly
Butterfly resolves alternatively spliced and paralogous transcripts
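Transcriptome Analysis Pipelines and How to Use Common Software
(de novo and reference-based)
Novogene, Sun Fuming
2015.1.12
De novo transcriptome analysis pipeline
Reference-based analysis pipeline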
OUTLINE
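Assembly: Trinity (de novo)
Alignment and quantification: RSEM (de novo)
Alignment: TopHat2 (reference-based)
Quantification: HTSeq (reference-based)
Introduction to common databases (NCBI, ENSEMBL)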
len: length of the transcript sequence
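Assessing Trinity assembly quality
• N50/N90: sort the assembled transcripts by length from longest to shortest and accumulate their lengths; the length of the transcript at which the running total first reaches at least 50%/90% of the total length is the N50/N90.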
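The N50/N90 definition above translates directly into a few lines of Python. The sketch below is illustrative; the function name `nx` and the example lengths are made up, and integer arithmetic is used to avoid floating-point rounding at the threshold.

```python
def nx(lengths, percent):
    """Return the Nx value: the length of the transcript at which the running
    total, accumulated from longest to shortest, first reaches at least
    `percent` percent of the total assembled length."""
    if not lengths:
        raise ValueError("no transcript lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 100 >= percent * total:
            return length
    return ordered[-1]  # only reachable if percent > 100


# Hypothetical transcript lengths (total = 1500).
lengths = [100, 200, 300, 400, 500]
n50 = nx(lengths, 50)   # 500 + 400 = 900 >= 750, so N50 = 400
n90 = nx(lengths, 90)   # 500 + 400 + 300 + 200 = 1400 >= 1350, so N90 = 200
print(n50, n90)
```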
Trinity parameters
• --min_contig_length default: 200 (step 1, filter contigs)
– minimum assembled contig length to report
• --min_glue default: 2 (step 1, contigs to components)
• --SS_lib_type, unpaired (single) reads:
– F: the single read is in the sense (forward) orientation
– R: the single read is in the antisense (reverse) orientation
Interpreting Trinity assembly output
• >c1_g1_i1 len=233 path=[94:0-232]
c1: the sequence is derived from Chrysalis component 1.
g1: the sequence also corresponds to Butterfly subcomponent 1 (during graph compaction and pruning, some components are partitioned into disconnected subcomponents).
i1: the sequence number within Chrysalis component 1, Butterfly subcomponent 1. If this subcomponent yields multiple sequences, they will have different sequence numbers.
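A header of this form can be split into its fields with a regular expression. The sketch below assumes headers look exactly like the example above (`c<N>_g<N>_i<N> len=<N>`, optionally followed by `path=[...]`); other Trinity releases may format headers differently. The function name `parse_header` and the dictionary keys are hypothetical.

```python
import re

# Pattern modelled on the example header; the path field is treated as optional.
HEADER_RE = re.compile(
    r"^>?c(?P<component>\d+)_g(?P<subcomponent>\d+)_i(?P<seq>\d+)"
    r"\s+len=(?P<length>\d+)"
    r"(?:\s+path=\[(?P<path>[^\]]*)\])?"
)


def parse_header(line):
    """Split a Trinity-style FASTA header into its numeric fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    return {
        "component": int(match.group("component")),        # Chrysalis component (c)
        "subcomponent": int(match.group("subcomponent")),  # Butterfly subcomponent (g)
        "seq": int(match.group("seq")),                    # sequence number (i)
        "length": int(match.group("length")),              # len: transcript length
        "path": match.group("path"),                       # raw path string, or None
    }


fields = parse_header(">c1_g1_i1 len=233 path=[94:0-232]")
print(fields)
```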