Lecture 28 (Penn State tutorial on next-generation sequencing data analysis)


You shall not invent a new weakly defined, internally redundant, ambiguous, bulky fruit salad of a data format. Again.
From the Biostar thread: What are the most common stupid mistakes in bioinformatics?
Import the transcript as your reference genome
Strategy 2: “Naïve” non-spliced alignment against genome
• Long exons will be covered by many reads
• The shorter the exon, the fewer reads can fit on it; in the extreme, short exons will be skipped altogether
• Abundance estimation is strongly affected
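The exon-length effect above can be made concrete with a little arithmetic. The sketch below assumes a fixed read length and counts the start positions at which an unspliced read lies entirely inside one exon; the function name and the example lengths are illustrative, not taken from the lecture.

```python
def reads_fitting_exon(exon_length: int, read_length: int) -> int:
    """Count start positions where an unspliced read lies fully inside the exon."""
    # A read of length L can start at exon offsets 0 .. exon_length - L.
    return max(0, exon_length - read_length + 1)


# Hypothetical exon lengths, with an assumed read length of 100 bases.
for exon_length in (1000, 150, 100, 60):
    print(exon_length, reads_fitting_exon(exon_length, read_length=100))
```

An exon shorter than the read length gets a count of zero, which is the "skipped altogether" case for a non-spliced aligner.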
Align against the known transcriptome
Simplest approach (somewhat frowned upon):

• Extract transcript sequences and treat them as the “reference”
• Use an aligner to align against this reference
• Post-process the results
Download and install tophat and cufflinks

• The so-called Tuxedo suite is a consistently good performer: tools that work immediately with minimal fuss and therefore
2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 14, Lecture 28
István Albert
Biochemistry and Molecular Biology
and Bioinformatics Consulting Center
Penn State
Align with tophat
Mapping with TopHat
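A minimal sketch of how a TopHat run might be assembled. It assumes TopHat 2 with a Bowtie 2 index (a Bowtie 1 setup would use bowtie-build instead), and all file names are hypothetical placeholders. The code only builds and prints the command lines; it does not run them.

```python
import shlex


def tophat_commands(reference_fasta, index_base, reads, output_dir, gtf=None):
    """Build (but do not run) the index and alignment command lines."""
    # bowtie2-build <reference_in> <index_base>
    build = ["bowtie2-build", reference_fasta, index_base]
    # tophat [options] <index_base> <reads_1> [reads_2]
    align = ["tophat", "-o", output_dir]
    if gtf is not None:
        # -G supplies known transcript annotations to guide the spliced alignment
        align += ["-G", gtf]
    align += [index_base, *reads]
    return [build, align]


# Hypothetical file names, for illustration only.
for command in tophat_commands("genome.fa", "genome",
                               ["reads_1.fq", "reads_2.fq"], "tophat_out"):
    print(shlex.join(command))
```

Keeping each command as a list makes it easy to pass to subprocess.run later without shell quoting problems.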
Simulating realistic reads
Everyone investing in RNA-Seq analysis needs to start here. It explains in great detail the sources of experimental bias and measurement error.

Simulate and analyze known transcripts of your genome of interest before diving into analyzing real data.
Run the flux simulator
• Convert the transcript annotation from GFF to GTF
• Put the references into a separate file for each chromosome
• Simulate each step:
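The per-chromosome reference files mentioned above could be produced with a short script. This is a sketch that assumes a multi-record FASTA input and names each output file after the first word of the record header; the function name and the ".fa" suffix are illustrative choices, so check them against what the simulator expects.

```python
from pathlib import Path


def split_fasta_by_chromosome(fasta_path, out_dir):
    """Write each FASTA record to its own file, named after the record ID."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # A new record begins: close the previous file, open the next.
                if handle is not None:
                    handle.close()
                name = line[1:].split()[0]  # first word of the header
                path = out_dir / f"{name}.fa"
                handle = open(path, "w")
                written.append(path)
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return written
```

A call such as split_fasta_by_chromosome("genome.fa", "chromosomes") would return the list of files it wrote.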
The tools will simulate
• gene expression
• fragmentation process: enzymatic digestion, nebulisation or hydrolysis
• reverse transcription
• size selection
• adapter ligation and PCR amplification
• sequencing errors
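To get a feel for how a few of these steps interact, here is a deliberately simplified toy model. It is not the Flux Simulator's algorithm: it covers only random fragmentation, size selection and substitution errors, and every parameter value below is an assumption chosen for illustration.

```python
import random


def simulate_reads(transcript, n_fragments, read_length=50,
                   min_size=150, max_size=250, error_rate=0.01, seed=0):
    """Toy model: fragment, size-select, then read with substitution errors."""
    rng = random.Random(seed)  # seeded so repeated runs give the same reads
    reads = []
    for _ in range(n_fragments):
        # Fragmentation: a random start and a random fragment length.
        start = rng.randrange(len(transcript))
        size = rng.randint(50, 400)
        fragment = transcript[start:start + size]
        # Size selection: discard fragments outside the accepted window.
        if not (min_size <= len(fragment) <= max_size):
            continue
        # Sequencing: read the first bases, occasionally substituting one.
        read = []
        for base in fragment[:read_length]:
            if rng.random() < error_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            read.append(base)
        reads.append("".join(read))
    return reads


reads = simulate_reads("ACGT" * 500, n_fragments=1000)
print(len(reads), "reads survived size selection")
```

Even this toy version shows that fragments near the transcript end are truncated and then lost at size selection, one simple source of uneven coverage.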
RNA-Seq strategies
• Align against a known transcriptome:
– good: efficient, well defined answers
– bad: unable to discover novel transcripts, may align reads that would map better in noncoding regions
This already exhibits the underlying structure, though with too many errors
Strategy 3: using splice-aware alignment

• Usually requires secondary information on where the splices occur
• If that information is not available then the performance may degrade greatly
– cummeRbund → R package to facilitate RNA-Seq analysis
Exercise: produce the sequence for the transcript
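One possible way to approach the exercise, sketched under assumptions: the chromosome sequence is already in memory as a string, and exon coordinates are 1-based and inclusive as in GFF/GTF. The function names are illustrative.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def transcript_sequence(chromosome, exons, strand="+"):
    """Concatenate exon sequences; exons are (start, end), 1-based inclusive."""
    # Sort by genomic position, then cut each exon out of the chromosome.
    spliced = "".join(chromosome[start - 1:end] for start, end in sorted(exons))
    # Minus-strand transcripts are read from the opposite strand.
    return reverse_complement(spliced) if strand == "-" else spliced


# Tiny made-up chromosome with two exons.
print(transcript_sequence("AAACCCGGGTTT", [(1, 3), (7, 9)]))
```

The resulting sequence can be written to a FASTA file and used as the "reference" for the transcriptome-alignment strategy.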
Strategy 1: Align against the transcriptome
• Align against genome:
– good: discover novel transcripts
– bad: more false positives, more uncertainty
Many methods try to make use of a combination of both
Align RNA-seq reads against genome
Typical internal implementation

1. Separate reads (read-pairs) that map “correctly” to exonic locations

2. Process “incorrectly” mapped reads:
– bowtie → short read mapper
– tophat → RNA-seq mapping
– cufflinks → isoform assembly and quantification
• cuffdiff → establish differential expression
• cuffcompare → compare assembled transcripts against reference
• cuffmerge → merge experiments
Flux-Generator
• Positive: responsive authors, good underlying concepts
• Negative:
1. it is more complicated than it should be,
2. invents its own data formats
– Distant pairs could indicate the presence of an intron

3. Re-align reads that did not map in step 1 to “potential” junction sites
– create a putative transcriptome by fusing sequences at the border of mapped reads
– identify intron splicing indicator base pairs: GT-AG, etc.
– train machine learning algorithms to predict junction sites
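The first two ideas above can be sketched in a few lines. This is an illustration of the concept, not how TopHat implements it. It assumes forward-strand sequence and 0-based, half-open intron coordinates, and the function names are hypothetical.

```python
def has_canonical_splice_signal(genome, intron_start, intron_end):
    """True if the intron [intron_start, intron_end) starts with GT and ends with AG."""
    intron = genome[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def junction_sequence(genome, intron_start, intron_end, flank):
    """Fuse the exonic flanks on either side of a candidate intron."""
    left = genome[max(0, intron_start - flank):intron_start]
    right = genome[intron_end:intron_end + flank]
    return left + right


# Made-up sequence: exon AAAA, intron GTCCCCAG, exon TTTT.
genome = "AAAAGTCCCCAGTTTT"
print(has_canonical_splice_signal(genome, 4, 12))
print(junction_sequence(genome, 4, 12, flank=3))
```

Unmapped reads could then be re-aligned against such fused junction sequences, with the flank length chosen so that a read must span the junction to align.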
– Simulate library construction
– Simulate gene expression
– Simulate sequencing
Simulation Parameters
Simulation results
Homework 28
• Simulate reads from a transcriptome with wgsim or flux-sim
• Use tophat to align the reads
• Show the commands that you ran and a screenshot of the alignments.
• How many reads cover each of your exons?
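For the last question, the counting logic might look like the sketch below. It assumes the alignments have already been reduced to (start, end) intervals on a single chromosome, in the same inclusive coordinate system as the exons; the names and numbers are made up. A spliced read would contribute one interval per aligned block.

```python
def reads_per_exon(exons, reads):
    """Count reads overlapping each exon; all intervals are (start, end), inclusive."""
    counts = {name: 0 for name in exons}
    for read_start, read_end in reads:
        for name, (exon_start, exon_end) in exons.items():
            # Two closed intervals overlap if each starts before the other ends.
            if read_start <= exon_end and read_end >= exon_start:
                counts[name] += 1
    return counts


# Hypothetical exons and read intervals.
exons = {"exon1": (1, 100), "exon2": (201, 300)}
reads = [(10, 59), (90, 139), (150, 199), (250, 299)]
print(reads_per_exon(exons, reads))
```

The nested loop is fine for a handful of exons; larger annotations would call for sorted intervals or an existing interval-overlap tool.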