Gene Prediction Summary

1. Gene prediction

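For fungi, four ab initio prediction programs are available, GlimmerHMM, SNAP, GeneMark-ES and Augustus, alongside homology-based prediction (homology).

Of the four programs, GeneMark-ES works with a hidden Markov model but needs no reference species: it trains itself on the input genome and requires no reference sequences. It is therefore the method to use for a new species when no ideal or closely related sequenced species is available. Augustus, GlimmerHMM and SNAP all require a reference training set.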

Overall pipeline:

perl /nas/MG01/FUNGUS/PGAP/FGAP.pl [options] Genome.fa

Options

--all run all analysis for Fungi

--cutlen cut the scaffolds longer than this

--predict select the method to predict genes:

augustus,genemarkes,snap,glimmerhmm or homology

--prepara set the parameter for augustus,snap,homology

--repeat set repeat method, default: repbase-proteinmasker-trf

--ncRNA set ncRNA type, default: tRNA-rRNA-miRNA-sRNA-snRNA

--rRNA_ref set Reference for rRNA, if null rRNA will be predicted by RNAmmer

--function set dbs for gene function annotation, default:

nr-swissprot-trembl-cog-kegg-iprscan

--lib set the lib for synteny analysis and gene family analysis, needed

--synteny synteny analysis

--family Gene Family analysis

--species species tree, default, created by lib information

--category category file, default, created by lib information

--cpu set the cpu number to use in parallel, default 20 for qsub and 5 for multi

--run set the parallel type, qsub, or multi, default=qsub

--outdir set the result directory, default="."

--prefix set a prefix name for results

--help output help information to screen

Script path for the step-by-step pipeline: /nas/MG01/FUNGUS/PGAP/gene-prediction/bin/gene-predict.pl

perl gene-predict.pl [options]

--glimmer run glimmer by self training

--genemark run genemark by self training

--shape set the shape of prokaryote DNA, circular,linear,partial, default=partial

--glimmerhmm run glimmerhmm and give a glimmerhmm parameter directory

--snap run snap and give a snap parameter file

--genemarkes run genemarkes by self training

--augustus run augustus and set species

--homology predict genes based on proteins on a homology species

--genemarkM run genemarkM for meta gene prediction

--metagene run metagene for meta gene prediction

--metageneA run metageneA for meta gene prediction

--cpu set the cpu number to use in parallel, default=3

--run set the parallel type, qsub, or multi, default=qsub

--prefix set gene id prefix

--outdir set the result directory, default="./"

--verbose output running progress information to screen

--help output help information to screen

1.1 GeneMark-ES prediction:

The self-training algorithm GeneMark-ES:

a) splits the input sequence at runs of unknown bases ("NN...N" strings)

b) runs GeneMark.hmm gene finding on the resulting contigs

c) maps the predictions back to the original super-contig sequence

As a result, incomplete gene structures can be predicted inside super-contig sequences.
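Steps a) and c) can be illustrated with a minimal Python sketch. This is not GeneMark-ES code: the function names and the `min_run` threshold are hypothetical, and the real program's minimum run length is not stated here. The sketch only shows how contigs cut at "NN...N" runs keep an offset so that contig coordinates can be mapped back to the super-contig.

```python
import re

def split_at_n_runs(seq, min_run=1):
    """Split a super-contig at runs of N; return (offset, contig) pairs.

    offset is the 0-based position of the contig's first base in seq.
    min_run is a hypothetical threshold, not GeneMark-ES's actual setting.
    """
    contigs = []
    start = 0
    for m in re.finditer(r"[Nn]{%d,}" % min_run, seq):
        if m.start() > start:
            contigs.append((start, seq[start:m.start()]))
        start = m.end()
    if start < len(seq):
        contigs.append((start, seq[start:]))
    return contigs

def map_back(offset, local_start, local_end):
    """Convert 1-based contig coordinates to 1-based super-contig coordinates."""
    return offset + local_start, offset + local_end

# Toy super-contig: 8 bases, a run of 10 N, then 8 more bases.
scaffold = "ACGTACGT" + "N" * 10 + "GGCCTTAA"
contigs = split_at_n_runs(scaffold, min_run=10)
for offset, contig in contigs:
    print(offset, contig, map_back(offset, 1, len(contig)))
```

A gene that runs into an N run is cut off at the contig boundary, which is why incomplete gene structures can appear inside a super-contig.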

Script: perl ./gene-predict.pl --genemarkes

GeneMark-ES writes its results to ./genemark_hmm.gtf
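To get a quick count of predicted genes from the GTF output, a sketch like the following may help. It assumes the file follows the usual nine-column, tab-separated GTF layout with a `gene_id "..."` attribute in the last column. The function name and the demo records are made up for illustration and are not real GeneMark-ES output.

```python
import re
from collections import Counter

def summarize_gtf(lines):
    """Return (number of distinct gene_id values, Counter of feature types)."""
    gene_ids = set()
    features = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        features[cols[2]] += 1
        m = re.search(r'gene_id "([^"]+)"', cols[8])
        if m:
            gene_ids.add(m.group(1))
    return len(gene_ids), features

# Invented demo records, for illustration only.
demo = [
    'seq1\tGeneMark.hmm\tCDS\t10\t90\t.\t+\t0\tgene_id "1_g"; transcript_id "1_t";',
    'seq1\tGeneMark.hmm\tCDS\t150\t300\t.\t+\t0\tgene_id "1_g"; transcript_id "1_t";',
    'seq1\tGeneMark.hmm\tCDS\t500\t800\t.\t-\t0\tgene_id "2_g"; transcript_id "2_t";',
]
n_genes, feature_counts = summarize_gtf(demo)
print(n_genes, dict(feature_counts))

# For a real run, pass an open file handle instead:
# with open("genemark_hmm.gtf") as fh:
#     print(summarize_gtf(fh))
```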

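1.2 Homology prediction

Homology-based prediction locates genes by aligning the genome sequence against a reference protein set. Its results are characterized by a small number of genes predicted with high accuracy. The prediction is carried out with the GeneWise software and requires the amino acid sequences of a closely related species. Because homology prediction yields fewer genes than are actually present, it is advisable not to rely on a single reference species: use several closely related species so that as many genes as possible are predicted.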
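One way to use several related species is to pool their protein sets into a single reference file. The sketch below is an assumption, not part of the pipeline: whether gene-predict.pl accepts a merged file, and which header format it expects, is not stated here, and the function name and species tags are hypothetical. Each record ID is prefixed with a species tag so that identical IDs from different species do not collide.

```python
def merge_protein_sets(named_fastas):
    """Merge FASTA texts keyed by species tag into one FASTA string.

    Each header becomes ">tag|original header" (a hypothetical convention).
    """
    out = []
    for tag, text in named_fastas.items():
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                out.append(">%s|%s" % (tag, line[1:]))
            else:
                out.append(line)
    return "\n".join(out) + "\n"

# Toy protein records from two made-up species tags.
merged = merge_protein_sets({
    "spA": ">p1\nMKV\n",
    "spB": ">p1\nMAA\n",
})
print(merged, end="")
```

The merged text could then be written to a single .pep file and passed to --homology, if the script accepts pooled references.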

perl ./gene-predict.pl --homology ../input/CNGH2S.fa.pep -prefix RHO

The *.pep file contains the protein sequences of the reference species.

1.3 SNAP prediction
