An Introduction to Commonly Used Biology Software


AC    accession number giving origin of sequence
DT    dates of entry and modification
KW    keywords used as cross-references for looking up this entry
OS, OC    source organism
RN, RP, RX, RA, RT, RL    literature reference or source
DR    identifiers in other databases
CC    description of biological function
FH, FT    information about the sequence by base position or range of positions
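Preface
Bioinformatics is an emerging interdisciplinary field that applies mathematics and computer science to biology in order to acquire, process, store, classify, retrieve, and analyze information about biological macromolecules, and so to understand the biological meaning of that information.
This is bioinformatics in the narrow sense, and it is also the basic work of the field at its present stage.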
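Outline
I. Main functions of bioinformatics software
1. Basic data processing
2. Sequence alignment
3. Gene/genome annotation
4. SNP analysis
5. Phylogenetic analysis
6. Gene expression analysis
7. Protein structure prediction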
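2. Sequence alignment
Sequence alignment: arranging two or more sequences according to defined rules in order to determine their similarity and, from that, their homology.
Two or more sequences are lined up together and their similarities are marked. Gaps may be inserted into the sequences (usually shown as a hyphen "-"). Corresponding identical or similar symbols (A, T (or U), C, G in nucleic acids; single-letter amino acid residue codes in proteins) are placed in the same column.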
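The column-wise layout described above can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned (equal length, gaps written as "-"); the function names `column_identity` and `match_line` are made up for this illustration and do not come from any package mentioned here.

```python
def column_identity(aligned_a, aligned_b):
    """Fraction of alignment columns in which both sequences carry the same symbol."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column of two gaps carries no information
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def match_line(aligned_a, aligned_b):
    """Build the marker line that flags identical columns with '|'."""
    return "".join(
        "|" if x == y and x != "-" else " "
        for x, y in zip(aligned_a, aligned_b)
    )


a = "GATTACA-"
b = "GA-TACAT"
print(a)
print(match_line(a, b))
print(b)
print(column_identity(a, b))
```

Here six of the eight columns hold identical symbols, so the identity is 0.75; the two remaining columns each pair a residue with a gap.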
3.1.3 HMMER
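HMMER is a software package for searching gene sequence databases using statistical models known as profile hidden Markov models (profile HMMs). The HMMER package can be downloaded free of charge. It can be installed on a standalone HMMER server or on a federated server.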
Programs in HMMER
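(1) Global alignment
I ClustalW (global multiple sequence alignment)
CLUSTAL is a progressive alignment method. It first aligns all sequences pairwise to build a distance matrix that reflects the pairwise relationships between the sequences. It then computes a phylogenetic guide tree from the distance matrix and weights closely related sequences. Finally, starting from the two most closely related sequences, it adds neighboring sequences step by step, rebuilding the alignment each time, until all sequences have been included. The current version is ClustalW2.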
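The first stage of the progressive method, the pairwise distance matrix and the choice of the closest pair to start from, can be sketched as follows. This is only an illustration under stated assumptions: plain edit distance normalized by the longer sequence length stands in for a real pairwise alignment score, and the guide tree and weighting steps are left out. It is not ClustalW's actual scoring, and all names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, used here as a stand-in for a pairwise alignment score."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def distance_matrix(seqs):
    """Pairwise distances for a dict of name -> sequence."""
    names = list(seqs)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = edit_distance(seqs[a], seqs[b]) / max(len(seqs[a]), len(seqs[b]))
        dist[a][b] = dist[b][a] = d
    return dist


def closest_pair(dist):
    """The two sequences a progressive alignment would start from."""
    return min(combinations(list(dist), 2), key=lambda p: dist[p[0]][p[1]])


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTCCGA"}
dist = distance_matrix(seqs)
print(closest_pair(dist), dist["s1"]["s2"])
```

In this toy input s1 and s2 differ by a single substitution, so they form the closest pair and would be aligned first; s3 would be added afterwards.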
hmmsearch Search a sequence database for matches to a single profile HMM.
The other programs in the package are:
hmmalign Align sequences to an existing model.
(II) Sequence assembly (CAP3)
CAP3 features:
1. Use of forward-reverse constraints to correct assembly errors and link contigs.
2. Use of base quality values in alignment of sequence reads.
3. Automatic clipping of 5' and 3' poor regions of reads.
4. Generation of assembly results in ace file format for Consed.
5. CAP3 can be used in GAP4 of the Staden package.
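The idea behind clipping poor 5' and 3' regions can be sketched in a few lines of Python. This is a deliberately simplified illustration, not CAP3's clipping algorithm: it just strips bases from each end while their quality value is below a cutoff. The function name `clip_poor_ends` and the cutoff of 20 are assumptions made for the example.

```python
def clip_poor_ends(bases, quals, min_q=20):
    """Trim low-quality bases from both ends of a read.

    Returns the clipped sequence and the 0-based, half-open (start, end)
    coordinates of the retained region. Low-quality bases inside the
    retained region are kept.
    """
    if len(bases) != len(quals):
        raise ValueError("need exactly one quality value per base")
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1                      # clip the 5' end
    while end > start and quals[end - 1] < min_q:
        end -= 1                        # clip the 3' end
    return bases[start:end], start, end


read = "NNACGTACGTNN"
quality = [5, 8, 30, 35, 40, 12, 38, 40, 33, 25, 9, 4]
print(clip_poor_ends(read, quality))
```

The single low value (12) in the middle of the read is left alone, because only the ends are clipped in this sketch.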
atatactcac agcataactg tatatacacc cagggggcgg aatgaaagcg ttaacggcca 120
.
.
// symbol to indicate end of sequence
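A minimal Python sketch of reading one entry laid out this way is given below: two-letter line codes, sequence lines after the SQ line, and "//" as the end marker. It is an illustration rather than a full EMBL parser; the function name `parse_embl_entry` and the ID and AC values in the example entry are made up, while the two sequence lines are the ones shown in these slides.

```python
from collections import defaultdict


def parse_embl_entry(text):
    """Return (fields, sequence) for one EMBL-style flat-file entry."""
    fields = defaultdict(list)   # two-letter line code -> list of line contents
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines hold blocks of bases plus a trailing position count;
            # keep only the letters.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, content = line[:2], line[2:].strip()
        if not code.strip():
            continue                     # skip blank lines
        fields[code].append(content)
        if code == "SQ":                 # sequence data starts after the SQ line
            in_sequence = True
    return dict(fields), "".join(seq_parts)


# Hypothetical entry: the ID and AC values are invented for illustration.
EXAMPLE = """\
ID   DEMO0001
AC   X00000;
SQ   Sequence 120 BP;
     gaattcgata aatctctggt ttattgtgca gtttatggtt ccaaaatcgc cttttgctgt        60
     atatactcac agcataactg tatatacacc cagggggcgg aatgaaagcg ttaacggcca       120
//
"""

fields, sequence = parse_embl_entry(EXAMPLE)
print(fields["AC"], len(sequence))
```

Collecting repeated codes into lists keeps multi-line fields such as references or comments together without any special handling.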
EMBL format
(2) Chromatogram conversion (phred) Phred is part of the phred/phrap software package, which is mainly used to analyze
Currently, the HMMER package contains nine programs. Two of these are programs for database searching:
hmmpfam Search an HMM database for matches to a query sequence.
hmmfetch Get a single model from an HMM database.
hmmindex Index an HMM database.
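(2) Local alignment
I BLAST: a search tool based on a local alignment algorithm that can be used for local alignment of nucleic acid and protein sequences. The latest BLAST can also search for PCR primers.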
mutation sequence position, change in sequence for mutation
SQ
count of A, C, G, T and other symbols
gaattcgata aatctctggt ttattgtgca gtttatggtt ccaaaatcgc cttttgctgt 60
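ClustalW2 can be used for multiple sequence alignment of nucleic acids or proteins, and also for building phylogenetic trees. It can be used online or by email.
II MUSCLE
MUSCLE is open-source software for multiple sequence alignment of proteins and nucleic acids. It outperforms ClustalW in both speed and accuracy, and it can be run over the web or downloaded and run locally.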
Dot plot of a cross_match comparison of strains MGAS8232 and SF370 genome sequences. cross_match was run with default parameters except the minimum match was set to 100
II GeneWise
source range of sequence, source organism
misc_signal range of sequence, type of function or signal
mRNA range of sequence, mRNA
CDS range of sequence, position of intron
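(4) Vector masking (cross_match)
It is part of the phrap software and is used to compare two sets of DNA sequences. Input must be in FASTA format, and there are three kinds of output: a log, a sequence file with the matched sequences masked (also in FASTA format), and standard screen output.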
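The masking step itself is simple once the matching regions are known. The Python sketch below assumes the matched regions have already been found by some comparison step and are given as 0-based, half-open coordinates; the function name `mask_regions` and the use of "X" as the mask character are assumptions for this illustration, not a description of cross_match's internals.

```python
def mask_regions(seq, regions, mask_char="X"):
    """Replace every base inside the given (start, end) regions with mask_char."""
    chars = list(seq)
    for start, end in regions:
        # Clamp each region to the sequence so out-of-range coordinates are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


read = "GAATTCGATAAATC"
print(mask_regions(read, [(0, 6)]))
```

Masking rather than deleting keeps the read length and base coordinates unchanged, so later steps can still refer to positions in the original read.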
Cross_match is a general purpose utility for comparing any two DNA sequence sets using the Smith-Waterman algorithm. For example, it can be used to compare a set of reads to a set of vector sequences and produce vector-masked versions of the reads, a set of cDNA sequences to a set of cosmids, contig sequences found by two alternative assembly procedures (for example, phrap and xbap) to each other, or phrap contigs to the final edited cosmid sequence. It is slower but more sensitive than BLAST.
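For readers unfamiliar with the Smith-Waterman algorithm named above, here is a minimal Python sketch of it with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for the example; this is the textbook dynamic program, not cross_match's implementation or its default scoring.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b) for a local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]      # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("GGGACGTACCC", "TTACGTATT"))
```

Because scores are floored at zero, the alignment covers only the shared core "ACGTA" and ignores the unrelated flanking bases, which is what distinguishes local from global alignment.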
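II. Tips for using common functions of biology software
PCR primer design
Homology analysis of DNA and protein sequences and phylogenetic tree construction
Contig Express: DNA sequence fragment assembly
Simulated DNA electrophoresis
III. System platforms for bioinformatics software
Bioinformatics software can generally be divided into two broad categories, commercial and open source. Most commercial software runs on Windows, while most open-source software runs on Unix/Linux.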
hmmconvert Convert a model into different formats, including a compact HMMER 2 binary format, and "best effort" emulation of GCG profiles.
hmmemit Emit sequences probabilistically from a profile HMM.
Smoot J. C. et al. PNAS 2002;99:4668-4673
(5) Sequence clustering and assembly I Sequence assembly (phrap)
phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets.
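Most of the software is based on Unix/Linux platforms.
I. Main functions of bioinformatics software
1. Basic data processing
(1) Common data formats: the data formats commonly used in bioinformatics include FASTA, NBRF/PIR, EMBL, CLUSTAL, GenBank, and PHYLIP. Although these formats differ, some software tools can convert between them. The FASTA and EMBL formats are described below.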
hmmbuild Build a model from a multiple sequence alignment.
hmmcalibrate Takes an HMM and empirically determines parameters that are used to make searches more sensitive, by calculating more accurate expectation value scores (E-values).
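and assemble large sequence fragments in genomes. phred can process the chromatograms produced directly by the sequencer and generate the associated information. The phred/phrap package was developed by Phil Green and Brent Ewing of the Department of Molecular Biotechnology at the University of Washington and is mainly used for academic research. Official website:
Chinese tutorial: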
(3) File conversion (phd2fasta)
Purpose: converts the output of phred or phrap into FASTA format. Software home page:
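The FASTA format, also known as the Pearson format, requires the title line of each sequence to begin with a greater-than sign ">"; the sequence itself starts on the next line. It is generally recommended that each line contain no more than 60 characters, to make processing by programs easier. Multiple nucleotide sequences are written by listing records in this format one after another.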
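The rules above are enough to read and write FASTA with a few lines of Python. The sketch below is a minimal illustration with hypothetical function names; it treats everything after ">" as the title and wraps output at 60 characters per line, following the recommendation above.

```python
def parse_fasta(text):
    """Return a list of (title, sequence) tuples from FASTA-formatted text."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):             # a new record begins
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)              # sequence may span several lines
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def format_fasta(records, width=60):
    """Write records back out, wrapping sequence lines at `width` characters."""
    lines = []
    for title, seq in records:
        lines.append(">" + title)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


text = ">seq1 demo\nACGT\nACGT\n>seq2\nTTTT\n"
records = parse_fasta(text)
print(records)
print(format_fasta(records), end="")
```

Parsing joins wrapped sequence lines back into one string, so the line width of the input does not affect the result.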
ID
identification code for sequence in the database