Sequence Similarity Search
Reverse-strand reading frames:
5’ GTG GGT
5’ TGG GTA
5’ GGG TAG
Step 3: Choose the database
nr = non-redundant (the most general database)
dbest = database of expressed sequence tags (ESTs)
dbsts = database of sequence-tagged sites (STSs)
gss = genome survey sequences
htgs = high-throughput genomic sequences
Step 4a: Select optional search parameters
CD search (Conserved Domain search)
BLASTN searching
Entrez! Filter Expect Word size
Word size: increasing this value makes the search faster, at the cost of some sensitivity.
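Word size matters because BLAST only extends regions that first share a word (for nucleotides, an exact match of length W). The sketch below assumes simple exact-match seeding and uses a hypothetical helper, `count_word_hits`, to count seed hits between two short DNA sequences (the subject is invented for illustration). Every hit at word size W is also a hit at any smaller word size, so a larger word size can only reduce the number of candidate regions to extend.

```python
from collections import defaultdict


def count_word_hits(query: str, subject: str, word_size: int) -> int:
    """Count (query position, subject position) pairs sharing an exact word.

    Simplified blastn-style seeding (an assumption of this sketch): every
    word of length `word_size` in the query is looked up in an index of
    the subject's words.
    """
    # Index every word of the subject by its starting positions.
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)

    # Each exact word match is one seed hit that would then be extended.
    hits = 0
    for i in range(len(query) - word_size + 1):
        hits += len(index.get(query[i:i + word_size], []))
    return hits


if __name__ == "__main__":
    query = "CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
    subject = "TTCATCAACTACAACTCGAAAGACACCCTTACACATCAACAAACCTACGG"  # hypothetical
    for w in (4, 7, 11):
        print(w, count_word_hits(query, subject, w))
```

Fewer seed hits means fewer extensions to compute, which is where the speed-up comes from; the price is that short or divergent matches may never produce a seed.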
BLAST format options: multiple sequence alignment
We will get to the bottom of a BLAST search in a few minutes…
EVD (extreme value distribution) parameters
BLOSUM matrix
Gap penalties
10.0 is the E value (expect threshold)
Effective search space = mn = length of query × length of database
Threshold score = 11
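The E value follows Karlin-Altschul statistics, which are based on the extreme value distribution: E = K·m·n·e^(−λS), where S is the raw alignment score, m·n is the search space, and λ and K depend on the scoring matrix and gap penalties. Equivalently, with the bit score S' = (λS − ln K) / ln 2, E = m·n·2^(−S'). The sketch below evaluates both forms; the λ and K values and the query/database sizes are illustrative assumptions, and the length corrections BLAST applies to obtain the effective search space are omitted.

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expect value: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Illustrative, assumed parameters for a gapped protein search.
    LAMBDA, K = 0.267, 0.041
    m, n = 300, 1_000_000_000  # hypothetical query length and database length
    for s in (50, 100, 150):
        e = evalue(s, m, n, LAMBDA, K)
        # The same E value follows from the bit score: E = m * n * 2^(-S').
        e_bits = m * n * 2 ** (-bit_score(s, LAMBDA, K))
        print(f"S={s}  bits={bit_score(s, LAMBDA, K):.1f}  E={e:.2e}  check={e_bits:.2e}")
```

Two consequences are visible directly in the formula: E falls exponentially as the score rises, and doubling the database size doubles E for the same score.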
tblastx (translated BLAST)
Choose the BLAST program

Program   Input     Database   Frame comparisons
blastn    DNA       DNA        1
blastp    protein   protein    1
blastx    DNA       protein    6
tblastn   protein   DNA        6
tblastx   DNA       DNA        36
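The selection rule in this table can be written as a lookup: a mixed DNA/protein search must translate the DNA side into six frames, and translating both sides (tblastx) gives 6 × 6 = 36 frame comparisons. The function `choose_program` and its `translate_dna` flag are hypothetical, for illustration only.

```python
# (query type, database type, translated?) -> (program, frame comparisons)
BLAST_PROGRAMS = {
    ("DNA", "DNA", False): ("blastn", 1),
    ("protein", "protein", False): ("blastp", 1),
    ("DNA", "protein", True): ("blastx", 6),
    ("protein", "DNA", True): ("tblastn", 6),
    ("DNA", "DNA", True): ("tblastx", 36),
}


def choose_program(query_type: str, db_type: str,
                   translate_dna: bool = False) -> tuple[str, int]:
    """Return the BLAST program and how many frame comparisons it makes."""
    # Comparing DNA with protein always requires translating the DNA side;
    # DNA vs DNA is translated only on request (tblastx).
    needs_translation = query_type != db_type or translate_dna
    try:
        return BLAST_PROGRAMS[(query_type, db_type, needs_translation)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query_type} vs {db_type}") from None


if __name__ == "__main__":
    print(choose_program("DNA", "protein"))                 # blastx
    print(choose_program("DNA", "DNA", translate_dna=True))  # tblastx
```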
Four components to a BLAST search
(1) Choose the sequence (query)
(2) Select the BLAST program
(3) Choose the database to search
(4) Choose optional parameters
Then click “BLAST”.
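Outside the web form, the same four components map onto command-line options of NCBI's BLAST+ programs (`-query`, `-db`, `-evalue`, `-word_size`, `-remote`). The sketch below only assembles the argument list; the helper name and the file name `query.fa` are hypothetical, and actually running the command assumes BLAST+ is installed.

```python
from typing import Optional


def build_blast_command(program: str, query_file: str, database: str,
                        evalue: float = 10.0,
                        word_size: Optional[int] = None,
                        remote: bool = False) -> list[str]:
    """Map the four components of a BLAST search onto a BLAST+ argument list."""
    cmd = [program,                  # (2) the BLAST program, e.g. "blastp"
           "-query", query_file,     # (1) the query sequence, in a FASTA file
           "-db", database,          # (3) the database to search, e.g. "nr"
           "-evalue", str(evalue)]   # (4) optional parameters start here
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if remote:
        cmd.append("-remote")        # search on NCBI's servers, not a local database
    return cmd


if __name__ == "__main__":
    cmd = build_blast_command("blastp", "query.fa", "nr", evalue=1e-5,
                              word_size=3, remote=True)
    print(" ".join(cmd))
    # To run it (requires BLAST+ on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```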
Query  539  DAKKCIAMAPHVEVESRVAPSFNQEDIYITTESLTTTAGRSGTAECAPSSEMPVPDYTSI  598
            DAKKCIA+APHVE ES   PSFNQEDIYITTESLTTTAGRSGTAE  PSSEMPVPDYTSI
Sbjct  541  DAKKCIALAPHVEAESHAEPSFNQEDIYITTESLTTTAGRSGTAERVPSSEMPVPDYTSI  600
III. BLAST analysis of sequences
BLAST
BLAST (Basic Local Alignment Search Tool) allows rapid comparison of a query sequence against a database. The BLAST algorithm is fast, accurate, and web-accessible.
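BLAST's speed comes from a seed-and-extend heuristic: find short word matches between query and database sequence, then extend each seed and keep only high-scoring segment pairs. The sketch below is a deliberately simplified, ungapped, DNA-only version with exact-match seeds and an X-drop stopping rule. The scores (+1 match, −3 mismatch), the X-drop of 5, and the function names are assumptions for illustration; this is not NCBI's implementation, which also uses scored neighborhood words for proteins, two-hit seeding, and gapped extension.

```python
def _extend(query: str, subject: str, qi: int, sj: int, step: int,
            match: int, mismatch: int, x_drop: int) -> tuple[int, int]:
    """Ungapped extension from (qi, sj) in direction `step` (+1 or -1).

    Stops when the running score falls more than `x_drop` below the best
    score seen so far; returns (best score gain, length of that extension).
    """
    score = best = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
        qi += step
        sj += step
    return best, best_len


def seed_and_extend(query: str, subject: str, word_size: int = 11,
                    match: int = 1, mismatch: int = -3, x_drop: int = 5):
    """Best ungapped hit as (score, query_start, subject_start, length), or None."""
    # 1. Index every word of the subject by its start positions.
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    best = None
    for i in range(len(query) - word_size + 1):
        # 2. Every exact word match between query and subject is a seed.
        for j in index.get(query[i:i + word_size], []):
            # 3. Extend the seed in both directions along its diagonal.
            right, r_len = _extend(query, subject, i + word_size, j + word_size,
                                   1, match, mismatch, x_drop)
            left, l_len = _extend(query, subject, i - 1, j - 1,
                                  -1, match, mismatch, x_drop)
            hit = (word_size * match + left + right,  # segment score
                   i - l_len, j - l_len,               # start positions
                   l_len + word_size + r_len)          # segment length
            # 4. Keep the highest-scoring segment pair.
            if best is None or hit[0] > best[0]:
                best = hit
    return best


if __name__ == "__main__":
    print(seed_and_extend("ACGTTGCAAGCT", "GGGGACGTTGCAAGCTCCCC", word_size=4))
```

In the usage line the query is embedded in the subject after four extra bases, so the best segment covers all 12 query positions starting at subject offset 4.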
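II. Homology and similarity
1. Homology: the sequences share a common ancestor.
Orthologous sequences (orthologs) diverged through a speciation event; paralogous sequences (paralogs) diverged through a gene duplication event.
2. Similarity: an observable, quantifiable resemblance between sequences (for example, percent identity).
Homologous sequences are usually similar.
Similar sequences are not necessarily homologous.
As a rule of thumb, protein sequences that share 25% or higher identity over a region of about 80 amino acids or more, and DNA sequences that share more than 75% identity, are considered potentially biologically significant.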
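Because these thresholds are percentages, it helps to be explicit about how percent identity is computed. The sketch below counts identical, non-gap columns and divides by the alignment length including gap columns, the convention used by BLAST's "Identities" line. The function names are hypothetical, and the 80-residue, 25%, and 75% cut-offs are the rules of thumb quoted above, not statistical tests (E values serve that purpose).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns (gap columns included) that are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def likely_meaningful(aligned_a: str, aligned_b: str, molecule: str) -> bool:
    """Apply the rule of thumb: protein >= 25% over >= 80 residues, DNA > 75%."""
    pid = percent_identity(aligned_a, aligned_b)
    if molecule == "protein":
        return len(aligned_a) >= 80 and pid >= 25.0
    if molecule == "DNA":
        return pid > 75.0
    raise ValueError("molecule must be 'protein' or 'DNA'")


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACAGT"))  # 4 identical columns out of 5
```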
BLAST: optional parameters
You can:
• choose the organism to search
• turn filtering on/off
• change the expect (E) value
• change the word size
• change the output format
Query  421  ADLLCLDQKNQNNSPSNDAAPATQQPSVILAEENKPRPLIISGTDSTHQTAHT--QLSNP  478
            AD LCLDQKN NNSPSNDAAP +QQPSV+L EENKPR L+  GT+STHQ  HT  QLSNP
Sbjct  421  ADRLCLDQKNLNNSPSNDAAPDSQQPSVLLGEENKPRSLLTGGTESTHQAGHTQQQLSNP  480
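Chapter 6: Sequence Similarity Search
I. Tasks and goals of sequence similarity search
1. Tasks of sequence similarity search
2. Goals of sequence similarity search
II. Homology and similarity
III. BLAST analysis of sequences
IV. Specialized BLAST servers
I. Tasks and goals of sequence similarity search
1. Tasks of sequence comparison:
Find similarities between sequences
Identify differences between sequences
2. Goals:
Similar sequences suggest similar structures and similar functions
Determine whether sequences are homologous
Infer evolutionary relationships between sequences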
Query  119  CCGGCTGTGATGGCTGCAGGCCCTCGGACCTCCGTGCTCCTGGCTTTCGCCCTGCTCTGC  178
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  116  CCGGCTGTGATGGCTGCAGGCCCTCGGACCTCCGTGCTCCTGGCTTTCGCCCTGCTCTGC  175

Query  179  CTGCCCTGGACTCAGGAGGTGGGCGCCTTGGGAGCCATGCCCTTGTCCAGCCTATTTGCC  238
            |||||||||||||||||||||||||||||   ||||||||||||||||||||||||||||
Sbjct  176  CTGCCCTGGACTCAGGAGGTGGGCGCCTTCCCAGCCATGCCCTTGTCCAGCCTATTTGCC  235

Query  239  AACGCCGTGCTCCGGGCCCAGCACCTGCACCAACTGGCTGCCGACACCTACAAGGAGTTT  298
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  236  AACGCCGTGCTCCGGGCCCAGCACCTGCACCAACTGGCTGCCGACACCTACAAGGAGTTT  295
Query  599  HIVQSPQGLVLNATALPLPDKEFLSSCGYVSTDQLNKIMP  638
            HIVQSPQGLVLNATALPLPDKEFLSSCGYVSTDQLNKIMP
Sbjct  601  HIVQSPQGLVLNATALPLPDKEFLSSCGYVSTDQLNKIMP  640
tblastx: DNA query against a DNA database
DNA potentially encodes six proteins
Forward-strand reading frames:
5’ CAT CAA
5’ ATC AAC
5’ TCA ACT
Double-stranded DNA:
5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
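The six reading frames come from reading each strand from three starting offsets, which is what the translated programs (blastx, tblastn, tblastx) do conceptually. The sketch below reproduces them with the standard genetic code ('*' marks a stop codon). Mapping codons with non-ACGT characters to 'X' is an assumption of this sketch, and the +1…+3 / −1…−3 labels follow one common convention; tools differ in how they number reverse frames.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Read the opposite strand 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translate three offsets on each strand."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    seq = "CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
    for name, protein in six_frames(seq).items():
        print(name, protein)
```

The first codons of each frame (CAT, ATC, TCA on the top strand; GTG, TGG, GGG on the reverse strand) match the codons listed for this sequence.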
Step 1: Choose your sequence
The sequence can be entered in FASTA format, as plain text, or as an accession number.
Example of the FASTA format for a BLAST query
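A FASTA record is one header line beginning with '>' followed by one or more lines of sequence. The sketch below shows a hypothetical record (the header text is invented; the sequence is the DNA used in the six-frame example) and a minimal parser that joins wrapped sequence lines.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


EXAMPLE = """>example_query hypothetical DNA query
CATCAACTACAACTCCAAAG
ACACCCTTACACATCAACAAACCTACCCAC
"""

if __name__ == "__main__":
    for header, seq in parse_fasta(EXAMPLE):
        print(header, len(seq))
```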
Step 2: Choose the BLAST program
blastn (nucleotide BLAST)
blastp (protein BLAST)
blastx (translated BLAST)
tblastn (translated BLAST)
BLAST searching is fundamental to understanding how a query sequence of interest relates to other known protein or DNA sequences.
Applications include:
• identifying orthologs and paralogs
• discovering new genes or proteins
• discovering variants of genes or proteins
• investigating expressed sequence tags (ESTs)
• exploring protein structure and function
Step 4b: Select optional formatting parameters
Alignment view
Descriptions
Alignments
[Figure: BLAST search page with the program, query, database, and taxonomy fields labeled]
BLAST format options
Query  479  SSLANIDFYAQVSDITPAGSVVLSPGQKNKAGISQCDMHLEVVSPCPANFIMDNAYFCEA  538
            SSLANIDFYAQVSDITPAGSVVLSPGQKNKAG+SQCDMH EVVSPC ANFIMDNAYFCEA
Sbjct  481  SSLANIDFYAQVSDITPAGSVVLSPGQKNKAGMSQCDMHPEVVSPCQANFIMDNAYFCEA  540
cut-off parameters
BLASTP Searching with a multidomain protein, pol
Searching bacterial sequences with pol
BLAST program selection guide
Pig growth hormone mRNA
Sequence ID: gb|M22761.1|PIGGHMA
Length: 878
Number of Matches:
BLAST websites:
http://www.ncbi.nlm.nih.gov/BLAST/ (BLAST2.0)
http://www2.ebi.ac.uk/blast2/ (WU-Blast2)
http://blast.wustl.edu/ (WU-Blast2)
Why use BLAST?
Query  1    tttttttttttGGTGGGGAAGAGGACTTTTATTGGGATGTTAGTGGGGGACTCCAGGGAA  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1    TTTTTTTTTTTGGTGGGGAAGAGGACTTTTATTGGGATGTTAGTGGGGGACTCCAGGGAA  60

Query  61   CA-C-AACACTAGGACCCAGCTCCCCAGACCACTCAGGGACCTGTGGACAGCTCAGCTCA  118
            || | |||||||||||||||||||||||||||||||||||||||||||     |||||||
Sbjct  61   CAACAAACACTAGGACCCAGCTCCCCAGACCACTCAGGGACCTGTGGA-----CAGCTCA  115
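The numbers at the ends of each Query and Sbjct row are sequence coordinates. Gap characters occupy alignment columns but not sequence positions, so a row ends at its start coordinate plus the number of residues minus one. In the second Query row of the alignment above, 60 columns with 2 gaps hold 58 nucleotides (61 to 118), while the Sbjct row, with 5 gaps, ends at 115. A small helper (the name is hypothetical) makes the arithmetic explicit:

```python
def end_coordinate(start: int, aligned: str) -> int:
    """Last sequence position covered by one aligned row; '-' is not a residue."""
    residues = len(aligned) - aligned.count("-")
    return start + residues - 1


if __name__ == "__main__":
    query_row = "CA-C-AACACTAGGACCCAGCTCCCCAGACCACTCAGGGACCTGTGGACAGCTCAGCTCA"
    sbjct_row = "CAACAAACACTAGGACCCAGCTCCCCAGACCACTCAGGGACCTGTGGA-----CAGCTCA"
    print(end_coordinate(61, query_row), end_coordinate(61, sbjct_row))
```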