Introduction to Using BLAST on the NCBI Website (Complete Edition)
• RPS BLAST
  – searches a database of PSSMs
  – tool for conserved domain searches
Basic Local Alignment Search Tool
• Widely used similarity search tool
• Heuristic approach based on the Smith-Waterman algorithm
Basic Local Alignment Search Tool
• Why use sequence similarity?
• BLAST algorithm
• BLAST statistics
• BLAST output
• Examples
Why Do We Need Sequence Similarity Searching?
DNA Polymerase Replication
[Figure: chemical structures of nucleotide triphosphates, one with a 3' OH group and one with a 3' H]
Traditional molecular techniques will inevitably give way to BLAST-centered bioinformatics techniques
Sanger's ddNTP Sequencing
What does this sequence mean?
: NCBI's tool
The scientific method that lets us study data we do not understand: comparison!
BLAST and Molecular Evolution
3000 Myr
BLAST Screening
1000 Myr
First, find the similar sequences
540 Myr
Then, work out the relationships among the similar sequences
MLH1
MutL
Human
Fly
Pancreatic carcinoma
• Ungapped extensions of hits (initial HSPs)
• Gapped extensions (no traceback)
• Gapped extensions (traceback; alignment details)
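The ungapped extension step listed above can be sketched in a few lines. This is only an illustration, not NCBI's implementation: the function names `_extend` and `extend_ungapped`, the +1/-3 match/mismatch scores, and the X-drop cutoff of 5 are all assumptions chosen for the example.

```python
def _extend(query, subject, i, j, step, match, mismatch, x_drop):
    """Walk from (i, j) in direction `step`; return (best gain, length kept)."""
    best = running = 0
    best_len = length = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += match if query[i] == subject[j] else mismatch
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # score dropped too far below the best seen so far
        i += step
        j += step
    return best, best_len


def extend_ungapped(query, subject, q_pos, s_pos, word_size,
                    match=1, mismatch=-3, x_drop=5):
    """Extend an exact word hit in both directions without gaps.

    Returns (score, q_start, s_start, length) of the extended segment.
    Scores and x_drop are illustrative assumptions, not BLAST defaults.
    """
    right, right_len = _extend(query, subject, q_pos + word_size,
                               s_pos + word_size, +1, match, mismatch, x_drop)
    left, left_len = _extend(query, subject, q_pos - 1, s_pos - 1,
                             -1, match, mismatch, x_drop)
    score = word_size * match + left + right
    return score, q_pos - left_len, s_pos - left_len, left_len + word_size + right_len


if __name__ == "__main__":
    query = "AACCATGCTTAATTGGTT"          # hypothetical query containing the seed
    subject = "ATCGCCATGCTTAATTGGGCTT"    # sequence from the "one exact match" slide
    print(extend_ungapped(query, subject, q_pos=3, s_pos=5, word_size=11))
```

The extension keeps only the prefix of each walk that achieved the best running score, so trailing mismatches are trimmed from the reported segment.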
Nucleotide Words
Query : GTACTGGACATGGACCCTACAGGAA
ACATGGACCCT ...
Protein Words
Query: GTQITVEDLFYNIATRRKALKN
Word size = 3 (default)
Word size can only be 2 or 3
Make a lookup table of words:
GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF, ...
Neighborhood Words (e.g., of ITV)
LTV, MTV, ISV, LSV, etc.
[ -f 11 = blastp default ]
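To make the idea of neighborhood words concrete, the sketch below scores every possible 3-letter word against the query word ITV and keeps those at or above a threshold. It assumes the BLOSUM62 matrix (the slides do not name the matrix) and embeds only the three rows needed for I, T and V; the names `word_score` and `neighborhood` are made up for this example. Under these assumed values LTV scores exactly 11, whereas MTV (10), ISV (9) and LSV (7) would only qualify at a lower threshold.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Assumed BLOSUM62 rows for the residues of the example word "ITV" only,
# listed in the column order of AMINO_ACIDS.
BLOSUM62_ROWS = {
    "I": [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],
    "T": [0, -1, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1, 1, 5, -2, -2, 0],
    "V": [0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4],
}


def word_score(word, candidate):
    """Sum of pairwise substitution scores between two equal-length words."""
    return sum(BLOSUM62_ROWS[w][AMINO_ACIDS.index(c)]
               for w, c in zip(word, candidate))


def neighborhood(word, threshold):
    """Return {candidate: score} for all words scoring >= threshold."""
    hits = {}
    for letters in product(AMINO_ACIDS, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            hits[candidate] = score
    return hits


if __name__ == "__main__":
    for cand, score in sorted(neighborhood("ITV", 11).items(),
                              key=lambda kv: -kv[1]):
        print(cand, score)
```

Lowering the threshold enlarges the neighborhood, which raises sensitivity at the cost of more word hits to examine.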
Minimum Requirements for a Hit
ATCGCCATGCTTAATTGGGCTT
     CATGCTTAATT          one exact match
Human genome statistics
[Figure: slide background of raw genomic DNA sequence]
Text (e.g., Wang LS, Gao PJ, cellulase, et al.): Entrez
Sequence: BLAST
Structure: VAST
Bioinfom
blast.ncbi.nlm.nih.gov
Enter sequences here
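Besides pasting a sequence into the web form, a search can in principle be prepared programmatically. The sketch below only builds the request and sends nothing. The endpoint path and the parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`) follow NCBI's public URL API as far as I am aware, but treat them, the `nt` database name, and the helper name `build_submit_request` as assumptions to verify against the current NCBI documentation and usage guidelines.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; verify before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_request(sequence, program="blastn", database="nt"):
    """Return (url, form_body) for submitting a search; nothing is sent."""
    params = {
        "CMD": "Put",          # assumed: "Put" submits a new search
        "PROGRAM": program,
        "DATABASE": database,  # assumed database name
        "QUERY": sequence,
    }
    return BLAST_URL, urlencode(params)


if __name__ == "__main__":
    url, body = build_submit_request("GTACTGGACATGGACCCTACAGGAA")
    print(url)
    print(body)
```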
How can a computer read data that we ourselves cannot read?
Worm
Alzheimer's Disease
Yeast
Ataxia telangiectasia
Bacteria
Colon cancer
How do we find the similarity between sequences?
Seq 1 Seq 2
Global alignment
Seq 1 Seq 2
Local alignment
Global vs Local Alignment
• Megablast
  – optimized for large batch searches
  – can use discontiguous words
• PSI-BLAST
  – constructs PSSMs automatically; uses as query
  – very sensitive protein search
  – DNA translation vs Protein → blastx
  – Protein vs Protein → blastp
  – Protein vs DNA translation → tblastn
  – DNA translation vs DNA translation → tblastx
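The query/database combinations can be captured as a small lookup. The helper below is a hypothetical convenience function (the name `choose_program` and its arguments are not part of BLAST); it simply encodes the pairings from the slides, with blastn covering the plain DNA vs DNA case.

```python
def choose_program(query_type, db_type, translated_comparison=False):
    """Pick a BLAST program name from sequence types.

    query_type / db_type: "N" (nucleotide) or "P" (protein).
    translated_comparison: only meaningful for N vs N; if True, both
    sides are compared as translations (tblastx) instead of as DNA (blastn).
    """
    if query_type not in ("N", "P") or db_type not in ("N", "P"):
        raise ValueError("types must be 'N' or 'P'")
    if query_type == "N" and db_type == "N":
        return "tblastx" if translated_comparison else "blastn"
    if query_type == "N":
        return "blastx"    # translated DNA query vs protein database
    if db_type == "P":
        return "blastp"    # protein vs protein
    return "tblastn"       # protein query vs translated DNA database


if __name__ == "__main__":
    for args in [("N", "N"), ("N", "P"), ("P", "P"), ("P", "N")]:
        print(args, "->", choose_program(*args))
```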
• www, standalone, and network clients
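Program    Query                       Database
blastx     N (translated: PPP PPP)     P
tblastn    P                           N (translated: PPP PPP)
tblastx    N (translated: PPP PPP)     N (translated: PPP PPP)
(N = nucleotide, P = protein; PPP PPP = six-frame translation)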
How BLAST Works
• Make lookup table of “words” for query
• Scan database for hits
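The two steps above can be illustrated with a dictionary of query words and a linear scan of a database sequence. This is a toy sketch that only handles exact word matches; the function names `build_lookup` and `scan` are assumptions for the example, and real BLAST uses far more compact data structures.

```python
from collections import defaultdict


def build_lookup(query, word_size):
    """Map every word of length `word_size` in the query to its positions."""
    table = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        table[query[i:i + word_size]].append(i)
    return table


def scan(subject, table, word_size):
    """Return (query_pos, subject_pos) for every exact word hit."""
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    table = build_lookup("GTACTGGACATGGACCCTACAGGAA", 11)
    print(len(table), "words, first:", "GTACTGGACAT" in table)

    # The "one exact match" example: an 11-mer found inside a longer sequence.
    hit_table = build_lookup("CATGCTTAATT", 11)
    print(scan("ATCGCCATGCTTAATTGGGCTT", hit_table, 11))
```

Each hit is a seed; only seeds are passed on to the more expensive extension steps.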
Global vs Local Alignment
Seq1: WHEREISWALTERNOW (16aa)
Seq2: HEWASHEREBUTNOWISHERE (21aa)
Global
Seq1: 1  W--HEREISWALTERNOW     16
         W  HERE
Seq2: 1  HEWASHEREBUTNOWISHERE  21
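Traditional molecular biology approach (takes several weeks): target gene, restriction enzyme, recombinant gene, host bacterium, cell transformation, protein isolation, purification, and characterization
Modern bioinformatics approach (takes several minutes): BLAST, gene family or protein family, function annotation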
www.ncbi.nlm.nih.gov
BLAST
Web Access
BLAST
Basic Local Alignment Search Tool
Lushan Wang 2010.11.24
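Ways to obtain biological information
• 1. Retrieve data primarily by biological information: Entrez
• 2. Retrieve related information primarily by sequence: BLAST
• In the bioinformatics era, BLAST is the equivalent of the "PCR" technique of the molecular biology era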
Local
Seq1: 1   W--HERE  5
          W  HERE
Seq2: 3   WASHERE  9

Seq1: 1   W--HERE  5
          W  HERE
Seq2: 15  WISHERE  21
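A local alignment of the two example sequences can be computed with a small Smith-Waterman implementation. The sketch below uses an assumed toy scoring scheme (match +1, mismatch -1, linear gap -1), not a protein substitution matrix, so it reports the shared core HERE; recovering the leading W across a two-residue gap, as in the slide, would require a scoring system that rewards a W-W match much more strongly.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Toy scoring with a linear gap penalty; all values are assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; cell values never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("WHEREISWALTERNOW", "HEWASHEREBUTNOWISHERE"))
```

Unlike a global alignment, the result covers only the best-matching region and ignores the unrelated flanking residues.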
The Flavors of BLAST
• Standard BLAST
  – traditional “contiguous” word hit
  – position independent scoring
  – nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
• To identify and annotate sequences
• To evaluate evolutionary relationships
• Other:
  – model genomic structure (e.g., Spidey)
  – check primer specificity in silico
Smith Waterman
• Finds best local alignments
• Provides statistical significance
• All combinations (DNA/Protein) query and database
  – DNA vs DNA → blastn
Make a lookup table of words (11-mers):
GTACTGGACAT
TACTGGACATG
ACTGGACATGG
CTGGACATGGA
TGGACATGGAC
GGACATGGACC
GACATGGACCC

WORD SIZE    default    minimum
blastn       11         7
megablast    28         8
Translated BLAST: Nucleotide and Protein
Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA
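Translated searches compare protein translations of a nucleotide sequence in all six reading frames. A minimal sketch of six-frame translation with the standard genetic code follows; the function names and the frame labels (+1 to +3, -1 to -3) are conventions assumed for this example, and codons containing characters other than A, C, G, T are rendered as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


if __name__ == "__main__":
    for name, protein in six_frame("ATGGCC").items():
        print(name, protein)
```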