Dynamic Programming: Pairwise Sequence Alignment

Sequence Alignment
Outline
• Global Alignment
• Scoring Matrices
• Local Alignment
• Alignment with Affine Gap Penalties
From LCS to Alignment: Change up the Scoring
Measuring Similarity
• Measuring the extent of similarity between two sequences:
  • Based on percent sequence identity
  • Based on conservation
• This will simplify the algorithm as follows:
$$
s_{i,j} = \max
\begin{cases}
s_{i-1,j-1} + \delta(v_i, w_j) \\
s_{i-1,j} + \delta(v_i, -) \\
s_{i,j-1} + \delta(-, w_j)
\end{cases}
$$
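The recurrence with a general scoring function δ can be sketched in Python as below. This is a minimal illustration under stated assumptions, not code from the lecture: the names `global_align` and `toy_delta`, the gap symbol `"-"`, and the toy scores (+1 match, -1 mismatch, -1 indel) are all hypothetical choices made for the example.

```python
def toy_delta(a, b):
    """Hypothetical scoring function: +1 match, -1 mismatch, -1 indel."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def global_align(v, w, delta):
    """Global alignment of v and w under a scoring function delta(a, b).

    delta must also accept the gap character "-" as either argument.
    Returns (score, aligned_v, aligned_w).
    """
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + delta(v[i - 1], "-")
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + delta("-", w[j - 1])

    # Fill the table with the three-way recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(
                s[i - 1][j - 1] + delta(v[i - 1], w[j - 1]),
                s[i - 1][j] + delta(v[i - 1], "-"),
                s[i][j - 1] + delta("-", w[j - 1]),
            )

    # Backtrack from (n, m) to (0, 0), re-checking which term gave the max.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + delta(v[i - 1], w[j - 1]):
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + delta(v[i - 1], "-"):
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Passing δ as a function keeps the dynamic programming code independent of the alphabet; a table lookup for nucleotides or amino acids could be wrapped in the same interface. The backtracking step compares scores with `==`, so this sketch assumes integer scores.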
The BLOSUM62 Scoring Matrix
Exercise
• 4. Find the longest common subsequence of the two sequences below. [Homework]
v = TACGGGTAT
w = GGACGTACG
[Figure: LCS dynamic programming grid for the exercise. Columns are labeled with v = T A C G G G T A T (positions 1-9), rows with w = G G A C G T A C G (positions 1-9); row 0 and column 0 are initialized to 0.]
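Filling such a grid can be sketched in Python as follows. This is only an illustrative sketch: the function name `lcs` and the tie-breaking rule in the backtracking step are assumptions, and when several longest common subsequences exist it returns just one of them.

```python
def lcs(v, w):
    """Return one longest common subsequence of strings v and w."""
    n, m = len(v), len(w)
    # s[i][j] = LCS length of v[:i] and w[:j]; row 0 and column 0 stay 0.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])

    # Backtrack from the bottom-right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            out.append(v[i - 1])
            i, j = i - 1, j - 1
        elif s[i - 1][j] >= s[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Calling `lcs("TACGGGTAT", "GGACGTACG")` runs the sketch on the two exercise sequences; checking the result against a hand-filled grid is a useful way to verify the homework.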
• To generalize scoring, consider a (4+1) × (4+1) scoring matrix δ.
• For an amino acid sequence alignment, the scoring matrix would be (20+1) × (20+1). The extra 1 accounts for scores of comparisons against the gap character "-".
Percent Sequence Identity
• The extent to which two nucleotide or amino acid sequences are invariant
A C C T G A G - A G
A C G T G - G C A G
One mismatch (column 3) and two indels (columns 6 and 8); 7 of 10 columns match: 70% identical
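The 70% figure can be reproduced with a few lines of Python. This is a sketch under one assumption: identity is computed as matching columns divided by the total number of alignment columns, which is what yields 70% for the slide's example; other conventions for the denominator exist. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row1, row2, gap="-"):
    """Percent of alignment columns in which both rows hold the same residue.

    Assumption: the denominator is the number of alignment columns,
    including columns that contain a gap.
    """
    if len(row1) != len(row2) or not row1:
        raise ValueError("rows must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(row1, row2) if a == b and a != gap)
    return 100.0 * matches / len(row1)


# The alignment from the slide: 7 matching columns out of 10.
print(percent_identity("ACCTGAG-AG", "ACGTG-GCAG"))
```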
Scoring Matrix: Example
      A
A     5
R    -2
N    -1
K    -1
The Global Alignment Problem
• Find the best alignment between two strings under a given scoring schema
• Input: Strings v and w and a scoring schema
• Output: Alignment of maximum score
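$$
s_{i,j} = \max
\begin{cases}
s_{i-1,j-1} + 1 & \text{if } v_i = w_j \\
s_{i-1,j-1} - \mu & \text{if } v_i \neq w_j \\
s_{i-1,j} - \sigma \\
s_{i,j-1} - \sigma
\end{cases}
$$
μ : mismatch penalty
σ : indel penalty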
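A minimal Python sketch of this recurrence, computing only the optimal score, might look as follows. The function name `global_alignment_score` is hypothetical, and the boundary values (a prefix of length k aligned against nothing scores -σk) are an assumption that follows from charging σ per indel.

```python
def global_alignment_score(v, w, mu, sigma):
    """Best global alignment score with +1 match, -mu mismatch, -sigma indel."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]

    # Boundary: aligning a prefix against the empty string costs one indel
    # per character.
    for i in range(1, n + 1):
        s[i][0] = -sigma * i
    for j in range(1, m + 1):
        s[0][j] = -sigma * j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = s[i - 1][j - 1] + (1 if v[i - 1] == w[j - 1] else -mu)
            s[i][j] = max(diagonal, s[i - 1][j] - sigma, s[i][j - 1] - sigma)
    return s[n][m]
```

With `mu = 0` and `sigma = 0` the score reduces to the length of the longest common subsequence, since only matches then contribute to the total.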
Scoring Matrices
Simple Scoring
• When mismatches are penalized by -μ, indels are penalized by -σ, and matches are rewarded with +1, the resulting score is:
  #matches - μ(#mismatches) - σ(#indels)
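To make the formula concrete, here is a small Python sketch that scores an alignment that has already been written out as two gapped rows. The function name `simple_score` and the gap symbol `"-"` are assumptions for illustration; any column containing a gap is counted as one indel.

```python
def simple_score(row1, row2, mu, sigma, gap="-"):
    """Score = #matches - mu * #mismatches - sigma * #indels."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = indels = 0
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            indels += 1          # a column containing a gap is an indel
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches - mu * mismatches - sigma * indels
```

For example, an alignment with 7 matches, 1 mismatch, and 2 indels scores 7 - μ - 2σ, so `simple_score("ACCTGAG-AG", "ACGTG-GCAG", 1, 2)` evaluates to 2.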
Making a Scoring Matrix
• Scoring matrices are created based on biological evidence.
• Alignments can be thought of as two sequences that differ due to mutations.
• Some of these mutations have little effect on the protein's function, so some penalties δ(v_i, w_j) will be less harsh than others.
Review
• Dynamic Programming
• Edit Distance
• Alignment
• Directed Acyclic Graph
• Edit Graph
• Backtracking
- T G C A T - A - C
A T - C - T G A T C
• The Longest Common Subsequence (LCS) problem, the simplest form of sequence alignment, allows only insertions and deletions (no mismatches).
• In the LCS problem, we scored 1 for matches and 0 for indels.
• Consider penalizing indels and mismatches with negative scores.
• Simplest scoring schema:
  +1 : match premium
  -μ : mismatch penalty
  -σ : indel penalty
- T G C A T - A - C
A T - C - T G A T C