The KMP Skip Search Algorithm Explained


Output: all occurrences of P in T.
Example
   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
T: c a b c d a b a b d a d a d c d a b c d
P: d a b a b d a d a d
wall = 19 kmpstart = 19 skipstart = 16
A general situation for the search phase
[Figure: T aligned with P at position start; the labels i, j, k, X, and wall mark positions in the two strings.]
The search first uses the Skip Search algorithm to find an alignment of P against T in which T[i] = P[j]. start is the position in T where the aligned P begins. wall is the position in T of the first mismatch under that alignment. k is the length of the prefix of P that matches T from start onward, so wall = start + k. kmpStart is the next alignment position proposed by the KMP shift, and skipStart is the next alignment position proposed by the Skip Search shift.
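The quantities k and wall for a single alignment can be reproduced with a few lines of Python. This is only a sketch: the function name `attempt` and the 0-based indexing are assumptions made for illustration, and the strings are the T and P used in the example slides.

```python
def attempt(text, pattern, start):
    """Compare pattern with text[start:] left to right.

    Returns (k, wall): k is the length of the matching prefix of the
    pattern, wall = start + k is the first unmatched text position.
    """
    k = 0
    while (k < len(pattern) and start + k < len(text)
           and pattern[k] == text[start + k]):
        k += 1
    return k, start + k


T = "ACTACATATAGGACTACGTACCAGCATTACTACGTT"
P = "ACTACGT"
print(attempt(T, P, 0))    # (5, 5)
print(attempt(T, P, 9))    # (1, 10)
print(attempt(T, P, 12))   # (7, 19), a full match
```

The three calls correspond to the alignments start = 0, start = 9 and start = 12 of the worked example.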
KMP Skip Search Algorithm
Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Charras, C., Lecroq, T. and Pecquery, J. D., Lecture Notes in
Example: step 1-1
start = 0 wall = 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
Example: step 1-2
start = 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
Pattern string P = GCAGAGAG

c        A   C   G   T
Z[c]     6   1   7  -1

i        0   1   2   3   4   5   6   7
P[i]     G   C   A   G   A   G   A   G
List[i] -1  -1  -1   0   2   3   4   5

i        0   1   2   3   4   5   6   7   8
mpNext  -1   0   0   0   1   0   1   0   1
kmpNext -1   0   0  -1   1  -1   1  -1   1
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
k=1
0 1 2 3 4 5 6
ACTACGT
0 1 2 3 4 5 6
ACTACGT (kmp’s shift) kmpstart = 10
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 12
Example: step 1
First it uses the Skip Search algorithm to align T and P.
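To see which alignments Skip Search alone would propose for this example, the sketch below lists its candidate start positions. It assumes the usual Skip Search scheme (inspect every m-th text character and align it with each position of that character in P, rightmost first); the function name `skip_candidates` is hypothetical and the pattern is assumed to be non-empty.

```python
def skip_candidates(text, pattern):
    """List the start positions proposed by a plain Skip Search scan."""
    n, m = len(text), len(pattern)
    # Bucket of each character: its positions in the pattern, largest first.
    buckets = {}
    for i, c in enumerate(pattern):
        buckets.setdefault(c, []).insert(0, i)
    starts = []
    for j in range(m - 1, n, m):           # text positions m-1, 2m-1, ...
        for i in buckets.get(text[j], []):
            if j - i <= n - m:             # the pattern must fit in the text
                starts.append(j - i)
    return starts


T = "ACTACATATAGGACTACGTACCAGCATTACTACGTT"
P = "ACTACGT"
print(skip_candidates(T, P))
# [0, 4, 9, 12, 16, 19, 21, 25, 28]
```

The values 4, 12 and 16 in this list are the skipstart values that appear in steps 1, 2 and 3 of the example.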
start = 0 wall = 5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
Computer Science, Vol. 1448, 1998, pp. 55-64
Advisor: Prof. R. C. T. Lee Speaker: Z. H. Pan
Definition
• String Matching Problem:
Input: a text string T of length n and a pattern string P of length m.
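As a reference point for the problem statement, a brute-force matcher can be sketched as follows. It is not the KMP Skip Search algorithm, only a baseline that reports every occurrence; the function name `naive_search` and the 0-based positions are assumptions made for illustration.

```python
def naive_search(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


print(naive_search("ACTACATATAGGACTACGTACCAGCATTACTACGTT", "ACTACGT"))
# [12, 28]
```

Any faster algorithm for the same problem should return the same list of positions.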
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
Example: step 3
start = 12
wall = 19
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
Occurrences of P in T: T5, that is, P occurs in T starting at position 5.
• The KMP Skip Search algorithm consists of two phases: preprocessing and searching.
• The KMP Skip Search algorithm uses the KMP tables to improve the Skip Search algorithm.
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
P = ACTACGT
k=5
0 1 2 3 4 5 6
ACTACGT (kmp’s shift) kmpstart = 3
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 4
0 1 2 3 4 5 6
ACTACGT
k=2
0 1 2 3 4 5 6
ACTACGT (kmp’s shift) kmpstart = 5
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 4
wall = 5 kmpstart = 5 skipstart = 4
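The kmpstart values 3 and 5 in step 1 can be checked numerically. The sketch below assumes the standard failure-function tables and the update rules kmpStart = start + k - kmpNext[k] for the first KMP shift and kmpStart += k - mpNext[k] for a further Morris-Pratt shift; the slides give only the resulting values, so these formulas and the helper name `failure_tables` are assumptions.

```python
def failure_tables(pattern):
    """Return (mp_next, kmp_next), the Morris-Pratt and KMP shift tables."""
    m = len(pattern)
    mp_next, kmp_next = [-1] * (m + 1), [-1] * (m + 1)
    b = -1                                   # current border length
    for i in range(m):
        while b > -1 and pattern[i] != pattern[b]:
            b = mp_next[b]
        b += 1
        mp_next[i + 1] = b
        same = i + 1 < m and pattern[i + 1] == pattern[b]
        kmp_next[i + 1] = kmp_next[b] if same else b
    return mp_next, kmp_next


mp_next, kmp_next = failure_tables("ACTACGT")
start, k = 0, 5                      # step 1: five characters matched
wall = start + k
kmp_start = start + k - kmp_next[k]  # first KMP shift
k = kmp_next[k]
print(wall, kmp_start, k)            # 5 3 2
kmp_start += k - mp_next[k]          # Case 2: one more Morris-Pratt shift
k = mp_next[k]
print(kmp_start, k)                  # 5 0
```

After the second shift kmpstart = 5 exceeds skipstart = 4, which is the situation summarized on the line above.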
Preprocessing
• The preprocessing phase computes the buckets for all characters of the alphabet, the list table, the MP table, and the KMP table.
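The four tables can be built in one pass over the pattern. The following is a minimal sketch, assuming that Z[c] holds the last position of character c in P (absent characters count as -1), that List[i] holds the previous position of the character P[i], and that mpNext and kmpNext are the usual Morris-Pratt and Knuth-Morris-Pratt tables; the function name `build_tables` is hypothetical.

```python
def build_tables(pattern):
    """Return (z, lst, mp_next, kmp_next) for the given pattern."""
    m = len(pattern)
    z = {}             # bucket head: last position of each character
    lst = [-1] * m     # lst[i]: previous position of the character pattern[i]
    for i, c in enumerate(pattern):
        lst[i] = z.get(c, -1)
        z[c] = i
    mp_next, kmp_next = [-1] * (m + 1), [-1] * (m + 1)
    b = -1             # length of the current border
    for i in range(m):
        while b > -1 and pattern[i] != pattern[b]:
            b = mp_next[b]
        b += 1
        mp_next[i + 1] = b
        same = i + 1 < m and pattern[i + 1] == pattern[b]
        kmp_next[i + 1] = kmp_next[b] if same else b
    return z, lst, mp_next, kmp_next


z, lst, mp_next, kmp_next = build_tables("GCAGAGAG")
print(z)         # {'G': 7, 'C': 1, 'A': 6}
print(lst)       # [-1, -1, -1, 0, 2, 3, 4, 5]
print(mp_next)   # [-1, 0, 0, 0, 1, 0, 1, 0, 1]
print(kmp_next)  # [-1, 0, 0, -1, 1, -1, 1, -1, 1]
```

For P = GCAGAGAG these values agree with the Z, List, mpNext and kmpNext tables of the preprocessing example.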
Example:
Text string T=GCATCGCAGAGAGTATACAGTACG
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
match, k=7
0 1 2 3 4 5 6
ACTACGT (kmp’s shift) kmpstart = 19
0 1 2 3 4 5 6
ACTACGT (skip’s shift) skipstart = 16
Case 1. skipStart < kmpStart: a shift according to the Skip Search algorithm is applied, which gives a new value for skipStart. skipStart and kmpStart must then be compared again.
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT
Example: step 2
start = 9 wall = 10
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
wall = 10 kmpstart = 10 skipstart = 12
Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart.
start = 12
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
Case 3. skipStart = kmpStart: another attempt can be performed with start = skipStart.
Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart.
wall = 5 kmpstart = 3 skipstart = 4
Case 2. kmpStart < skipStart < wall: a shift according to the shift table of Morris-Pratt is applied, which gives a new value for kmpStart. skipStart and kmpStart must then be compared again.
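The case analysis can be written as a small decision function. This is a sketch with one labeled assumption: Case 4 is tested here as kmpStart < skipStart together with wall <= skipStart, which is slightly broader than the stated kmpStart < wall < skipStart so that it also covers step 2 of the example, where kmpstart equals wall. The function name `classify` is hypothetical.

```python
def classify(kmp_start, skip_start, wall):
    """Return the case number (1 to 4) for the three positions."""
    if skip_start < kmp_start:
        return 1          # take another Skip Search shift
    if skip_start == kmp_start:
        return 3          # next attempt at skip_start
    if skip_start < wall:
        return 2          # take another Morris-Pratt shift
    return 4              # assumption: kmp_start < skip_start, wall <= skip_start


print(classify(3, 4, 5))     # 2
print(classify(5, 4, 5))     # 1
print(classify(10, 12, 10))  # 4
print(classify(19, 16, 19))  # 1
```

The four calls use the (kmpstart, skipstart, wall) triples that appear in steps 1, 2 and 3 of the example.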
0 1 2 3 4 5 6
k = 0 ACTACGT
∴ use the Skip Search algorithm
0 1 2 3 4 5 6
ACTACGT
start = 9
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
Case 1. skipStart < kmpStart: a shift according to the Skip Search algorithm is applied, which gives a new value for skipStart. skipStart and kmpStart must then be compared again.
If k = 0, no prefix of P matches the substring of T at the current alignment, so the Skip Search shift is used. Otherwise (k > 0), a prefix of P of length k matches the substring of T, and kmpStart, wall, and skipStart have to be computed and compared to decide among the four cases.
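Putting the phases together, the whole search can be sketched in Python as below. This is one possible rendering, modeled on the commonly published C version of KMP Skip Search rather than taken from these slides; the function name `kmp_skip_search`, the helper `next_window` and the 0-based positions are assumptions made for illustration.

```python
def kmp_skip_search(text, pattern):
    """Return all 0-based start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Preprocessing: buckets z, list table, MP and KMP tables.
    z, lst = {}, [-1] * m
    for i, c in enumerate(pattern):
        lst[i] = z.get(c, -1)
        z[c] = i
    mp_next, kmp_next = [-1] * (m + 1), [-1] * (m + 1)
    b = -1
    for i in range(m):
        while b > -1 and pattern[i] != pattern[b]:
            b = mp_next[b]
        b += 1
        mp_next[i + 1] = b
        same = i + 1 < m and pattern[i + 1] == pattern[b]
        kmp_next[i + 1] = kmp_next[b] if same else b

    def next_window(j):
        # Advance j in steps of m until text[j] occurs in the pattern.
        j += m
        while j < n and text[j] not in z:
            j += m
        return j

    found = []
    per = m - kmp_next[m]            # period of the pattern
    wall = 0
    j = next_window(-1)
    if j >= n:
        return found
    i = z[text[j]]
    start = j - i
    while start <= n - m:
        wall = max(wall, start)
        k = wall - start             # prefix already known to match
        while k < m and pattern[k] == text[start + k]:
            k += 1
        wall = start + k
        if k == m:
            found.append(start)
            i -= per
        else:
            i = lst[i]
        if i < 0:
            j = next_window(j)
            if j >= n:
                return found
            i = z[text[j]]
        kmp_start = start + k - kmp_next[k]
        k = kmp_next[k]
        start = j - i                # this is skipStart
        while start < kmp_start or kmp_start < start < wall:
            if start < kmp_start:    # Case 1: another Skip Search shift
                i = lst[i]
                if i < 0:
                    j = next_window(j)
                    if j >= n:
                        return found
                    i = z[text[j]]
                start = j - i
            else:                    # Case 2: another Morris-Pratt shift
                kmp_start += k - mp_next[k]
                k = mp_next[k]
        # Cases 3 and 4: the next attempt begins at start (= skipStart).
    return found


print(kmp_skip_search("ACTACATATAGGACTACGTACCAGCATTACTACGTT", "ACTACGT"))
# [12, 28]
```

On the example text the attempts are made at start = 0, 9, 12, 19, 21, 25 and 28, and the first three match steps 1 to 3 of the walkthrough.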
Case 2. kmpStart < skipStart < wall: a shift according to the shift table of Morris-Pratt is applied, which gives a new value for kmpStart. skipStart and kmpStart must then be compared again.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
0 1 2 3 4 5 6
ACTACGT