Knuth-Morris-Pratt Substring Search Algorithm
Prepared By:
Sabiya Fatima
Email ID: sabiya1990fatima@gmail.com
Outline
• Definition
• History
• Components of KMP
• Algorithm
• Example
• Run-Time Analysis
• Complexity Comparison of String-Matching Algorithms
• Advantages and Disadvantages
• Real-World Applications
• References
What is Pattern Searching?
• Suppose you are reading a text document.
• You want to search for a word.
• You press Ctrl+F and search for that word.
• The word processor scans the document and shows the positions where the word occurs.
What actually happens is that the word, i.e. the pattern, is searched for inside the text document.
Definition
• Best known for exact string matching in linear time. Compares the pattern with the text from left to right.
• Can shift the pattern by more than one position after a mismatch.
• Preprocesses the pattern to avoid trivial comparisons.
• Avoids recomputing matches.
History
• This algorithm was conceived by Donald Knuth and Vaughan Pratt and, independently, by James H. Morris; the three published it jointly in 1977.
• Knuth, Morris and Pratt discovered the first linear-time string-matching algorithm by analyzing the naive algorithm.
• The naive approach gathers information during the scan of the text and then throws it away; KMP keeps that information. By avoiding this waste, it achieves a running time of O(m + n).
• The Knuth-Morris-Pratt algorithm is efficient because it keeps the total number of comparisons of the pattern against the input string linear in the length of the text.
Naïve Approach
The naïve approach is to check whether the pattern matches the string at every possible position in the string.
P = pattern (word) of length m
T = text (document) of length n
The naive string-matching algorithm takes O((n-m+1)m) time, i.e. O(mn).
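The naive approach can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function name naive_search is our own, and it reports 0-based shifts, whereas the slides count positions from 1.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift s with text[s:s + m] == pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible alignments of the pattern.
    for s in range(n - m + 1):
        j = 0
        # Compare left to right: up to m character comparisons per alignment,
        # which is where the O((n - m + 1)m) bound comes from.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:
            shifts.append(s)
    return shifts


print(naive_search("bacbabababacaab", "ababaca"))  # [6]
```

After a mismatch the naive method simply moves on to alignment s + 1 and restarts from the first pattern character, discarding what it has already learned about the text.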
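The KMP Algorithm - Motivation
[Figure: the pattern a b a a b a aligned against a text . . a b a a b . . ., with a mismatching text character x at position j, shown before and after the shift. Labels: "No need to repeat these comparisons" and "Resume comparing here".]
• Knuth-Morris-Pratt's algorithm compares the pattern to the text from left to right, but shifts the pattern more intelligently than the brute-force algorithm.
• When a mismatch occurs, what is the most we can shift the pattern so as to avoid redundant comparisons?
• Answer: shift the pattern so that the largest prefix of P that is also a proper suffix of the already-matched part stays aligned with the text. With 0-based indices, if P[0..j-1] has matched and P[j] mismatches, this is the largest prefix of P[0..j-1] that is a suffix of P[1..j-1].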
Components of KMP algorithm
• The prefix function, Π
The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern ‘p’; in other words, it avoids backtracking on the text ‘T’.
• The KMP Matcher
Given text ‘T’, pattern ‘p’ and prefix function ‘Π’ as inputs, it finds the occurrences of ‘p’ in ‘T’ and reports, for each one, the shift of ‘p’ at which the occurrence is found.
The prefix function, Π
The following pseudocode computes the prefix function Π:
Compute-Prefix-Function(p)
  m ← length[p]            //‘p’ is the pattern to be matched
  Π[1] ← 0
  k ← 0
  for q ← 2 to m
      do while k > 0 and p[k+1] != p[q]
             do k ← Π[k]
         if p[k+1] = p[q]
            then k ← k + 1
         Π[q] ← k
  return Π
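A direct Python translation of Compute-Prefix-Function is sketched below. It is an illustrative sketch, not the slides' own code: Python strings are 0-based, so pi[q] here holds the value the pseudocode calls Π[q + 1], and the name compute_prefix_function is assumed.

```python
def compute_prefix_function(p: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of p[:q + 1] that is also its suffix."""
    m = len(p)
    pi = [0] * m
    k = 0  # length of the prefix of p matched so far
    for q in range(1, m):
        # Fall back through shorter prefix-suffixes until p[k] can extend one (or k is 0).
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```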
Example: compute Π for the pattern ‘p’ = a b a b a c a.
Initially: m = length[p] = 7, Π[1] = 0, k = 0.
Step 1: q = 2, k = 0. p[1] = a does not match p[2] = b, so Π[2] = 0.
Step 2: q = 3, k = 0. p[1] = a matches p[3] = a, so k becomes 1 and Π[3] = 1.
Step 3: q = 4, k = 1. p[2] = b matches p[4] = b, so Π[4] = 2.
Step 4: q = 5, k = 2. p[3] = a matches p[5] = a, so Π[5] = 3.
Step 5: q = 6, k = 3. p[4] = b does not match p[6] = c, so k falls back to Π[3] = 1; p[2] = b does not match c, so k falls back to Π[1] = 0; p[1] = a does not match c either, so Π[6] = 0.
Step 6: q = 7, k = 0. p[1] = a matches p[7] = a, so Π[7] = 1.
After 6 iterations, the prefix function computation is complete:

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1 2 3 0 1

The running time of the prefix function is O(m): k increases by at most 1 per iteration and every pass of the while loop decreases it, so the while loop runs at most m - 1 times in total.
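The trace above can be checked mechanically. The sketch below uses a hypothetical helper, prefix_function_steps, written only for illustration: it runs the same pseudocode with 1-based indices and records, for each iteration, q, the value of k at the start of the iteration, and the resulting Π[q].

```python
def prefix_function_steps(p: str) -> list[tuple[int, int, int]]:
    """Run Compute-Prefix-Function in 1-based notation and record each iteration.

    Each entry is (q, k at the start of the iteration, resulting Pi[q]).
    """
    m = len(p)
    P = " " + p          # pad so that P[1..m] is the pattern, as in the pseudocode
    PI = [0] * (m + 1)   # PI[0] is unused; PI[1] = 0
    k = 0
    steps = []
    for q in range(2, m + 1):
        k_start = k
        while k > 0 and P[k + 1] != P[q]:
            k = PI[k]    # fall back to the next shorter prefix that is also a suffix
        if P[k + 1] == P[q]:
            k += 1
        PI[q] = k
        steps.append((q, k_start, k))
    return steps


for step, (q, k, pi_q) in enumerate(prefix_function_steps("ababaca"), start=1):
    print(f"Step {step}: q = {q}, k = {k}, Pi[{q}] = {pi_q}")
```

Its six output lines correspond to Steps 1 to 6 above, ending with Pi[7] = 1.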
The KMP Matcher
Input: text ‘T’ and pattern ‘p’; the prefix function ‘Π’ of ‘p’ is computed on line 3. The KMP Matcher finds every match of p in T.
The following pseudocode implements the matching component of the KMP algorithm:
KMP-Matcher(T, p)
1  n ← length[T]
2  m ← length[p]
3  Π ← Compute-Prefix-Function(p)
4  q ← 0                                  //number of characters matched
5  for i ← 1 to n                         //scan T from left to right
6      do while q > 0 and p[q+1] != T[i]
7             do q ← Π[q]                 //next character does not match
8         if p[q+1] = T[i]
9            then q ← q + 1               //next character matches
10        if q = m                        //is all of p matched?
11           then print "Pattern occurs with shift" i - m
12                q ← Π[q]                //look for the next match
Note: KMP finds every occurrence of ‘p’ in ‘T’. That is why it does not terminate after a match at line 12; instead it resets q to Π[q] and searches the remainder of ‘T’ for further occurrences of ‘p’.
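A Python sketch of KMP-Matcher follows. It is an illustration under our own naming (kmp_matcher, compute_prefix_function), uses 0-based indexing, and returns the shifts in a list instead of printing them; a shift reported here equals i - m from line 11 of the pseudocode. The empty pattern, which the pseudocode does not address, is handled separately as an assumption.

```python
def compute_prefix_function(p: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of p[:q + 1] that is also its suffix."""
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, p: str) -> list[int]:
    """Return every 0-based shift at which p occurs in text."""
    n, m = len(text), len(p)
    if m == 0:
        return list(range(n + 1))   # assumption: the empty pattern matches at every shift
    pi = compute_prefix_function(p)
    q = 0                           # number of characters matched
    shifts = []
    for i in range(n):              # scan text from left to right, never backwards
        while q > 0 and p[q] != text[i]:
            q = pi[q - 1]           # next character does not match: fall back in p
        if p[q] == text[i]:
            q += 1                  # next character matches
        if q == m:                  # is all of p matched?
            shifts.append(i - m + 1)
            q = pi[q - 1]           # look for the next match
    return shifts


print(kmp_matcher("bacbabababacaab", "ababaca"))  # [6]
print(kmp_matcher("abababa", "aba"))              # [0, 2, 4]
```

The second call shows why line 12 resets q instead of stopping: overlapping occurrences are found without rescanning any text character.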
Illustration: given a text ‘T’ and pattern ‘p’ as follows:
T = b a c b a b a b a b a c a a b
p = a b a b a c a
Let us execute the KMP algorithm to find whether ‘p’ occurs in ‘T’.
For ‘p’, the prefix function Π was computed previously and is as follows:

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1 2 3 0 1
Initially: n = size of T = 15; m = size of p = 7.

Step 1: i = 1, q = 0. Comparing p[1] with T[1]: p[1] does not match T[1], so ‘p’ is shifted one position to the right.
Step 2: i = 2, q = 0. Comparing p[1] with T[2]: p[1] matches T[2]. Since there is a match, ‘p’ is not shifted.
Step 3: i = 3, q = 1. Comparing p[2] with T[3]: p[2] does not match T[3]. Backtracking on p (q = Π[1] = 0), p[1] is compared with T[3]; it does not match either.
Step 4: i = 4, q = 0. Comparing p[1] with T[4]: p[1] does not match T[4].
Step 5: i = 5, q = 0. Comparing p[1] with T[5]: p[1] matches T[5].
Step 6: i = 6, q = 1. Comparing p[2] with T[6]: p[2] matches T[6].
Step 7: i = 7, q = 2. Comparing p[3] with T[7]: p[3] matches T[7].
Step 8: i = 8, q = 3. Comparing p[4] with T[8]: p[4] matches T[8].
Step 9: i = 9, q = 4. Comparing p[5] with T[9]: p[5] matches T[9].
Step 10: i = 10, q = 5. Comparing p[6] with T[10]: p[6] does not match T[10]. Backtracking on p: after the mismatch q = Π[5] = 3, so p[4] is compared with T[10], and it matches.
Step 11: i = 11, q = 4. Comparing p[5] with T[11]: p[5] matches T[11].
Step 12: i = 12, q = 5. Comparing p[6] with T[12]: p[6] matches T[12].
Step 13: i = 13, q = 6. Comparing p[7] with T[13]: p[7] matches T[13].

T: b a c b a b a b a b a c a a b
p:             a b a b a c a

Pattern ‘p’ has been found to occur completely in text ‘T’, with shift i - m = 13 - 7 = 6, i.e. p[1..7] matches T[7..13].
The running time of the KMP-Matcher function is O(n).
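The hand trace above can be reproduced with the sketch below, which instruments the matcher (the helper names kmp_trace and compute_prefix_function are our own) to record, for each i, the value of q on entry, which pattern positions j had p[j] compared with T[i], and q afterwards, all in the slides' 1-based notation. It assumes a non-empty pattern.

```python
def compute_prefix_function(p: str) -> list[int]:
    """0-based: pi[q] = length of the longest proper prefix of p[:q + 1] that is also its suffix."""
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_trace(T: str, p: str):
    """Run KMP-Matcher in 1-based notation; return (steps, shifts).

    Each step is (i, q at the start, positions j whose p[j] was compared with T[i], q afterwards).
    """
    n, m = len(T), len(p)
    X, P = " " + T, " " + p                 # pad so X[1..n] and P[1..m] are 1-based
    PI = [0] + compute_prefix_function(p)   # PI[q] for q = 1..m
    q, steps, shifts = 0, [], []
    for i in range(1, n + 1):
        q_start, compared = q, []
        while True:
            compared.append(q + 1)          # compare p[q+1] with T[i]
            if P[q + 1] == X[i]:
                q += 1                      # next character matches
                break
            if q == 0:
                break                       # nothing left to fall back to
            q = PI[q]                       # backtrack on p, never on T
        steps.append((i, q_start, compared, q))
        if q == m:
            shifts.append(i - m)            # "Pattern occurs with shift" i - m
            q = PI[q]                       # look for the next match
    return steps, shifts


steps, shifts = kmp_trace("bacbabababacaab", "ababaca")
for i, q, compared, q_after in steps[:13]:
    print(f"Step {i}: q = {q}, compared p{compared} with T[{i}], q becomes {q_after}")
print("shifts:", shifts)  # shifts: [6]
```

Steps 3 and 10 show the backtracking on p: p[2] and then p[1] are compared with T[3], and p[6] and then p[4] with T[10].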
Complexity
• O(m) to compute the prefix function values.
• O(n) to compare the pattern to the text.
• Total run time: O(n + m).
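The O(n) bound for the matcher follows from an amortized argument: every character comparison either ends the work on the current text character (at most n times) or lowers q, and q cannot fall more times than it has risen, which is also at most n times, so there are at most 2n comparisons. The sketch below uses hypothetical helpers, kmp_comparisons and naive_comparisons, written for illustration, to count comparisons on a bad case for the naive method: a text of 20 a's and the pattern aaaab. A non-empty pattern is assumed.

```python
def compute_prefix_function(p: str) -> list[int]:
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_comparisons(text: str, p: str) -> int:
    """Count the character comparisons made by the KMP matcher."""
    pi = compute_prefix_function(p)
    m, q, count = len(p), 0, 0
    for ch in text:
        while True:
            count += 1
            if p[q] == ch:
                q += 1          # match: move on to the next text character
                break
            if q == 0:
                break           # mismatch with nothing to fall back to
            q = pi[q - 1]       # mismatch: q strictly decreases
        if q == m:
            q = pi[q - 1]       # full match found; keep searching
    return count


def naive_comparisons(text: str, p: str) -> int:
    """Count the character comparisons made by the naive matcher."""
    n, m, count = len(text), len(p), 0
    for s in range(n - m + 1):
        for j in range(m):
            count += 1
            if text[s + j] != p[j]:
                break
    return count


text, p = "a" * 20, "aaaab"
print(naive_comparisons(text, p), kmp_comparisons(text, p))  # 80 36
```

The naive count is (n - m + 1) * m = 16 * 5, while KMP stays within 2n = 40.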
Complexity Comparison of String-Matching Algorithms
Advantages and Disadvantages
Advantages:
1. The running time of the KMP algorithm, O(m + n), is optimal in the worst case and very fast.
2. The algorithm never needs to move backwards in the input text T, which makes it well suited to processing very large files.
Disadvantages:
It does not work as well as the size of the alphabet increases: with a larger alphabet, mismatches are more likely to occur early, so there is less partial-match information for the prefix function to reuse.
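Because the matcher never moves backwards in T, its only state between characters is q, so the text can be consumed as a stream. The sketch below, a hypothetical kmp_stream generator written for illustration, assumes the text arrives as an iterable of string chunks (for example, successive reads from a file) and a non-empty pattern; occurrences that straddle chunk boundaries are still found.

```python
from typing import Iterable, Iterator


def compute_prefix_function(p: str) -> list[int]:
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_stream(chunks: Iterable[str], p: str) -> Iterator[int]:
    """Yield the 0-based start offset of each occurrence of p in the concatenated chunks.

    Only q (characters matched) and the running offset are kept between chunks,
    so the text is read once, left to right, without buffering.
    """
    pi = compute_prefix_function(p)
    m, q, offset = len(p), 0, 0
    for chunk in chunks:
        for ch in chunk:
            while q > 0 and p[q] != ch:
                q = pi[q - 1]
            if p[q] == ch:
                q += 1
            if q == m:
                yield offset - m + 1   # the match ends at the current offset
                q = pi[q - 1]
            offset += 1


chunks = ["bacbaba", "babac", "aab"]   # the example text T split into three pieces
print(list(kmp_stream(chunks, "ababaca")))  # [6]
```

For an actual file, the chunks could come from iter(lambda: f.read(65536), "") on a file opened in text mode; the chunk size there is arbitrary.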
Real-World Applications
• Plagiarism detection
• Search engines
• Language syntax checkers
• Database queries
• Music content retrieval
Real-World Applications (contd.)
• DNA sequence analysis:
  • DNA is composed of nucleotides of four types. The four bases in DNA are Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). A DNA sequence is a representation of the string of nucleotides contained in a strand of DNA.
  • DNA sequences associated with various diseases are stored in databases for retrieval and comparison. Such a system compares similarity values against a threshold value and records whether a particular result indicates disease or not.
  • For example: ATTCGTAACTAGTAAGTTA. DNA sequencing techniques have made it possible to analyze vast amounts of data in a short span of time, so pattern-matching techniques play a vital role in computational biology for analyzing biological data such as DNA sequences.
References
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, 2nd edition, 2001, "The Knuth-Morris-Pratt Algorithm".
• https://pdfs.semanticscholar.org/fe41/52465f96d09c94b46b86a3b6408dae5dbe13.pdf
• http://research.ijcaonline.org/volume115/number23/pxc3902734.pdf
Thank You