KMP Algorithm

KMP Algorithm

Posted by Leonard Meng on January 15, 2024

PerFace

The key of the hard problem in the leetcode weekly contest this week is KMP algorithm. I have learned this algorithm before, but I still need to review it. So I write this blog to review this algorithm.

Introduction

String matching algorithms are used to find a pattern in a string. The simplest string matching algorithm is the Brute-Force algorithm. It is also known as the Naive String Matching algorithm. It is the simplest algorithm and is easy to understand and implement. But it is not the most efficient algorithm. It is not suitable for large texts. The time complexity of the Brute-Force algorithm is O(m*n), where m is the length of the pattern and n is the length of the text.The following figure shows the working of the Brute-Force algorithm.

kmp1

KMP Algorithm

KMP algorithm is an efficient string matching algorithm. It is based on the next array that when a mismatch occurs, the pattern itself contains enough information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. The time complexity of the KMP algorithm is O(m+n), where m is the length of the pattern and n is the length of the text. The following figure shows the working of the KMP algorithm.

kmp1

Given i is the pointer of the text, j is the pointer of the pattern. To briefly describe the idea of KMP algorithm, we want to reduce the time of pointer j callback to 0. And the pointer i will not go back. The steps of j pointer callback are stored in the next array. The next array is the key of KMP algorithm. The next array is the longest prefix and suffix array of the pattern. The following figure shows the next array of the pattern “ABACABAB”.

How to build the next array

Let’s take a look at the next aray of the pattern “ABACABAB”. According to KMP algorithm, the next array is the longest prefix and suffix array of the pattern (We want to match string from beginning, so that common prefix and suffix means the substring in text scaned by pointer i (this is suffix) have already matched the pattern prefix). Following is the next array generation process.

kmp1

The process begins with initializing a pointer ‘j’, which also represents the length of the current common prefix and suffix. Initially, the algorithm checks if pat.charAt(0) == pat.charAt(j). If they are equal, ‘j’ is incremented, and next[j] is set to ‘j’. If pat.charAt(0) != pat.charAt(j)’, the algorithm looks for a shorter common prefix and suffix. This is done by referring back to the ‘next’ array. For example, if the pointer ‘j’ is at position 6, we need to check if PAT[3]equalsPAT[7]. If this comparison fails, the algorithm continues to search for a shorter matching prefix and suffix using the ‘next’ array, checking if PAT[PAT[3]] == PAT[7], and so on, until it finds PAT[X] == PAT[7]orPAT[X] == 0`. This process repeats until the ‘next’ array is completely built.

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
public class KMPSearching {

    public static List<Integer> KMPSearch(String s, String sub) {
        List<Integer> result = new ArrayList<>();

        int[] next = buildNext(sub);
        int i = 0;
        int j = 0;

        while(i < s.length()) {
            if(s.charAt(i) == sub.charAt(j)) {
                i++;
                j++;
            } else if (j > 0){
                j = next[j - 1];
            } else {
                i++;
            }

            if(j == sub.length()) {
                result.add(i - j);
                j = next[j - 1];
            }
        }

        return result;
    }

    // Function to compute the longest prefix suffix array
    public static int[] buildNext(String sub) {
        int[] next = new int[sub.length()];
        int commonFixLength = 0;
        int i = 1;
        while(i < sub.length()) {
            if(sub.charAt(i) == sub.charAt(commonFixLength)) {
                commonFixLength++;
                next[i] = commonFixLength;
                i++;
            } else {
                if(commonFixLength != 0) {
                    commonFixLength = next[commonFixLength - 1];
                } else {
                    next[i] = 0;
                    i++;
                }
            }
        }

        return next;
    }


    public static void main(String args[]) {
        String s = "ABABDABACDABABCABABCABAB";
        String sub = "ABABCABAB";
        List<Integer> matchedIndices = KMPSearch(s, sub);

        System.out.println("Pattern found at indices: " + matchedIndices);
    }

}

Referencxe List

[1] https://www.bilibili.com/video/BV1AY4y157yL

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Welcome to reprint, and please indicate the source: Lingjun Meng's Blog Keep the content of the article complete and the above statement information!