Find the Index of the First Occurrence in a String

LeetCode problem 28, Find the Index of the First Occurrence in a String, asks us to locate the first index of a given substring (needle) in another string (haystack). If the substring is not found, we return -1. This is a classic problem in string searching with multiple approaches ranging from basic brute force to advanced pattern-matching algorithms.

In this blog post, we will explore all the available approaches, from naive to optimal, with Java implementations.

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.

Constraints:

  • 1 <= haystack.length, needle.length <= 104
  • haystack and needle consist of only lowercase English characters.

Approach 1: Naive String Matching

The naive approach involves checking each substring of haystack with the same length as needle to see if it matches needle.

Algorithm

  1. Iterate through the haystack from index 0 to haystack.length - needle.length.
  2. For each index, compare the substring of length equal to needle with needle.
  3. Return the index if a match is found; otherwise, return -1.

Code:

public class FindFirstOccurrence {
    public static int strStr(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();

        for (int i = 0; i <= n - m; i++) {
            if (haystack.substring(i, i + m).equals(needle)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(strStr("sadbutsad", "sad")); // Output: 0
        System.out.println(strStr("leetcode", "leeto")); // Output: -1
    }
}

Time Complexity:

  • Worst case: O((n−m+1)×m), where n is the length of haystack and m is the length of needle.

Space Complexity:

  • O(m): For storing the substring.

Approach 2: Two Pointers

Instead of extracting substrings, we can use two pointers to compare characters in haystack and needle.

Algorithm

  1. Use a loop to traverse the haystack from index 0 to n - m.
  2. For each position, check character-by-character whether needle matches starting at that index.
  3. Return the index if all characters match; otherwise, continue.

Code:

public class FindFirstOccurrence {
    public static int strStr(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();

        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            while (j < m && haystack.charAt(i + j) == needle.charAt(j)) {
                j++;
            }
            if (j == m) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(strStr("sadbutsad", "sad")); // Output: 0
        System.out.println(strStr("leetcode", "leeto")); // Output: -1
    }
}

Time Complexity:

  • Worst case: O(n×m).

Space Complexity:

  • O(1): In-place comparison without additional storage.

Approach 3: Using Built-In Functions

Java provides built-in methods like String.indexOf() that solve this problem directly.

Code:

public class FindFirstOccurrence {
    public static int strStr(String haystack, String needle) {
        return haystack.indexOf(needle);
    }

    public static void main(String[] args) {
        System.out.println(strStr("sadbutsad", "sad")); // Output: 0
        System.out.println(strStr("leetcode", "leeto")); // Output: -1
    }
}

Time Complexity:

  • Dependent on the implementation of indexOf(), but it is optimized and typically O(n+m).

Space Complexity:

  • O(1).

Approach 4: Knuth-Morris-Pratt (KMP) Algorithm

The KMP algorithm improves efficiency by preprocessing the needle string to create a “partial match” table (LPS array). This table determines the next position to check in needle without rechecking characters that are already matched.

Algorithm

  1. Construct the LPS (Longest Prefix Suffix) array for the needle.
  2. Use the LPS array to skip unnecessary comparisons in the haystack.

Code:

public class FindFirstOccurrence {
    public static int strStr(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();

        if (m == 0) return 0;

        // Create LPS array
        int[] lps = buildLPS(needle);

        int i = 0; // pointer for haystack
        int j = 0; // pointer for needle

        while (i < n) {
            if (haystack.charAt(i) == needle.charAt(j)) {
                i++;
                j++;
                if (j == m) return i - j;
            } else {
                if (j > 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
        }
        return -1;
    }

    private static int[] buildLPS(String needle) {
        int m = needle.length();
        int[] lps = new int[m];
        int len = 0;
        int i = 1;

        while (i < m) {
            if (needle.charAt(i) == needle.charAt(len)) {
                len++;
                lps[i] = len;
                i++;
            } else {
                if (len > 0) {
                    len = lps[len - 1];
                } else {
                    lps[i] = 0;
                    i++;
                }
            }
        }
        return lps;
    }

    public static void main(String[] args) {
        System.out.println(strStr("sadbutsad", "sad")); // Output: 0
        System.out.println(strStr("leetcode", "leeto")); // Output: -1
    }
}

Time Complexity:

  • O(n+m): Efficient due to the LPS array preprocessing.

Space Complexity:

  • O(m): For the LPS array.

Conclusion

We discussed multiple approaches to solving the problem:

  1. Naive String Matching: Easy to implement but inefficient for large inputs.
  2. Two Pointers: Avoids substring creation but still has a higher time complexity.
  3. Built-In Functions: Simplest and optimized for practical use.
  4. KMP Algorithm: Best choice for optimal performance, especially for long strings.

Recommendation

If you are practicing for interviews, try implementing both the naive and KMP approaches to understand their trade-offs. For real-world scenarios, built-in functions are usually sufficient.

Which approach do you find most interesting? Let me know in the comments!

Leave a comment