LeetCode problem 28, “Find the Index of the First Occurrence in a String“, asks us to locate the first index of a given substring (needle
) in another string (haystack
). If the substring is not found, we return -1
. This is a classic problem in string searching with multiple approaches ranging from basic brute force to advanced pattern-matching algorithms.
In this blog post, we will explore all the available approaches, from naive to optimal, with Java implementations.
Example 1:
Input: haystack = "sadbutsad", needle = "sad" Output: 0 Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0, so we return 0.
Example 2:
Input: haystack = "leetcode", needle = "leeto" Output: -1 Explanation: "leeto" did not occur in "leetcode", so we return -1.
Constraints:
1 <= haystack.length, needle.length <= 104
haystack
andneedle
consist of only lowercase English characters.
Approach 1: Naive String Matching
The naive approach involves checking each substring of haystack
with the same length as needle
to see if it matches needle
.
Algorithm
- Iterate through the
haystack
from index0
tohaystack.length - needle.length
. - For each index, compare the substring of length equal to
needle
withneedle
. - Return the index if a match is found; otherwise, return
-1
.
Code:
public class FindFirstOccurrence {
public static int strStr(String haystack, String needle) {
int n = haystack.length();
int m = needle.length();
for (int i = 0; i <= n - m; i++) {
if (haystack.substring(i, i + m).equals(needle)) {
return i;
}
}
return -1;
}
public static void main(String[] args) {
System.out.println(strStr("sadbutsad", "sad")); // Output: 0
System.out.println(strStr("leetcode", "leeto")); // Output: -1
}
}
Time Complexity:
- Worst case: O((n−m+1)×m), where n is the length of
haystack
and m is the length ofneedle
.
Space Complexity:
- O(m): For storing the substring.
Approach 2: Two Pointers
Instead of extracting substrings, we can use two pointers to compare characters in haystack
and needle
.
Algorithm
- Use a loop to traverse the
haystack
from index0
ton - m
. - For each position, check character-by-character whether
needle
matches starting at that index. - Return the index if all characters match; otherwise, continue.
Code:
public class FindFirstOccurrence {
public static int strStr(String haystack, String needle) {
int n = haystack.length();
int m = needle.length();
for (int i = 0; i <= n - m; i++) {
int j = 0;
while (j < m && haystack.charAt(i + j) == needle.charAt(j)) {
j++;
}
if (j == m) {
return i;
}
}
return -1;
}
public static void main(String[] args) {
System.out.println(strStr("sadbutsad", "sad")); // Output: 0
System.out.println(strStr("leetcode", "leeto")); // Output: -1
}
}
Time Complexity:
- Worst case: O(n×m).
Space Complexity:
- O(1): In-place comparison without additional storage.
Approach 3: Using Built-In Functions
Java provides built-in methods like String.indexOf()
that solve this problem directly.
Code:
public class FindFirstOccurrence {
public static int strStr(String haystack, String needle) {
return haystack.indexOf(needle);
}
public static void main(String[] args) {
System.out.println(strStr("sadbutsad", "sad")); // Output: 0
System.out.println(strStr("leetcode", "leeto")); // Output: -1
}
}
Time Complexity:
- Dependent on the implementation of
indexOf()
, but it is optimized and typically O(n+m).
Space Complexity:
- O(1).
Approach 4: Knuth-Morris-Pratt (KMP) Algorithm
The KMP algorithm improves efficiency by preprocessing the needle
string to create a “partial match” table (LPS array). This table determines the next position to check in needle
without rechecking characters that are already matched.
Algorithm
- Construct the LPS (Longest Prefix Suffix) array for the
needle
. - Use the LPS array to skip unnecessary comparisons in the
haystack
.
Code:
public class FindFirstOccurrence {
public static int strStr(String haystack, String needle) {
int n = haystack.length();
int m = needle.length();
if (m == 0) return 0;
// Create LPS array
int[] lps = buildLPS(needle);
int i = 0; // pointer for haystack
int j = 0; // pointer for needle
while (i < n) {
if (haystack.charAt(i) == needle.charAt(j)) {
i++;
j++;
if (j == m) return i - j;
} else {
if (j > 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
return -1;
}
private static int[] buildLPS(String needle) {
int m = needle.length();
int[] lps = new int[m];
int len = 0;
int i = 1;
while (i < m) {
if (needle.charAt(i) == needle.charAt(len)) {
len++;
lps[i] = len;
i++;
} else {
if (len > 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
public static void main(String[] args) {
System.out.println(strStr("sadbutsad", "sad")); // Output: 0
System.out.println(strStr("leetcode", "leeto")); // Output: -1
}
}
Time Complexity:
- O(n+m): Efficient due to the LPS array preprocessing.
Space Complexity:
- O(m): For the LPS array.
Conclusion
We discussed multiple approaches to solving the problem:
- Naive String Matching: Easy to implement but inefficient for large inputs.
- Two Pointers: Avoids substring creation but still has a higher time complexity.
- Built-In Functions: Simplest and optimized for practical use.
- KMP Algorithm: Best choice for optimal performance, especially for long strings.
Recommendation
If you are practicing for interviews, try implementing both the naive and KMP approaches to understand their trade-offs. For real-world scenarios, built-in functions are usually sufficient.
Which approach do you find most interesting? Let me know in the comments!