Suffix Array Implementation in Java for Substring Search
A suffix array is a data structure that provides an efficient way to search for substrings within a string. It consists of an array of integers that represents the starting positions of the suffixes of a string, sorted in lexicographical order. Using a suffix array, substring search operations can be performed efficiently using binary search.
Program Structure
The implementation of a suffix array for substring search involves two main components:
- SuffixArray: Builds the suffix array from a given string and provides methods for searching substrings.
- Suffix Array Construction: A helper function to build the suffix array.
SuffixArray Class
This class represents the main structure and operations of the suffix array. It contains:
- An integer array
suffixArr[]
that stores the suffix array. - A string
text
representing the input string. - Methods for constructing the suffix array, performing binary search for a substring, and searching for all occurrences of a substring.
Java Code Implementation
SuffixArray Class
import java.util.Arrays;
public class SuffixArray {
private int[] suffixArr;
private String text;
public SuffixArray(String text) {
this.text = text;
this.suffixArr = buildSuffixArray(text);
}
private int[] buildSuffixArray(String text) {
int n = text.length();
Integer[] suffixes = new Integer[n];
for (int i = 0; i < n; i++) {
suffixes[i] = i;
}
Arrays.sort(suffixes, (a, b) -> text.substring(a).compareTo(text.substring(b)));
int[] suffixArr = new int[n];
for (int i = 0; i < n; i++) {
suffixArr[i] = suffixes[i];
}
return suffixArr;
}
public boolean search(String pat) {
int left = 0, right = suffixArr.length – 1;
int m = pat.length();
while (left <= right) {
int mid = left + (right – left) / 2;
String suffix = text.substring(suffixArr[mid], Math.min(suffixArr[mid] + m, text.length()));
int cmp = pat.compareTo(suffix);
if (cmp == 0) {
return true;
}
if (cmp < 0) {
right = mid – 1;
} else {
left = mid + 1;
}
}
return false;
}
public void printSuffixArray() {
for (int i : suffixArr) {
System.out.println(i + “: ” + text.substring(i));
}
}
}
Explanation
Here is a detailed explanation of the key components in the SuffixArray implementation:
1. Constructor and Fields
The SuffixArray
class has two main fields: an integer array suffixArr[]
that stores the suffix array, and a string text
representing the input string. The constructor initializes these fields and calls the buildSuffixArray
method to construct the suffix array.
2. buildSuffixArray Method
The buildSuffixArray
method constructs the suffix array for the given string. It starts by creating an array of integers, where each element represents the starting index of a suffix in the string. The suffixes are then sorted lexicographically using Java’s Arrays.sort()
method with a custom comparator that compares suffixes based on their substring values. The result is stored in the suffixArr
array.
3. search Method
The search
method performs a binary search on the suffix array to determine if a given pattern pat
exists in the string. The method iterates over the suffix array, comparing the pattern with the corresponding suffix substring. If a match is found, the method returns true
. If no match is found after completing the binary search, the method returns false
.
4. printSuffixArray Method
The printSuffixArray
method prints the contents of the suffix array, showing the starting index of each suffix and the corresponding substring in the original text. This method is useful for debugging and understanding the structure of the suffix array.
Conclusion
This Java implementation of a suffix array provides an efficient way to search for substrings within a given string. The suffix array allows for fast searching using binary search, making it ideal for applications where substring queries are frequent. The implementation demonstrates the construction of a suffix array, as well as searching for patterns within the text.