Suffix Array Implementation in Java for Substring Search

A suffix array is a data structure that provides an efficient way to search for substrings within a string. It consists of an array of integers that represents the starting positions of the suffixes of a string, sorted in lexicographical order. Using a suffix array, substring search operations can be performed efficiently using binary search.

Program Structure

The implementation of a suffix array for substring search involves two main components:

  • SuffixArray: Builds the suffix array from a given string and provides methods for searching substrings.
  • Suffix Array Construction: A helper function to build the suffix array.

SuffixArray Class

This class represents the main structure and operations of the suffix array. It contains:

  • An integer array suffixArr[] that stores the suffix array.
  • A string text representing the input string.
  • Methods for constructing the suffix array, performing binary search for a substring, and searching for all occurrences of a substring.

Java Code Implementation

SuffixArray Class


import java.util.Arrays;

public class SuffixArray {
private int[] suffixArr;
private String text;

public SuffixArray(String text) {
this.text = text;
this.suffixArr = buildSuffixArray(text);
}

private int[] buildSuffixArray(String text) {
int n = text.length();
Integer[] suffixes = new Integer[n];

for (int i = 0; i < n; i++) {
suffixes[i] = i;
}

Arrays.sort(suffixes, (a, b) -> text.substring(a).compareTo(text.substring(b)));

int[] suffixArr = new int[n];
for (int i = 0; i < n; i++) {
suffixArr[i] = suffixes[i];
}
return suffixArr;
}

public boolean search(String pat) {
int left = 0, right = suffixArr.length – 1;
int m = pat.length();

while (left <= right) {
int mid = left + (right – left) / 2;
String suffix = text.substring(suffixArr[mid], Math.min(suffixArr[mid] + m, text.length()));

int cmp = pat.compareTo(suffix);
if (cmp == 0) {
return true;
}
if (cmp < 0) {
right = mid – 1;
} else {
left = mid + 1;
}
}
return false;
}

public void printSuffixArray() {
for (int i : suffixArr) {
System.out.println(i + “: ” + text.substring(i));
}
}
}

Explanation

Here is a detailed explanation of the key components in the SuffixArray implementation:

1. Constructor and Fields

The SuffixArray class has two main fields: an integer array suffixArr[] that stores the suffix array, and a string text representing the input string. The constructor initializes these fields and calls the buildSuffixArray method to construct the suffix array.

2. buildSuffixArray Method

The buildSuffixArray method constructs the suffix array for the given string. It starts by creating an array of integers, where each element represents the starting index of a suffix in the string. The suffixes are then sorted lexicographically using Java’s Arrays.sort() method with a custom comparator that compares suffixes based on their substring values. The result is stored in the suffixArr array.

3. search Method

The search method performs a binary search on the suffix array to determine if a given pattern pat exists in the string. The method iterates over the suffix array, comparing the pattern with the corresponding suffix substring. If a match is found, the method returns true. If no match is found after completing the binary search, the method returns false.

4. printSuffixArray Method

The printSuffixArray method prints the contents of the suffix array, showing the starting index of each suffix and the corresponding substring in the original text. This method is useful for debugging and understanding the structure of the suffix array.

Conclusion

This Java implementation of a suffix array provides an efficient way to search for substrings within a given string. The suffix array allows for fast searching using binary search, making it ideal for applications where substring queries are frequent. The implementation demonstrates the construction of a suffix array, as well as searching for patterns within the text.

 

By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)