Suffix Array Implementation in Python


Suffix Array Implementation in Python

A suffix array is a sorted array of all suffixes of a given string. It is a powerful tool for substring searching and other text processing tasks. Once the suffix array is built, it allows for efficient searching of any substring in the original string.

Program Structure

The suffix array is implemented using Python and includes the following components:

1. SuffixArray Class

This class encapsulates the functionality of the suffix array. It includes the following methods:

  • __init__(self, text): Initializes the suffix array for a given text by creating and sorting all suffixes.
  • build_suffix_array(self): Constructs the suffix array by sorting the suffixes of the text.
  • search(self, pattern): Searches for a pattern in the text using binary search on the suffix array.

Python Code Implementation


class SuffixArray:
def __init__(self, text):
"""
Initialize the SuffixArray object.
:param text: The input text to build the suffix array for.
"""
self.text = text
self.suffix_array = self.build_suffix_array()

def build_suffix_array(self):
"""
Build the suffix array by sorting all suffixes of the text.
:return: Sorted list of starting indices of suffixes.
"""
suffixes = [(self.text[i:], i) for i in range(len(self.text))]
suffixes.sort() # Sort based on the suffix strings
return [suffix[1] for suffix in suffixes]

def search(self, pattern):
"""
Search for a pattern in the text using the suffix array.
:param pattern: The pattern to search for.
:return: List of starting indices where the pattern is found.
"""
l, r = 0, len(self.suffix_array)
result = []
while l < r:
mid = (l + r) // 2
start_idx = self.suffix_array[mid]
suffix = self.text[start_idx:start_idx + len(pattern)]
if suffix < pattern:
l = mid + 1
elif suffix > pattern:
r = mid
else:
# Pattern found, search for all occurrences
# Search to the left
i = mid
while i >= 0 and self.text[self.suffix_array[i]:].startswith(pattern):
result.append(self.suffix_array[i])
i -= 1
# Search to the right
i = mid + 1
while i < len(self.suffix_array) and self.text[self.suffix_array[i]:].startswith(pattern):
result.append(self.suffix_array[i])
i += 1
break
return sorted(result) # Return sorted indices

Explanation

The SuffixArray class handles the construction of the suffix array and searching for patterns. Here’s how each component works:

1. Building the Suffix Array

The build_suffix_array method constructs the suffix array by generating all suffixes of the input text and then sorting them lexicographically. The suffix array is a list of starting indices of these sorted suffixes.

2. Searching for a Pattern

The search method uses binary search on the suffix array to find the pattern in the text. It compares the pattern with the suffixes and narrows down the search range. If the pattern is found, it also checks for multiple occurrences to the left and right of the found index.

Usage Example


# Example usage of the SuffixArray

text = "banana"
pattern = "ana"

suffix_array = SuffixArray(text)
result = suffix_array.search(pattern)

print(f"Pattern '{pattern}' found at indices: {result}")
# Output: Pattern 'ana' found at indices: [1, 3]

This example demonstrates how to create a suffix array for the string “banana”, and how to search for the pattern “ana”. The output shows the starting indices of all occurrences of the pattern in the text.


By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)