Suffix Array Implementation in Go
A suffix array is a data structure that provides a fast and memory-efficient way to perform substring searches in a string. It is an array of integers representing the starting positions of the suffixes of a string, sorted in lexicographical order.
Program Structure
The Go program implementing the suffix array is structured into the following components:
- Suffix Struct: This struct represents a suffix, containing the starting index of the suffix in the original string and the suffix itself.
- SuffixArray Struct: This struct contains the suffix array, which is a slice of integers representing the starting indices of the sorted suffixes.
- NewSuffixArray Function: This function initializes and builds the suffix array for a given string.
- BinarySearch Method: This method uses binary search on the suffix array to check if a given substring exists in the original string.
Go Code Implementation
package main
import (
"fmt"
"sort"
)
// Suffix represents a suffix of a string
type Suffix struct {
index int
value string
}
// SuffixArray represents a suffix array for a given string
type SuffixArray struct {
suffixes []Suffix
}
// NewSuffixArray creates a new suffix array for the given text
func NewSuffixArray(text string) *SuffixArray {
suffixes := make([]Suffix, len(text))
for i := 0; i < len(text); i++ {
suffixes[i] = Suffix{
index: i,
value: text[i:],
}
}
// Sort the suffixes lexicographically
sort.Slice(suffixes, func(i, j int) bool {
return suffixes[i].value < suffixes[j].value
})
return &SuffixArray{suffixes: suffixes}
}
// BinarySearch performs a binary search on the suffix array to find the substring
func (sa *SuffixArray) BinarySearch(pattern string) bool {
left, right := 0, len(sa.suffixes)-1
for left <= right {
mid := (left + right) / 2
suffix := sa.suffixes[mid].value
// Compare the pattern with the suffix
if suffix[:min(len(suffix), len(pattern))] < pattern { left = mid + 1 } else if suffix[:min(len(suffix), len(pattern))] > pattern {
right = mid - 1
} else {
return true // Pattern found
}
}
return false // Pattern not found
}
// min returns the smaller of two integers
func min(a, b int) int {
if a < b {
return a
}
return b
}
func main() {
text := "banana"
suffixArray := NewSuffixArray(text)
// Test cases
fmt.Println(suffixArray.BinarySearch("ana")) // Output: true
fmt.Println(suffixArray.BinarySearch("ban")) // Output: true
fmt.Println(suffixArray.BinarySearch("nan")) // Output: true
fmt.Println(suffixArray.BinarySearch("apple")) // Output: false
}
Explanation of the Code
Here’s a breakdown of the code:
- Suffix Struct: The
Suffix
struct holds the starting index of the suffix in the original string and the suffix itself. - SuffixArray Struct: The
SuffixArray
struct contains a slice ofSuffix
structs, representing the suffixes of the original string sorted in lexicographical order. - NewSuffixArray Function: This function generates all suffixes of the given string, stores them in a slice, and sorts them lexicographically. The sorted suffixes are stored in the
SuffixArray
struct. - BinarySearch Method: This method uses binary search to check if a given pattern (substring) exists in the original string. The search is performed on the sorted suffixes in the suffix array.
- min Function: This utility function returns the smaller of two integers, used to safely compare the substring with the suffix.
Usage Example
Here’s an example of how you might use this suffix array implementation in a Go program:
func main() {
text := "banana"
suffixArray := NewSuffixArray(text)
// Test cases
fmt.Println(suffixArray.BinarySearch("ana")) // Output: true
fmt.Println(suffixArray.BinarySearch("ban")) // Output: true
fmt.Println(suffixArray.BinarySearch("nan")) // Output: true
fmt.Println(suffixArray.BinarySearch("apple")) // Output: false
}
Conclusion
This Go program demonstrates a basic implementation of a suffix array, a powerful data structure for efficient substring search. By constructing the suffix array and performing binary search, you can quickly determine if a substring exists within a string.