Suffix Array Implementation in Go

A suffix array is a data structure that provides a fast and memory-efficient way to perform substring searches in a string. It is an array of integers representing the starting positions of the suffixes of a string, sorted in lexicographical order.

Program Structure

The Go program implementing the suffix array is structured into the following components:

  1. Suffix Struct: This struct represents a suffix, containing the starting index of the suffix in the original string and the suffix itself.
  2. SuffixArray Struct: This struct contains the suffix array, which is a slice of integers representing the starting indices of the sorted suffixes.
  3. NewSuffixArray Function: This function initializes and builds the suffix array for a given string.
  4. BinarySearch Method: This method uses binary search on the suffix array to check if a given substring exists in the original string.

Go Code Implementation


package main

import (
    "fmt"
    "sort"
)

// Suffix represents a suffix of a string
type Suffix struct {
    index int
    value string
}

// SuffixArray represents a suffix array for a given string
type SuffixArray struct {
    suffixes []Suffix
}

// NewSuffixArray creates a new suffix array for the given text
func NewSuffixArray(text string) *SuffixArray {
    suffixes := make([]Suffix, len(text))
    for i := 0; i < len(text); i++ {
        suffixes[i] = Suffix{
            index: i,
            value: text[i:],
        }
    }
    // Sort the suffixes lexicographically
    sort.Slice(suffixes, func(i, j int) bool {
        return suffixes[i].value < suffixes[j].value
    })

    return &SuffixArray{suffixes: suffixes}
}

// BinarySearch performs a binary search on the suffix array to find the substring
func (sa *SuffixArray) BinarySearch(pattern string) bool {
    left, right := 0, len(sa.suffixes)-1
    for left <= right {
        mid := (left + right) / 2
        suffix := sa.suffixes[mid].value
        // Compare the pattern with the suffix
        if suffix[:min(len(suffix), len(pattern))] < pattern { left = mid + 1 } else if suffix[:min(len(suffix), len(pattern))] > pattern {
            right = mid - 1
        } else {
            return true // Pattern found
        }
    }
    return false // Pattern not found
}

// min returns the smaller of two integers
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

func main() {
    text := "banana"
    suffixArray := NewSuffixArray(text)

    // Test cases
    fmt.Println(suffixArray.BinarySearch("ana")) // Output: true
    fmt.Println(suffixArray.BinarySearch("ban")) // Output: true
    fmt.Println(suffixArray.BinarySearch("nan")) // Output: true
    fmt.Println(suffixArray.BinarySearch("apple")) // Output: false
}
    

Explanation of the Code

Here’s a breakdown of the code:

  • Suffix Struct: The Suffix struct holds the starting index of the suffix in the original string and the suffix itself.
  • SuffixArray Struct: The SuffixArray struct contains a slice of Suffix structs, representing the suffixes of the original string sorted in lexicographical order.
  • NewSuffixArray Function: This function generates all suffixes of the given string, stores them in a slice, and sorts them lexicographically. The sorted suffixes are stored in the SuffixArray struct.
  • BinarySearch Method: This method uses binary search to check if a given pattern (substring) exists in the original string. The search is performed on the sorted suffixes in the suffix array.
  • min Function: This utility function returns the smaller of two integers, used to safely compare the substring with the suffix.

Usage Example

Here’s an example of how you might use this suffix array implementation in a Go program:


func main() {
    text := "banana"
    suffixArray := NewSuffixArray(text)

    // Test cases
    fmt.Println(suffixArray.BinarySearch("ana")) // Output: true
    fmt.Println(suffixArray.BinarySearch("ban")) // Output: true
    fmt.Println(suffixArray.BinarySearch("nan")) // Output: true
    fmt.Println(suffixArray.BinarySearch("apple")) // Output: false
}
    

Conclusion

This Go program demonstrates a basic implementation of a suffix array, a powerful data structure for efficient substring search. By constructing the suffix array and performing binary search, you can quickly determine if a substring exists within a string.

 

By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)