Suffix Array Implementation in Bash
This document explains how to implement a Suffix Array using Bash. A Suffix Array is a sorted array of all suffixes of a given string. It is an efficient data structure used for substring search, enabling quick lookups of whether a substring exists in the original string.
Program Structure
The program is structured into several key functions:
- build_suffix_array: Constructs the suffix array for a given string.
- binary_search: Performs a binary search to check if a substring exists in the original string using the suffix array.
- search: A helper function that integrates the suffix array with the binary search to perform substring search.
Implementation
#!/bin/bash
# Function to build the suffix array
build_suffix_array() {
local string=$1
local n=${#string}
suffix_array=()
# Generate all suffixes with their starting index
for (( i=0; i<n; i++ )); do
suffix_array[i]="${string:i} $i"
done
# Sort suffixes lexicographically
suffix_array=( $(for suffix in "${suffix_array[@]}"; do echo "$suffix"; done | sort) )
}
# Function to perform binary search for substring in the suffix array
binary_search() {
local substring=$1
local low=0
local high=$((${#suffix_array[@]} - 1))
while [ $low -le $high ]; do
local mid=$(( (low + high) / 2 ))
local suffix=${suffix_array[mid]% *} # Extract the suffix from the array element
if [[ "$suffix" == "$substring"* ]]; then
return 0 # Found the substring
elif [[ "$suffix" < "$substring" ]]; then
low=$((mid + 1))
else
high=$((mid - 1))
fi
done
return 1 # Substring not found
}
# Function to search for a substring using the suffix array
search() {
local substring=$1
if binary_search "$substring"; then
echo "\"$substring\" is present in the string."
else
echo "\"$substring\" is not present in the string."
fi
}
# Example usage
input_string="banana"
build_suffix_array "$input_string"
echo "Suffix array for \"$input_string\":"
for suffix in "${suffix_array[@]}"; do
echo "$suffix"
done
# Search for substrings
search "ana"
search "nana"
search "ban"
search "apple"
Explanation
The program implements a basic Suffix Array with the following components:
Building the Suffix Array
The build_suffix_array
function takes a string as input and constructs an array of all suffixes of that string. Each suffix is stored along with its starting index. The suffixes are then sorted lexicographically (alphabetically). This sorted array allows for efficient substring search using binary search.
Binary Search for Substrings
The binary_search
function performs a binary search on the suffix array to check whether a given substring is present in the original string. It compares the substring with each suffix in the array. If the substring matches the beginning of any suffix, the function returns success (found). If no match is found, it returns failure (not found).
Searching for a Substring
The search
function is a simple wrapper that calls the binary_search
function and prints whether the substring is present in the original string or not.
Example Usage
The example usage section demonstrates how to build the suffix array for the string “banana” and then perform searches for various substrings. The suffix array is printed for reference, and searches are conducted for substrings such as “ana”, “nana”, “ban”, and “apple”. The program outputs whether each substring is present in the original string.
Conclusion
This Bash implementation of a Suffix Array provides an efficient method for substring search. The combination of suffix array construction and binary search allows for quick lookups, making it a powerful tool for text processing tasks where substring search is required.