Suffix Array Implementation in C
A suffix array is a powerful data structure that is used for efficient substring searching in a given string. It is an array of all the suffixes of a string sorted in lexicographical order. Once built, a suffix array allows for quick searches of substrings within the original string.
Program Structure
This implementation of a suffix array in C will include the following components:
- Struct definition for holding suffixes and their indices.
- A comparison function for sorting suffixes.
- Functions for constructing the suffix array.
- A binary search function to search for a substring using the suffix array.
- A main function to demonstrate the usage of the suffix array for substring search.
Code Implementation
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Structure to store a suffix and its index
typedef struct {
int index; // Original index of the suffix in the string
char *suffix; // The suffix itself
} Suffix;
// Comparison function to sort suffixes lexicographically
int compareSuffixes(const void *a, const void *b) {
Suffix *suffixA = (Suffix*)a;
Suffix *suffixB = (Suffix*)b;
return strcmp(suffixA->suffix, suffixB->suffix);
}
// Function to build the suffix array
int* buildSuffixArray(char *text, int n) {
Suffix *suffixes = (Suffix*)malloc(n * sizeof(Suffix));
int *suffixArray = (int*)malloc(n * sizeof(int));
// Create suffixes and their indices
for (int i = 0; i < n; i++) {
suffixes[i].index = i;
suffixes[i].suffix = text + i;
}
// Sort the suffixes using the comparison function
qsort(suffixes, n, sizeof(Suffix), compareSuffixes);
// Store the sorted suffix indices in the suffix array
for (int i = 0; i < n; i++) {
suffixArray[i] = suffixes[i].index;
}
// Free the suffixes array
free(suffixes);
return suffixArray;
}
// Function to perform binary search for a pattern in the suffix array
int searchSuffixArray(char *text, int *suffixArray, int n, char *pattern) {
int m = strlen(pattern);
int low = 0, high = n - 1;
while (low <= high) {
int mid = (low + high) / 2;
int res = strncmp(text + suffixArray[mid], pattern, m);
if (res == 0) {
return suffixArray[mid]; // Pattern found
} else if (res < 0) {
low = mid + 1;
} else {
high = mid - 1;
}
}
return -1; // Pattern not found
}
// Main function to demonstrate the usage of the suffix array
int main() {
char text[] = "banana";
int n = strlen(text);
// Build the suffix array
int *suffixArray = buildSuffixArray(text, n);
// Print the suffix array
printf("Suffix Array:\n");
for (int i = 0; i < n; i++) {
printf("%d: %s\n", suffixArray[i], text + suffixArray[i]);
}
// Search for a pattern in the text
char pattern[] = "nan";
int result = searchSuffixArray(text, suffixArray, n, pattern);
if (result != -1) {
printf("Pattern '%s' found at index %d\n", pattern, result);
} else {
printf("Pattern '%s' not found\n", pattern);
}
// Free allocated memory
free(suffixArray);
return 0;
}
Explanation
The program begins by defining a struct Suffix that contains an integer index to store the original position of the suffix in the text, and a pointer suffix that points to the start of the suffix in the string.
The compareSuffixes() function is a comparison function used by the qsort() function to sort the suffixes lexicographically. This is necessary because a suffix array is an array of suffixes sorted in lexicographical order.
The buildSuffixArray() function generates the suffix array for a given text. It creates all possible suffixes of the text, stores them along with their indices, sorts them using qsort(), and finally stores the indices of these sorted suffixes in the suffix array.
Once the suffix array is built, the searchSuffixArray() function allows for efficient searching of a substring within the text. It performs a binary search on the suffix array to locate the pattern. If the pattern is found, the function returns the starting index of the pattern in the text; otherwise, it returns -1.
In the main() function, we demonstrate the creation of a suffix array for the string “banana” and search for the substring “nan”. The suffix array is printed out, and the result of the substring search is displayed.

