Bloom Filter Implementation in Go
A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. False positives are possible, but false negatives are not. This means that a Bloom filter can tell you an item is either “possibly in the set” or “definitely not in the set.”
Program Structure
The Go program implementing the Bloom filter is structured into the following components:
- BloomFilter Struct: This struct contains a bit array and a slice of hash functions. The bit array is used to represent the set, and the hash functions are used to map elements to positions in the bit array.
- NewBloomFilter Function: This function initializes a new Bloom filter with a specified size and number of hash functions.
- Add Method: This method adds an element to the Bloom filter. It hashes the element using each hash function and sets the corresponding bits in the bit array.
- Check Method: This method checks if an element is in the set. It hashes the element using each hash function and checks the corresponding bits in the bit array. If all relevant bits are set, the element is possibly in the set.
- hashFunction: A helper function to generate hash functions using different seeds.
Go Code Implementation
package main
import (
"crypto/sha256"
"encoding/binary"
"fmt"
"hash"
)
// BloomFilter represents a Bloom filter data structure
type BloomFilter struct {
bitArray []bool
hashFuncs []hash.Hash
bitArraySize int
}
// NewBloomFilter creates a new BloomFilter with the given size and number of hash functions
func NewBloomFilter(size int, numHashFuncs int) *BloomFilter {
bf := &BloomFilter{
bitArray: make([]bool, size),
bitArraySize: size,
}
for i := 0; i < numHashFuncs; i++ {
bf.hashFuncs = append(bf.hashFuncs, hashFunction(i))
}
return bf
}
// Add adds an element to the Bloom filter
func (bf *BloomFilter) Add(item string) {
for _, hashFunc := range bf.hashFuncs {
index := bf.hash(hashFunc, item)
bf.bitArray[index] = true
}
}
// Check checks if an element is possibly in the set
func (bf *BloomFilter) Check(item string) bool {
for _, hashFunc := range bf.hashFuncs {
index := bf.hash(hashFunc, item)
if !bf.bitArray[index] {
return false // Definitely not in the set
}
}
return true // Possibly in the set
}
// hash applies a hash function to the item and returns an index for the bit array
func (bf *BloomFilter) hash(hashFunc hash.Hash, item string) int {
hashFunc.Reset()
hashFunc.Write([]byte(item))
sum := hashFunc.Sum(nil)
index := binary.BigEndian.Uint32(sum) % uint32(bf.bitArraySize)
return int(index)
}
// hashFunction returns a new hash function with a specific seed
func hashFunction(seed int) hash.Hash {
h := sha256.New()
h.Write([]byte(fmt.Sprintf("%d", seed)))
return h
}
func main() {
bf := NewBloomFilter(1000, 5) // Create a Bloom filter with 1000 bits and 5 hash functions
// Add some items to the filter
bf.Add("apple")
bf.Add("banana")
bf.Add("cherry")
// Check for items
fmt.Println(bf.Check("apple")) // Output: true
fmt.Println(bf.Check("banana")) // Output: true
fmt.Println(bf.Check("grape")) // Output: false (definitely not in the set)
fmt.Println(bf.Check("cherry")) // Output: true
}
Explanation of the Code
Here’s a breakdown of the code:
- BloomFilter Struct: The
bitArray
is a slice of booleans representing the set. ThehashFuncs
slice contains the hash functions used to map elements to positions in the bit array. - NewBloomFilter Function: This function initializes a Bloom filter with a specified bit array size and number of hash functions. The hash functions are created using the
hashFunction
helper function with different seeds to ensure variety. - Add Method: This method adds an element to the Bloom filter by hashing the element with each hash function and setting the corresponding bit in the bit array.
- Check Method: This method checks whether an element is possibly in the set. It hashes the element with each hash function and checks the corresponding bits in the bit array. If any bit is not set, the element is definitely not in the set.
- hashFunction Helper Function: This function generates a new hash function seeded with a unique integer. It uses the SHA-256 hashing algorithm to produce a strong hash value.
Usage Example
Here’s an example of how you might use this Bloom filter implementation in a Go program:
func main() {
bf := NewBloomFilter(1000, 5) // Create a Bloom filter with 1000 bits and 5 hash functions
// Add some items to the filter
bf.Add("apple")
bf.Add("banana")
bf.Add("cherry")
// Check for items
fmt.Println(bf.Check("apple")) // Output: true
fmt.Println(bf.Check("banana")) // Output: true
fmt.Println(bf.Check("grape")) // Output: false (definitely not in the set)
fmt.Println(bf.Check("cherry")) // Output: true
}
Conclusion
This Go program demonstrates a basic implementation of a Bloom filter, a powerful and space-efficient data structure used for probabilistic set membership checks. While Bloom filters allow for false positives, they are highly useful in scenarios where space efficiency and speed are crucial.