Bloom Filter Implementation in Go

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. False positives are possible, but false negatives are not. This means that a Bloom filter can tell you an item is either “possibly in the set” or “definitely not in the set.”

Program Structure

The Go program implementing the Bloom filter is structured into the following components:

  1. BloomFilter Struct: This struct contains a bit array and a slice of hash functions. The bit array is used to represent the set, and the hash functions are used to map elements to positions in the bit array.
  2. NewBloomFilter Function: This function initializes a new Bloom filter with a specified size and number of hash functions.
  3. Add Method: This method adds an element to the Bloom filter. It hashes the element using each hash function and sets the corresponding bits in the bit array.
  4. Check Method: This method checks if an element is in the set. It hashes the element using each hash function and checks the corresponding bits in the bit array. If all relevant bits are set, the element is possibly in the set.
  5. hashFunction: A helper function to generate hash functions using different seeds.

Go Code Implementation


package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
    "hash"
)

// BloomFilter represents a Bloom filter data structure
type BloomFilter struct {
    bitArray    []bool
    hashFuncs   []hash.Hash
    bitArraySize int
}

// NewBloomFilter creates a new BloomFilter with the given size and number of hash functions
func NewBloomFilter(size int, numHashFuncs int) *BloomFilter {
    bf := &BloomFilter{
        bitArray:    make([]bool, size),
        bitArraySize: size,
    }
    for i := 0; i < numHashFuncs; i++ {
        bf.hashFuncs = append(bf.hashFuncs, hashFunction(i))
    }
    return bf
}

// Add adds an element to the Bloom filter
func (bf *BloomFilter) Add(item string) {
    for _, hashFunc := range bf.hashFuncs {
        index := bf.hash(hashFunc, item)
        bf.bitArray[index] = true
    }
}

// Check checks if an element is possibly in the set
func (bf *BloomFilter) Check(item string) bool {
    for _, hashFunc := range bf.hashFuncs {
        index := bf.hash(hashFunc, item)
        if !bf.bitArray[index] {
            return false // Definitely not in the set
        }
    }
    return true // Possibly in the set
}

// hash applies a hash function to the item and returns an index for the bit array
func (bf *BloomFilter) hash(hashFunc hash.Hash, item string) int {
    hashFunc.Reset()
    hashFunc.Write([]byte(item))
    sum := hashFunc.Sum(nil)
    index := binary.BigEndian.Uint32(sum) % uint32(bf.bitArraySize)
    return int(index)
}

// hashFunction returns a new hash function with a specific seed
func hashFunction(seed int) hash.Hash {
    h := sha256.New()
    h.Write([]byte(fmt.Sprintf("%d", seed)))
    return h
}

func main() {
    bf := NewBloomFilter(1000, 5) // Create a Bloom filter with 1000 bits and 5 hash functions

    // Add some items to the filter
    bf.Add("apple")
    bf.Add("banana")
    bf.Add("cherry")

    // Check for items
    fmt.Println(bf.Check("apple"))  // Output: true
    fmt.Println(bf.Check("banana")) // Output: true
    fmt.Println(bf.Check("grape"))  // Output: false (definitely not in the set)
    fmt.Println(bf.Check("cherry")) // Output: true
}
    

Explanation of the Code

Here’s a breakdown of the code:

  • BloomFilter Struct: The bitArray is a slice of booleans representing the set. The hashFuncs slice contains the hash functions used to map elements to positions in the bit array.
  • NewBloomFilter Function: This function initializes a Bloom filter with a specified bit array size and number of hash functions. The hash functions are created using the hashFunction helper function with different seeds to ensure variety.
  • Add Method: This method adds an element to the Bloom filter by hashing the element with each hash function and setting the corresponding bit in the bit array.
  • Check Method: This method checks whether an element is possibly in the set. It hashes the element with each hash function and checks the corresponding bits in the bit array. If any bit is not set, the element is definitely not in the set.
  • hashFunction Helper Function: This function generates a new hash function seeded with a unique integer. It uses the SHA-256 hashing algorithm to produce a strong hash value.

Usage Example

Here’s an example of how you might use this Bloom filter implementation in a Go program:


func main() {
    bf := NewBloomFilter(1000, 5) // Create a Bloom filter with 1000 bits and 5 hash functions

    // Add some items to the filter
    bf.Add("apple")
    bf.Add("banana")
    bf.Add("cherry")

    // Check for items
    fmt.Println(bf.Check("apple"))  // Output: true
    fmt.Println(bf.Check("banana")) // Output: true
    fmt.Println(bf.Check("grape"))  // Output: false (definitely not in the set)
    fmt.Println(bf.Check("cherry")) // Output: true
}
    

Conclusion

This Go program demonstrates a basic implementation of a Bloom filter, a powerful and space-efficient data structure used for probabilistic set membership checks. While Bloom filters allow for false positives, they are highly useful in scenarios where space efficiency and speed are crucial.

 

By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)