Bloom Filter Implementation in C++

A Bloom Filter is a probabilistic data structure used to test whether an element is a member of a set. It can return two types of results:

  • True positive: The element is definitely in the set.
  • False positive: The element is possibly in the set (but not definitely).

The Bloom Filter does not store the elements themselves but uses multiple hash functions to set bits in a bit array. When querying an element, the Bloom Filter checks the corresponding bits. If all the bits are set, the element is likely in the set; otherwise, it is definitely not.

Program Structure

The C++ implementation of the Bloom Filter will include:

  1. Initialization: Setting up the bit array and hash functions.
  2. Hash Functions: Implementing multiple hash functions to map elements to different positions in the bit array.
  3. Insert Operation: Adding elements to the Bloom Filter.
  4. Query Operation: Checking for the possible membership of elements in the set.

Code Implementation


#include <iostream>
#include <vector>
#include <string>
#include <functional>

class BloomFilter {
private:
    std::vector<bool> bitArray;
    int size;
    int numHashFunctions;

    // Hash function for demonstration purposes
    int hash1(const std::string &key) const {
        std::hash<std::string> hash_fn;
        return hash_fn(key) % size;
    }

    // Another hash function for demonstration purposes
    int hash2(const std::string &key) const {
        std::hash<std::string> hash_fn;
        return (hash_fn(key) / size) % size;
    }

public:
    // Constructor to initialize the Bloom Filter
    BloomFilter(int size, int numHashFunctions)
        : size(size), numHashFunctions(numHashFunctions) {
        bitArray.resize(size, false);
    }

    // Insert an element into the Bloom Filter
    void insert(const std::string &key) {
        int index1 = hash1(key);
        int index2 = hash2(key);

        bitArray[index1] = true;
        bitArray[index2] = true;

        // For more hash functions, continue similarly
    }

    // Query an element in the Bloom Filter
    bool contains(const std::string &key) const {
        int index1 = hash1(key);
        int index2 = hash2(key);

        return bitArray[index1] && bitArray[index2];
    }
};

// Main function to demonstrate the Bloom Filter operations
int main() {
    int size = 100; // Size of the bit array
    int numHashFunctions = 2; // Number of hash functions

    BloomFilter bf(size, numHashFunctions);

    // Insert elements into the Bloom Filter
    bf.insert("hello");
    bf.insert("world");

    // Query for elements
    if (bf.contains("hello")) {
        std::cout << "'hello' is possibly in the set." << std::endl;
    } else {
        std::cout << "'hello' is definitely not in the set." << std::endl;
    }

    if (bf.contains("goodbye")) {
        std::cout << "'goodbye' is possibly in the set." << std::endl;
    } else {
        std::cout << "'goodbye' is definitely not in the set." << std::endl;
    }

    return 0;
}

Explanation

Initialization

The constructor of the BloomFilter class initializes a bit array of a given size and sets the number of hash functions. The bit array is initialized to false for all positions.

Hash Functions

In this implementation, two simple hash functions, hash1() and hash2(), are defined using the C++ standard library’s std::hash function. These functions compute different indices in the bit array where the bits will be set. More hash functions can be added for increased accuracy.

Insert Operation

The insert() function adds a key to the Bloom Filter. It calculates the indices using the hash functions and sets the corresponding bits in the bit array to true.

Query Operation

The contains() function checks if a key is present in the Bloom Filter. It calculates the indices using the hash functions and checks if all the corresponding bits in the bit array are set to true. If they are, the key is probably in the set; otherwise, it is definitely not.

Conclusion

The Bloom Filter is an efficient and space-saving probabilistic data structure for checking set membership. Although it can produce false positives, it never generates false negatives, making it useful for applications where the occasional false positive is acceptable. This C++ implementation provides a basic yet effective way to utilize Bloom Filters in various contexts.

 

By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)