Bloom Filter Implementation in C++
A Bloom Filter is a probabilistic data structure used to test whether an element is a member of a set. It can return two types of results:
- True positive: The element is definitely in the set.
- False positive: The element is possibly in the set (but not definitely).
The Bloom Filter does not store the elements themselves but uses multiple hash functions to set bits in a bit array. When querying an element, the Bloom Filter checks the corresponding bits. If all the bits are set, the element is likely in the set; otherwise, it is definitely not.
Program Structure
The C++ implementation of the Bloom Filter will include:
- Initialization: Setting up the bit array and hash functions.
- Hash Functions: Implementing multiple hash functions to map elements to different positions in the bit array.
- Insert Operation: Adding elements to the Bloom Filter.
- Query Operation: Checking for the possible membership of elements in the set.
Code Implementation
#include <iostream>
#include <vector>
#include <string>
#include <functional>
class BloomFilter {
private:
std::vector<bool> bitArray;
int size;
int numHashFunctions;
// Hash function for demonstration purposes
int hash1(const std::string &key) const {
std::hash<std::string> hash_fn;
return hash_fn(key) % size;
}
// Another hash function for demonstration purposes
int hash2(const std::string &key) const {
std::hash<std::string> hash_fn;
return (hash_fn(key) / size) % size;
}
public:
// Constructor to initialize the Bloom Filter
BloomFilter(int size, int numHashFunctions)
: size(size), numHashFunctions(numHashFunctions) {
bitArray.resize(size, false);
}
// Insert an element into the Bloom Filter
void insert(const std::string &key) {
int index1 = hash1(key);
int index2 = hash2(key);
bitArray[index1] = true;
bitArray[index2] = true;
// For more hash functions, continue similarly
}
// Query an element in the Bloom Filter
bool contains(const std::string &key) const {
int index1 = hash1(key);
int index2 = hash2(key);
return bitArray[index1] && bitArray[index2];
}
};
// Main function to demonstrate the Bloom Filter operations
int main() {
int size = 100; // Size of the bit array
int numHashFunctions = 2; // Number of hash functions
BloomFilter bf(size, numHashFunctions);
// Insert elements into the Bloom Filter
bf.insert("hello");
bf.insert("world");
// Query for elements
if (bf.contains("hello")) {
std::cout << "'hello' is possibly in the set." << std::endl;
} else {
std::cout << "'hello' is definitely not in the set." << std::endl;
}
if (bf.contains("goodbye")) {
std::cout << "'goodbye' is possibly in the set." << std::endl;
} else {
std::cout << "'goodbye' is definitely not in the set." << std::endl;
}
return 0;
}
Explanation
Initialization
The constructor of the BloomFilter
class initializes a bit array of a given size and sets the number of hash functions. The bit array is initialized to false
for all positions.
Hash Functions
In this implementation, two simple hash functions, hash1()
and hash2()
, are defined using the C++ standard library’s std::hash
function. These functions compute different indices in the bit array where the bits will be set. More hash functions can be added for increased accuracy.
Insert Operation
The insert()
function adds a key to the Bloom Filter. It calculates the indices using the hash functions and sets the corresponding bits in the bit array to true
.
Query Operation
The contains()
function checks if a key is present in the Bloom Filter. It calculates the indices using the hash functions and checks if all the corresponding bits in the bit array are set to true
. If they are, the key is probably in the set; otherwise, it is definitely not.
Conclusion
The Bloom Filter is an efficient and space-saving probabilistic data structure for checking set membership. Although it can produce false positives, it never generates false negatives, making it useful for applications where the occasional false positive is acceptable. This C++ implementation provides a basic yet effective way to utilize Bloom Filters in various contexts.