Bloom Filter Implementation in Python


Bloom Filter Implementation in Python

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can return false positives, but it will never return false negatives. This makes Bloom filters useful in scenarios where space is limited and a small probability of false positives is acceptable.

Program Structure

The Bloom filter is implemented using Python and consists of the following key components:

1. BloomFilter Class

This class encapsulates the functionality of the Bloom filter. It includes the following methods:

  • __init__(self, size, hash_count): Initializes the Bloom filter with a bit array of given size and specifies the number of hash functions.
  • add(self, item): Adds an item to the Bloom filter by hashing it with multiple hash functions and setting the corresponding bits in the bit array.
  • check(self, item): Checks whether an item might be in the set by verifying if all the corresponding bits are set.
  • _hashes(self, item): Generates multiple hash values for the item.

Python Code Implementation


import hashlib

class BloomFilter:
def __init__(self, size, hash_count):
"""
Initialize the Bloom filter.
:param size: Size of the bit array.
:param hash_count: Number of hash functions to use.
"""
self.size = size
self.hash_count = hash_count
self.bit_array = [0] * size # Initialize bit array with 0s.

def _hashes(self, item):
"""
Generate hash_count number of hash values for the given item.
:param item: Item to hash.
:return: List of hash values.
"""
hashes = []
for i in range(self.hash_count):
# Create a hash for each count with a unique salt using SHA-256
hash_value = int(hashlib.sha256(f"{item}{i}".encode()).hexdigest(), 16)
hashes.append(hash_value % self.size)
return hashes

def add(self, item):
"""
Add an item to the Bloom filter.
:param item: Item to add.
"""
for hash_value in self._hashes(item):
self.bit_array[hash_value] = 1 # Set the bit to 1

def check(self, item):
"""
Check if an item is possibly in the set.
:param item: Item to check.
:return: True if the item might be in the set, False if it is definitely not.
"""
for hash_value in self._hashes(item):
if self.bit_array[hash_value] == 0:
return False # If any bit is 0, the item is definitely not in the set
return True # All bits are set, the item might be in the set

Explanation

The BloomFilter class uses a bit array to represent membership in a set. The key operations are adding items to the filter and checking for membership. The process is as follows:

1. Hashing Items

The _hashes method generates multiple hash values for a given item. This is done by concatenating the item with an integer value (used as a salt) and then hashing the result using the SHA-256 algorithm. The hash values are then reduced modulo the size of the bit array to get valid indices.

2. Adding Items

The add method takes an item and hashes it to generate multiple indices in the bit array. The corresponding bits at these indices are set to 1, indicating that the item has been added to the filter.

3. Checking Membership

The check method hashes the item in the same way as the add method. It then checks if all the corresponding bits in the bit array are set to 1. If any bit is 0, the item is definitely not in the set. If all bits are 1, the item might be in the set, but there’s a chance of a false positive.

Usage Example


# Example usage of the BloomFilter

# Create a Bloom filter with a size of 1000 bits and 10 hash functions
bloom = BloomFilter(1000, 10)

# Add some items to the filter
bloom.add("apple")
bloom.add("banana")
bloom.add("orange")

# Check for membership
print(bloom.check("apple")) # Output: True (apple might be in the set)
print(bloom.check("grape")) # Output: False (grape is definitely not in the set)
print(bloom.check("banana")) # Output: True (banana might be in the set)

This example demonstrates how to create a Bloom filter, add items to it, and check for membership. The filter is probabilistic, so it may return false positives, but never false negatives.


By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)