Bloom Filter Implementation in C

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can give false positives (indicating that an element might be in the set when it is not), but it never gives false negatives (indicating that an element is not in the set when it is).

Program Structure

This implementation of a Bloom filter in C will include the following components:

  1. Struct definition to hold the Bloom filter bit array and the size of the array.
  2. Hash functions for generating indices to set/check in the bit array.
  3. Initialization function to create the Bloom filter.
  4. Functions to add elements to the Bloom filter and to check for membership.
  5. A main function to demonstrate the usage of the Bloom filter.

Code Implementation

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define BLOOM_SIZE 1000  // Size of the Bloom filter bit array

// Define the structure for the Bloom filter
typedef struct {
    unsigned char *bit_array;  // Array of bits
    int size;                  // Size of the bit array
} BloomFilter;

// Hash function prototypes
unsigned int hash1(const char *str);
unsigned int hash2(const char *str);

// Function to initialize the Bloom filter
BloomFilter* createBloomFilter(int size) {
    BloomFilter *filter = (BloomFilter*)malloc(sizeof(BloomFilter));
    filter->bit_array = (unsigned char*)calloc(size, sizeof(unsigned char));
    filter->size = size;
    return filter;
}

// Function to add an element to the Bloom filter
void addToBloomFilter(BloomFilter *filter, const char *str) {
    unsigned int index1 = hash1(str) % filter->size;
    unsigned int index2 = hash2(str) % filter->size;
    filter->bit_array[index1] = 1;
    filter->bit_array[index2] = 1;
}

// Function to check if an element is possibly in the Bloom filter
bool checkBloomFilter(BloomFilter *filter, const char *str) {
    unsigned int index1 = hash1(str) % filter->size;
    unsigned int index2 = hash2(str) % filter->size;
    return filter->bit_array[index1] && filter->bit_array[index2];
}

// First hash function (djb2)
unsigned int hash1(const char *str) {
    unsigned int hash = 5381;
    int c;
    while ((c = *str++)) {
        hash = ((hash << 5) + hash) + c;  // hash * 33 + c
    }
    return hash;
}

// Second hash function (sdbm)
unsigned int hash2(const char *str) {
    unsigned int hash = 0;
    int c;
    while ((c = *str++)) {
        hash = c + (hash << 6) + (hash << 16) - hash; } return hash; } // Main function to demonstrate the usage of the Bloom filter int main() { BloomFilter *filter = createBloomFilter(BLOOM_SIZE); // Add some elements to the Bloom filter addToBloomFilter(filter, "hello"); addToBloomFilter(filter, "world"); addToBloomFilter(filter, "foo"); addToBloomFilter(filter, "bar"); // Check for membership printf("Checking 'hello': %s\n", checkBloomFilter(filter, "hello") ? "Possibly in set" : "Definitely not in set"); printf("Checking 'world': %s\n", checkBloomFilter(filter, "world") ? "Possibly in set" : "Definitely not in set"); printf("Checking 'baz': %s\n", checkBloomFilter(filter, "baz") ? "Possibly in set" : "Definitely not in set"); // Free allocated memory free(filter->bit_array);
    free(filter);

    return 0;
}

Explanation

The program begins by defining a struct BloomFilter that contains a bit array (bit_array) and the size of this array (size). The size is defined as BLOOM_SIZE, which is set to 1000 in this implementation. This bit array is used to store the results of the hash functions applied to the elements being added to the filter.

Two hash functions, hash1() and hash2(), are used to calculate the positions in the bit array where bits are set or checked. These hash functions generate different hash values for the same input, minimizing the likelihood of collisions and false positives.

The createBloomFilter() function initializes a Bloom filter by allocating memory for the bit array and setting all bits to 0. The addToBloomFilter() function takes a string, hashes it using both hash functions, and sets the corresponding bits in the bit array.

The checkBloomFilter() function checks whether a given string is possibly in the set by hashing it and checking if the corresponding bits are set. If both bits are set, the string is considered “possibly in the set” (though it could be a false positive); otherwise, it is “definitely not in the set.”

In the main() function, we demonstrate adding strings like “hello”, “world”, “foo”, and “bar” to the Bloom filter and then check if they or other strings like “baz” are in the set. The output will indicate whether each string is “possibly in the set” or “definitely not in the set.”

 

By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)