Bloom Filter Implementation in Bash

This document explains how to implement a Bloom Filter using Bash. A Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. The filter may yield false positives (indicating an element is in the set when it isn’t) but never false negatives (indicating an element isn’t in the set when it is).

Program Structure

The program is structured into several key functions:

  • initialize_filter: Initializes the Bloom Filter with a given size.
  • hash1, hash2, hash3: Hash functions used to set and check bits in the filter.
  • add: Adds an element to the Bloom Filter.
  • check: Checks whether an element is possibly in the Bloom Filter.

Implementation


#!/bin/bash

# Initialize the Bloom filter with a given size
initialize_filter() {
    local size=$1
    filter=()
    for (( i=0; i<size; i++ )); do
        filter[i]=0
    done
    filter_size=$size
}

# First hash function
hash1() {
    echo $(( $(echo -n "$1" | od -An -t d4 | awk '{s+=$1} END {print s}') % filter_size ))
}

# Second hash function
hash2() {
    echo $(( $(echo -n "$1" | od -An -t d4 | awk '{s*=$1} END {print s}') % filter_size ))
}

# Third hash function
hash3() {
    echo $(( $(echo -n "$1" | od -An -t d4 | awk '{s=($1+3)*7} END {print s}') % filter_size ))
}

# Add an element to the Bloom filter
add() {
    local element=$1
    local index1=$(hash1 "$element")
    local index2=$(hash2 "$element")
    local index3=$(hash3 "$element")

    filter[index1]=1
    filter[index2]=1
    filter[index3]=1
}

# Check if an element is in the Bloom filter
check() {
    local element=$1
    local index1=$(hash1 "$element")
    local index2=$(hash2 "$element")
    local index3=$(hash3 "$element")

    if [[ ${filter[index1]} -eq 1 && ${filter[index2]} -eq 1 && ${filter[index3]} -eq 1 ]]; then
        echo "$element might be in the set."
    else
        echo "$element is definitely not in the set."
    fi
}

# Example usage
initialize_filter 20

add "apple"
add "banana"
add "grape"

check "apple"    # Likely to say it's in the set
check "banana"   # Likely to say it's in the set
check "grape"    # Likely to say it's in the set
check "orange"   # Likely to say it's not in the set
    

Explanation

The program implements a simple Bloom Filter with the following components:

Initialization

The initialize_filter function initializes the Bloom Filter with a specified size. The filter is represented by an array where each position starts with a value of 0, indicating that no elements have been hashed to that position yet.

Hash Functions

The program includes three basic hash functions: hash1, hash2, and hash3. These functions are simplistic and use basic arithmetic on the ASCII values of the input string. Each hash function ensures that the result is within the bounds of the Bloom Filter array size.

Adding an Element

The add function hashes the input element using the three hash functions. It then sets the corresponding indices in the Bloom Filter array to 1, indicating that the element may be part of the set.

Checking for an Element

The check function hashes the input element with the same three hash functions. It checks whether all three corresponding indices in the Bloom Filter array are set to 1. If they are, the element is possibly in the set (though it could be a false positive). If any of the indices are 0, the element is definitely not in the set.

Example Usage

The example usage section shows how to initialize the filter, add elements to it, and check for their existence. For instance, “apple”, “banana”, and “grape” are added to the filter, and then the program checks for these elements as well as an element not in the filter (“orange”).

Conclusion

This Bash implementation of a Bloom Filter is a simple and efficient way to perform set membership tests with probabilistic guarantees. It’s particularly useful in applications where memory usage is a concern and false positives are acceptable.

 

By Aditya Bhuyan

I work as a cloud specialist. In addition to being an architect and SRE specialist, I work as a cloud engineer and developer. I have assisted my clients in converting their antiquated programmes into contemporary microservices that operate on various cloud computing platforms such as AWS, GCP, Azure, or VMware Tanzu, as well as orchestration systems such as Docker Swarm or Kubernetes. For over twenty years, I have been employed in the IT sector as a Java developer, J2EE architect, scrum master, and instructor. I write about Cloud Native and Cloud often. Bangalore, India is where my family and I call home. I maintain my physical and mental fitness by doing a lot of yoga and meditation.

Leave a Reply

Your email address will not be published. Required fields are marked *

error

Enjoy this blog? Please spread the word :)