Bloom Filter Implementation in Bash

This document explains how to implement a Bloom Filter using Bash. A Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. The filter may yield false positives (indicating an element is in the set when it isn’t) but never false negatives (indicating an element isn’t in the set when it is).

Program Structure

The program is structured into several key functions:

initialize_filter: Initializes the Bloom Filter with a given size.
hash1, hash2, hash3: Hash functions used to set and check bits in the filter.
add: Adds an element to the Bloom Filter.
check: Checks whether an element is possibly in the Bloom Filter.

Implementation


#!/bin/bash

# Initialize the Bloom filter with a given size
initialize_filter() {
    local size=$1
    filter=()
    for (( i=0; i<size; i++ )); do
        filter[i]=0
    done
    filter_size=$size
}

# First hash function
hash1() {
    echo $(( $(echo -n "$1" | od -An -t d4 | awk '{s+=$1} END {print s}') % filter_size ))
}

# Second hash function
hash2() {
    echo $(( $(echo -n "$1" | od -An -t d4 | awk '{s*=$1} END {print s}') % filter_size ))
}

# Third hash function
hash3() {
    echo $(( $(echo -n "$1" | od -An -t d4 | awk '{s=($1+3)*7} END {print s}') % filter_size ))
}

# Add an element to the Bloom filter
add() {
    local element=$1
    local index1=$(hash1 "$element")
    local index2=$(hash2 "$element")
    local index3=$(hash3 "$element")

    filter[index1]=1
    filter[index2]=1
    filter[index3]=1
}

# Check if an element is in the Bloom filter
check() {
    local element=$1
    local index1=$(hash1 "$element")
    local index2=$(hash2 "$element")
    local index3=$(hash3 "$element")

    if [[ ${filter[index1]} -eq 1 && ${filter[index2]} -eq 1 && ${filter[index3]} -eq 1 ]]; then
        echo "$element might be in the set."
    else
        echo "$element is definitely not in the set."
    fi
}

# Example usage
initialize_filter 20

add "apple"
add "banana"
add "grape"

check "apple"    # Likely to say it's in the set
check "banana"   # Likely to say it's in the set
check "grape"    # Likely to say it's in the set
check "orange"   # Likely to say it's not in the set

Explanation

The program implements a simple Bloom Filter with the following components:

Initialization

The initialize_filter function initializes the Bloom Filter with a specified size. The filter is represented by an array where each position starts with a value of 0, indicating that no elements have been hashed to that position yet.

Hash Functions

The program includes three basic hash functions: hash1, hash2, and hash3. These functions are simplistic and use basic arithmetic on the ASCII values of the input string. Each hash function ensures that the result is within the bounds of the Bloom Filter array size.

Adding an Element

The add function hashes the input element using the three hash functions. It then sets the corresponding indices in the Bloom Filter array to 1, indicating that the element may be part of the set.

Checking for an Element

The check function hashes the input element with the same three hash functions. It checks whether all three corresponding indices in the Bloom Filter array are set to 1. If they are, the element is possibly in the set (though it could be a false positive). If any of the indices are 0, the element is definitely not in the set.

Example Usage

The example usage section shows how to initialize the filter, add elements to it, and check for their existence. For instance, “apple”, “banana”, and “grape” are added to the filter, and then the program checks for these elements as well as an element not in the filter (“orange”).

Conclusion

This Bash implementation of a Bloom Filter is a simple and efficient way to perform set membership tests with probabilistic guarantees. It’s particularly useful in applications where memory usage is a concern and false positives are acceptable.

Bloom Filter Implementation in Bash

Bloom Filter Implementation in Bash

Program Structure

Implementation

Explanation

Initialization

Hash Functions

Adding an Element

Checking for an Element

Example Usage

Conclusion

By Aditya Bhuyan

Leave a Reply Cancel reply

You Missed

Debugging Python with pdb: A Comprehensive Guide

How to Create Custom Exceptions in Python

Master Python List Slicing and Advanced Techniques

Mastering Python’s collections.deque for Efficient Data Manipulation

Featured Posts

Debugging Python with pdb: A Comprehensive Guide

How to Create Custom Exceptions in Python

Master Python List Slicing and Advanced Techniques

Mastering Python’s collections.deque for Efficient Data Manipulation

Mastering Python’s map, filter, and reduce Functions with Examples

Enjoy this blog? Please spread the word :)

Bloom Filter Implementation in Bash

Program Structure

Implementation

Explanation

Initialization

Hash Functions

Adding an Element

Checking for an Element

Example Usage

Conclusion

By Aditya Bhuyan

Related Post

Leave a Reply Cancel reply

You Missed

Enjoy this blog? Please spread the word :)