Bloom Filter Algorithm for Search

8 min readJan 29, 2023

Question:- What is a Bloom Filter ?

Answer → A Bloom-Filter is a Space-Efficient probabilistic Data-Structure. It’s been around for 50 years. It’s used to answer the question like : Is this element in the Set ?

Question:- What are the practical applications of Bloom Filter ?

Answer → A Bloom Filter is a Data-Structure with many practical applications. It can be found in Browsers, Network-Routers and Databases just to name a few.

Question:- What can be the practical application, where Bloom Filters can be used ?

Answer → A Bloom Filter is used to answer this question : If this Element is present in the Set ? The Bloom-Filter would answer with either a “Firm No” or “Probably Yes”. This “Probably Yes” part makes the Bloom-Filters probabilistic.

The False-Positives are possible i.e. The element is NOT present in the in Set, but Bloom Filter says, it is present indeed.
The False-Negatives are Not possible i.e. The element is present in the Set, but Bloom Filter says, it is NOT present.

Question:- What’s the TradeOff here with Bloom Filters ?

Answer → In exchange of providing sometimes an Incorrect False-Positive Answer, a Bloom-Filter consumes lot less memory than a Data-Structure like a Hash-Table, that would provide correct answer all the times.

Question:- What’s that thing, that we must take care while using the Bloom Filters ?

Answer → If our use-case can tolerate Some False-Positives and can’t tolerate the False-Negatives, then we can go for the Bloom-Filters.

Question:- How does Bloom Filters works ?

Answer → The critical ingredient to a Bloom-Filter is some good hash-functions.

These Hash-functions should be fast and they should produce outputs that are evenly and randomly distributed.
Collisions are OK, as soon as they are rare.

Understanding → A Bloom-Filter is a large set of buckets, with each bucket containing a single bit and they all start with a Zero. Let’s imagine that, we want to keep track of the food I like. For this example :-

Step #1.) We would start with a Bloom Filter of size 10, labelled from 0 to 9 :-

Step #2.) Now, for every incoming element :-

We shall be passing it through THREE distinct Hash-Functions.
Each Hash-Function would end-up returning a value from range 0 to 9.

For example, we want to put the element “Ribeye” a type of meat, into the Bloom-Filter. So, upon passing this element through the THREE Hash-Functions :-

Imagine that upon passing the element “Ribeye” through Hash-Function-1, we got the value as 1. This means that, the value at Index 1 would be marked as 1.
Imagine that upon passing the element “Ribeye” through Hash-Function-2, we got the value as 3. This means that, the value at Index 3 would be marked as 1.
Imagine that upon passing the element “Ribeye” through Hash-Function-3, we got the value as 4. This means that, the value at Index 4 would be marked as 1.

Step #3.) Let’s take another example, we want to put the element “Potato” into the Bloom-Filter. So, upon passing this element through the THREE Hash-Functions :-

Imagine that upon passing the element “Potato” through Hash-Function-1, we got the value as 0. This means that, the value at Index 0 would be marked as 1.
Imagine that upon passing the element “Potato” through Hash-Function-2, we got the value as 4. This means that, the value at Index 4 would be marked as 1.
Imagine that upon passing the element “Potato” through Hash-Function-3, we got the value as 8. This means that, the value at Index 8 would be marked as 1.

Test #1.) Now, let’s test our Bloom-Filter, by inquiring from it, whether I liked the “Potato” ?

Since the same input always hashes to the same output, through the same Hash-Functions, element “Potato” still hashes to values : [0, 4, 8].
Next, we check all those buckets i.e. we check whether the value at the Indices [0, 4, 8] are 1 or not ?

Upon checking all these indices with the Bloom-Filter, since all of these indices are ONE, therefore Bloom-Filter declares that : Yes, I like the element “Potato”.

Test #2.) Now, let’s test our Bloom-Filter, by inquiring from it, whether I liked the “Pork Chop” ?

As a first thing, we pass the element “Pork Chop” through the THREE Hash-Functions, say that three hash-functions returns the values : [1, 5, 8]. Note that, these values can be anything else as well.
Next, we check all those buckets i.e. we check whether the value at the Indices [1, 5, 8] are 1 or not ?

Upon checking all these indices with the Bloom-Filter, since all of these indices are NOT ONE, therefore Bloom-Filter declares confidently that : NO, I don’t like the element “Pork Chop”.

Test #3 → Case of False-Positive) Now, let’s test our Bloom-Filter, by inquiring from it, whether I liked the “Lemon” ? As a first thing, we pass the element “Lemon” through the THREE Hash-Functions, say that three hash-functions returns the values : [1, 4, 8]. Next, we check all those buckets i.e. we check whether the value at the Indices [1, 4, 8] are 1 or not :-

Recall that, The value at Index 1, was set as ONE, when “ribeye” element was ingested into our Bloom-Filter.
The value at Index 4, was set as ONE, when elements “Potato” and “ribeye” were ingested into our Bloom-Filter.
The value at Index 8, was set as ONE, when “Potato” element was ingested into our Bloom-Filter.

Upon checking all these indices with the Bloom-Filter, since all of these indices are already ONE (being set 1 by some other elements), therefore Bloom-Filter still ends-up declaring that : Yes, I like the element “Lemon”. This is the case of False-Positive. Here, The element(“Lemon”) is NOT present in the Set, but Bloom Filter still says, it is present indeed.

Question:- What’s the size of the Bloom Filters in real-life ?

Answer → The size of the Bloom Filters in practical life is very big, as compared to the size being shown here in the example above.

Question:- Explain how Akamai’s CDN uses Bloom Filters ?

Answer → Akamai’s CDN uses Bloom-Filters to prevent Caching “One Hit Wonders”.

Question:- What is One Hit Wonders ?

Answer → These are web-pages, which are only requested once. According to the Akamai, 75% of the pages are One Hit Wonders.

Question:- How does Akamai prevent caching of One Hit Wonders ?

Answer → Bloom-Filter basically tracks all the URLs seen.

For every request (say the web-page is being viewed for the first time), it ends-up ingesting an entry into the Bloom-Filter.
Basically, the process is : The Webpage is ran through multiple Hash-Functions and the index-values being returned by these multiple hash-functions are set to ONE.
For every subsequent request, it checks whether this entry is present into the Bloom-Filter. Only, if it’s present, then only the web-page is Cached otherwise Not.

Question:- What’s the advantage of preventing the caching of One Hit Wonders ?

Answer → Preventing the caching of One Hit Wonders, significantly reduces the Caching Workload at Akamai’s end and increases the Cache-Hit-Rate as well.

Question:- Explain how Modern Browsers uses Bloom Filters ?

Answer → Web Browsers like Google-Chrome used to use Bloom-Filters :-

To identify, whether a particular URL is malicious OR NOT ?
Only when Bloom-Filter declares that, an URL is probably malicious, then only it performs the fully blown check of verifying whether the URL is really malicious or not ?

Question:- Does Browsers still uses Bloom Filters, even today ?

Answer → Since there are millions of malicious URLs possible, we need a more efficient solution than this.

Question:- Explain how Password Validators uses Bloom Filters ?

Answer → Some Password-Validators uses Bloom-Filters to prevent users from using weak passwords. This is how it would work :-

All the possible weak passwords would be first indexed into the Bloom-Filter.
Whenever any User intends to create the password, that same password is first checked into the Bloom-Filter.

==> If that password is well present into the Bloom-Filter, simply we can ask the user to come up with a new password.

==> If that password is NOT present into the Bloom-Filter, it means that it’s indeed not a weak password and user may proceed ahead with it.

Note that, at times, even some of the strong passwords might become the victim of False-Positive scenario and a Strong password might be called out as a Weak password.

Question:- Explain how does NoSql databases uses Bloom Filters ?

Answer → Many NoSql-databases uses Bloom-Filters to reduce the disk reads for the keys, that doesn’t exist. With LSM-Tree based database, searching for a key that doesn’t exists requires looking through many files and is very costly operation.

That’s all in this blog. We shall see you in the next document..

Bloom Filter Algorithm for Search

Written by aditya goel

No responses yet