Bloom Filter Algorithm for Search

Question:- What is a Bloom Filter ?

Answer → A Bloom-Filter is a Space-Efficient probabilistic Data-Structure. It’s been around for 50 years. It’s used to answer the question like : Is this element in the Set ?

Question:- What are the practical applications of Bloom Filter ?

Answer → A Bloom Filter is a Data-Structure with many practical applications. It can be found in Browsers, Network-Routers and Databases just to name a few.

Question:- What can be the practical application, where Bloom Filters can be used ?

Answer → A Bloom Filter is used to answer this question : If this Element is present in the Set ? The Bloom-Filter would answer with either a “Firm No” or “Probably Yes”. This “Probably Yes” part makes the Bloom-Filters probabilistic.

Question:- What’s the TradeOff here with Bloom Filters ?

Answer → In exchange of providing sometimes an Incorrect False-Positive Answer, a Bloom-Filter consumes lot less memory than a Data-Structure like a Hash-Table, that would provide correct answer all the times.

Question:- What’s that thing, that we must take care while using the Bloom Filters ?

Answer → If our use-case can tolerate Some False-Positives and can’t tolerate the False-Negatives, then we can go for the Bloom-Filters.

Question:- How does Bloom Filters works ?

Answer → The critical ingredient to a Bloom-Filter is some good hash-functions.

Understanding → A Bloom-Filter is a large set of buckets, with each bucket containing a single bit and they all start with a Zero. Let’s imagine that, we want to keep track of the food I like. For this example :-

Step #1.) We would start with a Bloom Filter of size 10, labelled from 0 to 9 :-

Step #2.) Now, for every incoming element :-

For example, we want to put the element “Ribeye” a type of meat, into the Bloom-Filter. So, upon passing this element through the THREE Hash-Functions :-

Step #3.) Let’s take another example, we want to put the element “Potato” into the Bloom-Filter. So, upon passing this element through the THREE Hash-Functions :-

Test #1.) Now, let’s test our Bloom-Filter, by inquiring from it, whether I liked the “Potato” ?

Upon checking all these indices with the Bloom-Filter, since all of these indices are ONE, therefore Bloom-Filter declares that : Yes, I like the element “Potato”.

Test #2.) Now, let’s test our Bloom-Filter, by inquiring from it, whether I liked the “Pork Chop” ?

Upon checking all these indices with the Bloom-Filter, since all of these indices are NOT ONE, therefore Bloom-Filter declares confidently that : NO, I don’t like the element “Pork Chop”.

Test #3 → Case of False-Positive) Now, let’s test our Bloom-Filter, by inquiring from it, whether I liked the “Lemon” ? As a first thing, we pass the element “Lemon” through the THREE Hash-Functions, say that three hash-functions returns the values : [1, 4, 8]. Next, we check all those buckets i.e. we check whether the value at the Indices [1, 4, 8] are 1 or not :-

Upon checking all these indices with the Bloom-Filter, since all of these indices are already ONE (being set 1 by some other elements), therefore Bloom-Filter still ends-up declaring that : Yes, I like the element “Lemon”. This is the case of False-Positive. Here, The element(“Lemon”) is NOT present in the Set, but Bloom Filter still says, it is present indeed.

Question:- What’s the size of the Bloom Filters in real-life ?

Answer → The size of the Bloom Filters in practical life is very big, as compared to the size being shown here in the example above.

Question:- Explain how Akamai’s CDN uses Bloom Filters ?

Answer → Akamai’s CDN uses Bloom-Filters to prevent Caching “One Hit Wonders”.

Question:- What is One Hit Wonders ?

Answer → These are web-pages, which are only requested once. According to the Akamai, 75% of the pages are One Hit Wonders.

Question:- How does Akamai prevent caching of One Hit Wonders ?

Answer → Bloom-Filter basically tracks all the URLs seen.

Question:- What’s the advantage of preventing the caching of One Hit Wonders ?

Answer → Preventing the caching of One Hit Wonders, significantly reduces the Caching Workload at Akamai’s end and increases the Cache-Hit-Rate as well.

Question:- Explain how Modern Browsers uses Bloom Filters ?

Answer → Web Browsers like Google-Chrome used to use Bloom-Filters :-

Question:- Does Browsers still uses Bloom Filters, even today ?

Answer → Since there are millions of malicious URLs possible, we need a more efficient solution than this.

Question:- Explain how Password Validators uses Bloom Filters ?

Answer → Some Password-Validators uses Bloom-Filters to prevent users from using weak passwords. This is how it would work :-

==> If that password is well present into the Bloom-Filter, simply we can ask the user to come up with a new password.

==> If that password is NOT present into the Bloom-Filter, it means that it’s indeed not a weak password and user may proceed ahead with it.

Note that, at times, even some of the strong passwords might become the victim of False-Positive scenario and a Strong password might be called out as a Weak password.

Question:- Explain how does NoSql databases uses Bloom Filters ?

Answer → Many NoSql-databases uses Bloom-Filters to reduce the disk reads for the keys, that doesn’t exist. With LSM-Tree based database, searching for a key that doesn’t exists requires looking through many files and is very costly operation.

That’s all in this blog. We shall see you in the next document..

--

--

Software Engineer for Big Data distributed systems

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store