Deep dive into Google BigTable | Part-1

6 min read · Jun 25, 2023

In case you are landing here directly, it’s strongly suggested that you go and read through this for fundamentals.

In this blog, we shall be looking at the following concepts :-

Introduction to BigTable.

Question → What is Bigtable ?

Answer → Bigtable is a fully managed, wide-column NoSQL database inside the Google Cloud Platform.

Question → How is Bigtable different from RDBMS ?

Answer → An RDBMS, a data store or a file store is not column-oriented. These systems are mostly row-oriented (entity-oriented, record-oriented), whereas Bigtable is a wide-column store.

Question → Is Bigtable a Serverless Database ? How do we create BT ?

Answer → BigTable is not a serverless service. That means we need to create the cluster and choose its number of nodes ourselves. Production instances used to require a minimum of 3 nodes per cluster; newer ones can start with a single node.

Question → How do we scale the Bigtable ?

Answer → BigTable scales horizontally by adding nodes. That's one of the cool things about Bigtable.

We can scale to huge volumes of data linearly with the number of nodes in the cluster.
Theoretically, the amount of data we can store is unlimited, because the system is horizontally scalable. That's from Google's documentation.
Nodes are the custodians of data. Every node can serve roughly 10,000 QPS (queries per second).

If we increase the node count to, say, 300 nodes, then we can handle about 3 million QPS (300 × 10,000). A note about real-scale numbers : in 2016, BigTable scaled to 25 million QPS.
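The linear-scaling arithmetic above can be sketched in a few lines of Python. The 10,000 QPS per node is the rough figure quoted in this post, not a guarantee, and the function names `estimated_qps` and `nodes_needed` are made up purely for illustration.

```python
# Rough per-node throughput quoted in this post (an approximation, not a guarantee).
QPS_PER_NODE = 10_000


def estimated_qps(node_count: int, qps_per_node: int = QPS_PER_NODE) -> int:
    """Estimate cluster throughput, assuming it grows linearly with node count."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return node_count * qps_per_node


def nodes_needed(target_qps: int, qps_per_node: int = QPS_PER_NODE) -> int:
    """Smallest node count whose estimated throughput meets target_qps."""
    return -(-target_qps // qps_per_node)  # ceiling division


if __name__ == "__main__":
    print(estimated_qps(300))        # 3000000
    print(nodes_needed(25_000_000))  # 2500
```

Under this simple model, the 25 million QPS mentioned above would correspond to about 2,500 nodes. Real throughput depends on the workload, so treat this as a back-of-the-envelope estimate only.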

Question → How is Bigtable compared to Google-Cloud-Spanner ?

Answer → Google Cloud Spanner is another database that is also horizontally scalable, but it falls into the RDBMS category.

Question → How is data organised with Bigtable ?

Answer → All of our columns are grouped into column families. That's a somewhat new concept compared to columns in an RDBMS.

Question → How about Latency while working with Bigtable ?

Answer → It offers single-digit-millisecond latency.

The P99 latency is around 6 milliseconds at petabyte scale.
So, whenever you require very low latency in the NoSQL space, Cloud Bigtable is a strong candidate.
It can handle millions of requests per second.

Question → Which other projects across the world are inspired by Bigtable ?

Answer → Here is what the Google family tree looks like :-

Question → What does the evolution timeline for Bigtable look like ?

Question → What other product lines has Google launched over time ?

Question → What are the use-cases of Google BigTable ?

Answer → Any place where millions of rows are being ingested is a good use case for BT. Cloud Bigtable is mainly used for :-

Keeping financial data.
Keeping time-series data.
Other options look like :-

It is very good for storing and retrieving this kind of data.

Question → Are there any databases that leverage Google BigTable under the hood ?

Question → At what scale should we think about using cloud Bigtable?

Answer → If the entire database sits under 50GB, then an RDBMS should be the preferred option.

Question → How can we access Google's Bigtable ?

Answer → There are two ways of accessing Google Cloud BigTable :

We can use the command-line utility called cbt. It is also part of the Cloud SDK, so if the Cloud SDK is installed on your local machine, you can easily use cbt.

We can use the HBase APIs, because Cloud Bigtable is highly compatible with HBase.

Question → What does a BigTable cluster usually look like ?

Question → How do clients connect to BigTable ?

Question → How does BigTable store the data under the hood ?

Question → How is the data stored inside Google Cloud Bigtable ?

Answer → Prima facie, it looks like a relational database management system. There is the first row, there's the second row, there's a third row and so on…

Question → Google Cloud Bigtable looks like an RDBMS. Is it really NoSQL ?

Answer → Cloud BigTable is a NoSQL system. Every record has exactly one row key through which it is accessed.

Note that we cannot create an index on any other column, like name, salary, designation or company. This is simply not possible.
Very important point : data can be looked up efficiently only by row key. A query on any other column has to scan rows and filter them. There is only one index in any given table in BT.
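To make the "row key is the only index" point concrete, here is a toy in-memory sketch in Python. It is not how Bigtable is implemented; the class `TinyTable`, its methods and the sample row keys are all invented for illustration. It only shows the access pattern: lookups and prefix scans by row key use the sorted key order, while a search on any other column has to walk every row.

```python
from bisect import bisect_left


class TinyTable:
    """Toy table: rows are kept sorted by row key, which is the only index."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, columns):
        """Insert a row, or merge columns into an existing row."""
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        """Point lookup by row key."""
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Return row keys starting with prefix, using the sorted key order."""
        start = bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            matches.append(key)
        return matches

    def find_by_column(self, column, value):
        """No secondary index exists, so every row must be examined."""
        return [k for k in self._keys if self._rows[k].get(column) == value]


if __name__ == "__main__":
    table = TinyTable()
    table.put("emp#002", {"name": "Bob", "company": "Acme"})
    table.put("emp#001", {"name": "Alice", "company": "Acme"})
    table.put("dept#01", {"name": "Sales"})
    print(table.get("emp#001"))
    print(table.scan_prefix("emp#"))
    print(table.find_by_column("company", "Acme"))
```

This is why row-key design matters so much: whatever you want to look up quickly has to be encoded in the row key itself.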

Question → Why is Cloud BigTable called a wide-column database ?

Answer → With Cloud BigTable :-

Every single record may have hundreds or even thousands of different columns.
Google says it can go up to millions of columns as well. That's why it's called a wide-column database.

Question → How do we organise the data with Cloud BigTable ?

Answer → Cloud BigTable has another concept, the column family. Example →

The columns “name” and “age” have been grouped together into one column family called personal_date_cf.
The columns “Salary”, “designation” and “company” have been grouped together into one column family called Professional_data_cf.
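The grouping above can be pictured as a nested mapping: row key → column family → column → value. The Python sketch below uses the family and column names from this post; the row key, the values and the helper names `read_family` and `read_cell` are invented for illustration and are not part of any Bigtable API.

```python
# One row, modelled as: column family -> column -> value.
# Family and column names come from the example above; the values are made up.
row_key = "emp#001"
row = {
    "personal_date_cf": {"name": "Alice", "age": 30},
    "Professional_data_cf": {
        "Salary": 50000,
        "designation": "Engineer",
        "company": "Acme",
    },
}


def read_family(row, family):
    """Return a copy of all columns stored under one column family."""
    return dict(row.get(family, {}))


def read_cell(row, family, column):
    """Return one value, or None if the row has no such column."""
    return row.get(family, {}).get(column)


if __name__ == "__main__":
    print(read_family(row, "personal_date_cf"))
    print(read_cell(row, "Professional_data_cf", "company"))
```

Reading a whole family at once mirrors the idea that columns which are usually read together belong in the same column family.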

Question → What's the data model of Cloud BigTable ?

Question → What does a particular cell in Cloud BigTable look like ?

Question → How is the data stored across multiple nodes in a cluster ?

Answer → The data is spread across multiple nodes of the cluster.
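One simplified way to picture this is range partitioning: the sorted row-key space is cut into contiguous ranges, and each range is served by one node. The Python sketch below illustrates that idea only. The split points, node names and the function `node_for` are hypothetical, and a real cluster chooses and rebalances its ranges on its own.

```python
from bisect import bisect_right

# Hypothetical split points: each one is the first row key of the next range.
SPLIT_POINTS = ["g", "n", "t"]
# One node per range; the node names are invented for this sketch.
NODES = ["node-1", "node-2", "node-3", "node-4"]


def node_for(row_key: str) -> str:
    """Return the node serving the contiguous key range that holds row_key."""
    return NODES[bisect_right(SPLIT_POINTS, row_key)]


if __name__ == "__main__":
    for key in ["apple", "golf", "nile", "zebra"]:
        print(key, "->", node_for(key))
```

Because neighbouring row keys land in the same range, a scan over a key prefix touches only a few nodes, while unrelated keys spread the load across the cluster.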

Question → How can we achieve Durability and High Availability with BigTable ?

Answer → We can use cross-zone or cross-region replication to achieve high availability and durability for our data :-

Question → Is there any other benefit of geographical replication as well ?

Answer → Yes, it can also help to improve latency, by putting data closer to our customers.

That's all in this section. If you liked reading this blog, kindly press the clap button multiple times to show your appreciation. We will see you in the next blog.


Written by aditya goel