Deep dive into Google BigTable | Part-2

aditya goel
8 min read · Jan 14, 2024

Question → How does indexing work in Cloud BigTable ?

Answer → In Google Cloud BigTable :-

  • We cannot create secondary indexes on individual columns or on combinations of columns.
  • Each table has exactly one index, and that index is the row-key.

Question → What is the importance of Row-Key in Cloud BigTable ?

Answer → The row-key is the most important concept in BigTable schema design.

  • While retrieving data, we depend entirely on the row-key.
  • Very important : efficient lookups are possible only by row-key (an exact key, a key prefix, or a key range). Selecting rows by the value of any other column means scanning the table and filtering.

Question → What are the crucial things to keep in mind, while designing the row-key ?

Answer → Following are the points which should be considered while designing a row-key :-

  • It should be based on the queries that our application is going to run.
  • The base-rule while designing a BigTable row-key is : never use monotonically increasing/decreasing keys (for example, a plain timestamp or a sequence number), otherwise this can lead to the problem of HotSpotting.

Question → Explain the problem of HotSpotting ?

Answer → We can understand the problem of Hotspotting with this example :-

  • Let’s say that we have 10 different nodes powering our Google Cloud BigTable cluster.
  • Rows are stored sorted by row-key, and each node serves contiguous ranges of keys. If the row-key keeps increasing, every new write falls into the same range, so all the traffic lands on one single node while all the other nodes just sit idle.

The problem here is that we are not utilising the power of the other machines, and that is exactly the hotspotting issue. Remember, the nodes in the cluster are the custodians of the keys.
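
The effect can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: the key ranges are fixed up front (a real cluster splits and rebalances ranges over time), and the names owning_node and hash_prefixed are hypothetical helpers, not part of any BigTable client.

```python
import bisect
import hashlib
from collections import Counter

NUM_NODES = 10  # same node count as in the example above


def owning_node(row_key, split_points):
    """Each node serves one contiguous, sorted range of row-keys."""
    return bisect.bisect_right(split_points, row_key)


def hash_prefixed(row_key):
    """Prefix the key with a stable bucket (0-9) derived from its hash."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return f"{digest[0] % NUM_NODES}#{row_key}"


# Assumption: existing keys 0000000..0999999 are split evenly over 10 nodes.
seq_splits = [f"{i * 100_000:07d}" for i in range(1, NUM_NODES)]

# 1000 new writes with monotonically increasing keys.
new_keys = [f"{n:07d}" for n in range(1_000_000, 1_001_000)]
seq_load = Counter(owning_node(k, seq_splits) for k in new_keys)

# Same writes, but with a hash-derived prefix; ranges split on that prefix.
prefix_splits = [str(i) for i in range(1, NUM_NODES)]
spread_load = Counter(owning_node(hash_prefixed(k), prefix_splits) for k in new_keys)

print(dict(seq_load))               # {9: 1000} -> one node takes every write
print(sorted(spread_load.items()))  # writes spread over many nodes
```

With sequential keys, the last range receives all 1000 writes. With the hash prefix, the same writes are distributed across the ranges.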

Question → How do we design a schema for BigTable ?

Question → Can you showcase an example of designing a schema with BigTable ?

Answer → Imagine that we are designing the BigTable schema for an Observability System like NewRelic.

  • One of the techniques that we can use here is called “Field-Promotion” : fields that we query by are moved from the column data into the row-key itself.
  • Another technique that we can use is Client-Side-Hashing along with some salts : a hash-derived prefix is added to the row-key so that writes spread across nodes.
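
Both techniques can be sketched as small key-building helpers. Everything here is an assumption made for illustration: the field names (service, host, metric), the "#" separator, the number of salt buckets, and the function names are hypothetical and do not come from any BigTable API.

```python
import hashlib

SALT_BUCKETS = 4  # hypothetical number of salt prefixes


def promoted_key(service, host, metric, epoch_seconds):
    """Field promotion: queried fields lead the key, the timestamp comes last."""
    return f"{service}#{host}#{metric}#{epoch_seconds:010d}"


def salted_key(row_key, buckets=SALT_BUCKETS):
    """Client-side hashing: prefix the key with a stable, hash-derived bucket."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket}#{row_key}"


def scan_prefixes(key_prefix, buckets=SALT_BUCKETS):
    """A reader of salted keys has to query every bucket prefix."""
    return [f"{b}#{key_prefix}" for b in range(buckets)]


k1 = promoted_key("checkout", "host-7", "cpu", 1705190400)
k2 = promoted_key("checkout", "host-7", "cpu", 1705190460)

print(k1)                                     # checkout#host-7#cpu#1705190400
print(k1 < k2)                                # True: one series stays time-ordered
print(salted_key(k1))                         # same key behind a bucket prefix
print(scan_prefixes("checkout#host-7#cpu#"))  # prefixes a reader must fan out over
```

The trade-off is visible in scan_prefixes: salting spreads the writes, but a range read now needs one scan per bucket, so the bucket count should stay small.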

Question → Demonstrate the concept of Wide Tables with BT ?

Answer → Usually, it is advisable to go for a wide columnar design. A typical example is the design of “Followers” in Twitter or a similar system.
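
One possible wide layout is modelled below with plain dictionaries. It is an assumed layout for illustration, not necessarily the one the article had in mind: one row per followed user, a single column-family called "followers" (a hypothetical name), and one column per follower.

```python
from collections import defaultdict

# table[row_key][column_family][column_qualifier] = value
table = defaultdict(lambda: defaultdict(dict))


def follow(follower, followee):
    """One row per followed user; every follower becomes its own column."""
    table[followee]["followers"][follower] = "1"


def followers_of(user):
    """Read a single row and list its column qualifiers."""
    row = table.get(user, {})
    return sorted(row.get("followers", {}))


follow("alice", "bob")
follow("carol", "bob")

print(followers_of("bob"))   # ['alice', 'carol']
print(followers_of("dave"))  # []
```

The row for "bob" simply grows wider with every new follower, and fetching all followers is a single row-key lookup.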

Question → Demonstrate the process of creating a BigTable Instance ?

Answer → Here is the process for creating an Instance with Cloud BigTable :-

Step #1.) Let’s log-in to our GCP account :- Here we have selected the project as : “gcp-pde”.

Step #2.) Next, we go and select the BigTable :-

Step #3.) First of all, we have to create an Instance with BigTable :-

Step #4.) Next, we give a name to this Instance, which shall exclusively power our BigTable. Note that the instance-id is automatically assigned.

Step #5.) Next, we choose the disk-type to be associated with this instance :-

5.1) If we choose SSD as the type of storage, then the cost shall be USD 0.17 per GB.

5.2) Else, if we choose HDD as the type of storage, then the cost shall be USD 0.03 per GB. This disk is slower, so our latency shall be higher.

Step #6.) Next, we create a cluster for this BigTable. We select the Region and Zone in which this cluster has to be created, along with the number of nodes.

Step #7.) Note that, as we selected the number of nodes as TWO, the cost for the BigTable rises sharply.

Step #8.) Next, we reduce the size of storage from 100 GB to 10 GB, which reduces the cost a little :-

Step #9.) Finally, we hit the CREATE button and our first BT instance is created :-

Instance for BT has been created.

Question → Demonstrate the process of creating a Table now inside this BigTable Cluster ?

Answer → First, let’s activate the CloudShell in order to access the BigTable.

Step #1.) We first need to have a .cbtrc file, which contains two important pieces of information :-

  • project
  • instance
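
As a small sketch, the file content can be rendered like this. The "key = value" layout is the format the cbt tool reads. The project ID "gcp-pde" is the one selected earlier, while the instance ID "my-bt-instance" and the helper name render_cbtrc are placeholders made up for this example.

```python
def render_cbtrc(project, instance):
    """Return the text of a .cbtrc file with the two settings cbt needs."""
    return f"project = {project}\ninstance = {instance}\n"


# "my-bt-instance" is a placeholder; use the ID of the instance created above.
config_text = render_cbtrc("gcp-pde", "my-bt-instance")
print(config_text, end="")
```

Saving this text as .cbtrc in the home directory lets the cbt commands below run without repeating the project and the instance each time.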

Step #2.) Let’s create a table with name as “emp” inside the Google BT Cluster :-

cbt createtable emp

Step #3.) Let’s now edit this table and add one column-family from UI :-

column-family : personal_data_cf added to this Table.

Note that :-

  • In BT, multiple versions of every cell can be maintained, each one identified by its timestamp.
  • The good news is that we can also specify a Garbage Collection Policy for these versions, depending upon our requirement.

Step #4.) Let’s now verify from CloudShell tool cbt, whether column-family has been added :-

Command to see the table-schema : cbt ls emp

Step #5.) Let’s now add another column-family, this time from CloudShell tool cbt :-

Command to create a new column-family in BT : cbt createfamily emp CF_NAME

Question → Demonstrate the process of adding some data to the BigTable ?

Answer → Below is how we can add some data to the BigTable :-

Step #1.) Let’s now add a new column in one of the existing column-families (personal_data_cf) that we have created into our BT Table, from CloudShell tool cbt :-

cbt set emp ROW_KEY COLUMN-FAMILY-NAME:COLUMN-NAME=VALUE

Note that :-

  • So far, we have not created any columns in the BT Table explicitly.
  • We are creating the columns (“name” and “age”) on the fly, i.e. a column is created at the moment we assign a value to it.
  • In order to read the entire BT Table, we use this command : cbt read emp
  • The row-key used in the aforementioned command is “J1”.

Step #2.) Next, let’s add a new column in the other existing column-family (professional_data_cf) that we have created in our BT Table, from the CloudShell tool cbt :-

cbt set emp J1 professional_data_cf:salary=32000

Note that :-

  • Here again, we are creating the column (“salary”) on the fly, i.e. the column is created at the moment we assign a value to it.
  • The row-key used in the aforementioned command is “J1”.
  • In order to read the entire BT Table, we use this command : cbt read emp

Step #3.) Let’s repeat the above process once more and add yet another new column to the same column-family (professional_data_cf), this time for a new row-key “A1”, from the CloudShell tool cbt :-

cbt set emp A1 professional_data_cf:education=master

From the output of cbt read emp, we can clearly see that for the row-key “J1” there are three columns now :-

  • Two columns (“age” and “name”) are grouped together under the column-family “personal_data_cf”.
  • One column, “salary”, sits under the column-family “professional_data_cf”.

From the same output, we can also see that for the row-key “A1” there exists one column now : “education”, under the column-family “professional_data_cf”.
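
The resulting table can be pictured with a toy in-memory model. This is only a sketch of the data model, not of the cbt tool or any client library: set_cell and columns_of are hypothetical helpers, and the values for “name” and “age” are made up because the article does not state them.

```python
# Column-families must exist before writing; columns inside them need not.
COLUMN_FAMILIES = {"personal_data_cf", "professional_data_cf"}

table = {}  # row_key -> column-family -> column -> value


def set_cell(row_key, family, qualifier, value):
    """Mimic a write: the column is created on the fly inside an existing family."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def columns_of(row_key):
    """List the columns that actually hold a value for this row."""
    row = table.get(row_key, {})
    return sorted(f"{fam}:{qual}" for fam, cols in row.items() for qual in cols)


set_cell("J1", "personal_data_cf", "name", "example-name")  # made-up value
set_cell("J1", "personal_data_cf", "age", "30")             # made-up value
set_cell("J1", "professional_data_cf", "salary", "32000")
set_cell("A1", "professional_data_cf", "education", "master")

print(columns_of("J1"))  # three columns across two column-families
print(columns_of("A1"))  # ['professional_data_cf:education']
```

Rows are sparse: “A1” stores nothing for “name”, “age” or “salary”, so those columns simply do not exist in that row.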

Question → How would you see the number of rows present in a BigTable table ?

Answer → We can use the below command to count the rows of a given table :-

cbt count emp

Question → Demonstrate the process of updating the value for some column in an existing BigTable ?

Answer → Let’s now update the value of the column “salary” for the aforementioned row-key “J1” :-

cbt set emp J1 professional_data_cf:salary=35000

Note here that :-

  • We have updated the value of the “salary” column for this row-key (J1) from 32000 to 35000.
  • Note that, in BigTable, the earlier value is also preserved, depending upon the garbage-collection policy that we have set.
  • We can see that the earlier value (32000) still exists alongside the new value (35000).

Question → What if we update the value for some column once again ?

Answer → Let’s now update the value of the column “salary” for the aforementioned row-key “J1” once more :-

cbt set emp J1 professional_data_cf:salary=45000

Note here that :-

  • We have updated the value of the “salary” column for this row-key (J1) from 35000 to 45000.
  • Note that, in BigTable, the earlier values are also preserved, depending upon the garbage-collection policy that we have set. In this case, we had set the number of versions to Unlimited, i.e. Infinite.
  • We can see that the earlier values (32000 & 35000) still exist alongside the new value (45000).
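
The interplay between versions and garbage-collection can be sketched with a toy cell. The Cell class, its max_versions parameter and the integer timestamps 1, 2, 3 are assumptions made for this example. Real garbage collection runs in the background, so trimming immediately on write is a simplification.

```python
class Cell:
    """Toy model of one BigTable cell that keeps timestamped versions."""

    def __init__(self, max_versions=None):
        self.max_versions = max_versions  # None models the "Unlimited" setting
        self.versions = []                # (timestamp, value), newest first

    def set(self, timestamp, value):
        self.versions.append((timestamp, value))
        self.versions.sort(reverse=True)
        if self.max_versions is not None:
            # Simplification: trim right away instead of in the background.
            del self.versions[self.max_versions:]

    def values(self):
        return [value for _, value in self.versions]


unlimited = Cell()               # the policy used in this walkthrough
latest_only = Cell(max_versions=1)

for ts, salary in [(1, "32000"), (2, "35000"), (3, "45000")]:
    unlimited.set(ts, salary)
    latest_only.set(ts, salary)

print(unlimited.values())    # ['45000', '35000', '32000']
print(latest_only.values())  # ['45000']
```

With the unlimited policy, all three salaries remain readable, newest first. With a policy that keeps one version, only 45000 survives.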
