Indexing in Relational-databases

Question. #1) Explain the structure of Hard-Disk ?

Answer.) Each Disk is circular in shape and have following properties :-

  • Disk has various Tracks & Sectors on it.

Question. #2) Explain cost to search a record in Database ?

Answer.) Imagine we have below schema of Employee Table in RDBMS. Following are its fields :-

  • EID → Imagine, the space taken by this field is 10 Bytes.

Therefore, overall space taken by each record is : 128 Bytes. Now, also imagine that : Size of each block is 512 Bytes and we have got total 100 Records with us. In this case, following calculation would hold good :-

  • Blocking-Factor (Total no. of Records, that can be adjusted in any one Block) = => 512 / 128 == 4.

Some important concepts now :-

==> In order to save the Employee table completely, we would require total of 25 Blocks on the Hard-Disk, where each block is of 512 Bytes in Size.

==> Imagine when we want to search for a record in this table, total-time to search a record, depends upon the total no. of blocks that we would need to access for reaching to that record.

==> So, we have to at-most access 25 Blocks, in order to search for any particular record.

Question. #3) Explain the concept of Primary-Indexing in RDBMS ?

Answer :) Here, we have following aspects :-

  • The most fundamental assumption here is that, data in our main database is actually sorted. Imagine the ID column of our Employee table.

Question. #3) Demonstrate Primary-Indexing in RDBMS with example ?

Answer :) Here, we have following aspects → Imagine the size of each row is 100 Bytes and in-total we have around 30,000 rows. Also, imagine that size of each block is 1024 Bytes. Then, In this case, following would be our computations :-

  • Blocking-Factor ==> 1024/100 ==> 10 (i.e. Max of 10 records can be fitted in one Block).

Now, imagine that we have added a Primary-Index. Since, we are not going to store all of our entries into the Index, we only store the entries count equal to “The count of Blocks acquired by the DB-Table” into our Index. Also imagine that, size of a single row in the Index-File is around 15 Bytes. Therefore, following would be our computations :-

  • Blocking-Factor ==> 1024/15 ==> 68 (i.e. Max of 68 records can be fitted in one Block).

Question. #4) Demonstrate Secondary-Indexing in RDBMS with example ?

Answer :) Here, we have following aspects :-

  • The most fundamental assumption here is that, data in our main database is NOT sorted. Imagine the Name column of our Employee table.

Question. #5) Demonstrate Secondary-Indexing in RDBMS with example ?

Answer :) Here, we have following aspects → Imagine the size of each row is 100 Bytes and in-total we have around 30,000 rows. Also, imagine that size of each block is 1024 Bytes. Then, In this case, following would be our computations :-

  • Blocking-Factor ==> 1024/100 ==> 10 (i.e. Max of 10 records can be fitted in one Block).

Now, imagine that we have added a Secondary-Index. Since, we would have to store all of our entries into the Index, because our data is not sorted in the main table. Now, imagine that, size of a single row in the Index-File is around 15 Bytes. Therefore, following would be our computations :-

  • Blocking-Factor ==> 1024/15 ==> 68 (i.e. Max of 68 records can be fitted in one Block).

Question. #6) Compare the performance in terms of efforts required to perform search on a database, when we don’t have Index at all and when we have Indexes.

Step #1.) Imagine we have below schema and data with us. Here is Student Table in RDBMS :-

  • There are total 20,000 Rows available with us. There are total 3 Columns available with us for each row. Let us imagine, size of any single row is approx. 150 Bytes.

Step #2.) Then, we can calculate following aspects :-

  • Blocking-Factor (Total no. of Records, that can be adjusted in any one Block) = => 2048 / 150 == 13. There might be some Internal Fragmentation in each block, since we can’t accomodate one record in 98 Bytes of space.
  • Searching (without Indexing) on column, which contains sorted data → We have also learnt just in above question that, we always read the data in terms of Blocks. Now, In case that our data (to be searched) is already sorted (i.e. say search is done on the ID, which is sorted already), then we can perform the Binary-Search on these Blocks And therefore maximum number of blocks that we would have to access are :- Log(1539) = 11.

Step #3.) Next, now apply the Primary-Index on our table and a index-file shall be stored again on the Disk. Imagine that, each entry in the Index-file is around 20 Bytes.

  • Blocking-Factor ==> 2048/20 ==> 102 (i.e. Max of 102 records can be fitted in one Block).

Step #4.) Next, now apply the Secondary-Index on our table and a new index-file shall be stored again on the Disk. Imagine that, each entry in the Index-file is around 20 Bytes again.

  • Blocking-Factor ==> 2048/20 ==> 102 (i.e. Max of 102 records can be fitted in one Block).

That’s all in this blog. We shall see you next in this series. Please clap on this page, if you liked reading this.

--

--

Software Engineer for Big Data distributed systems

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store