Optimising RDBMS performance with WAL

aditya goel
5 min readJan 21, 2024

--

If you are landing at this blog directly, it’s advisable to first check through this blog.

Question → Explain the structure of Hard-Disk & how the data is being stored in Database ?

Answer → Each Disk is circular in shape and have following properties :-

  • Disk has various Tracks & Sectors on it.
  • The combination of Track & Sector is also called as BLOCK. Data is usually kept in Blocks. It can be of any size, since it depends upon manufacturer.
  • Whenever we read/write into the Disk, we always do the same in terms of Blocks. Each Block is therefore addressed in terms of : (Track-Number, Sector-Number).
  • Assuming the size of each block to be 512, Inside each Block, addresses of each Byte starts from 0 and goes up-till 511. We also call it as OffSet. Each Byte is therefore addressed in terms of :: (Track-Number, Sector-Number, OffSet). These are the three crucial things, that we require to reach to a particular Byte.
  • The data on the disk is read by moving the HeadPost and spinning the spindle.

Question → Demonstrate the concept of Database-Block with an example ?

Answer → Let’s consider an example where each record is of 128 Bytes and assume that, each block size is of 512 Bytes, then we would be able to keep at-max 4 records (rows) in any given block.

Question → What are the usual Block-Sizes with the Database ?

Answer → Typically, the size of each block is as follows :-

1.) For PostGres, by default, the size of block is 8 KiloBytes.

2.) For MySql, default size is 16KB per block, although this can be decreased to 1KB or increased to 64KB.

3.) For Oracle, default block-size recommendations are 8 KB to 200 KB for optimal performance. The default block size is 8 KB.

  • Oracle recommends smaller block sizes for online transaction processing (OLTP) or mixed workload environments. For decision support system (DSS) workload environments, Oracle recommends larger block sizes.
  • Oracle has five block sizes (configurable): 2 KB, 4 KB, 8 KB, 16 KB, and 32 KB.
  • Smaller block sizes are good for small rows with lots of random access. However, small block sizes waste a relatively large amount of each block for metadata. Large block sizes waste storage and memory.

Question → What’s the importance of Commit in the database ?

Answer → Once a transaction is committed on the Hard-disk, it shall be safe from power-loss, OS-failure OR Hardware-Failure as well.

Question → What’s the importance of Commit in the database ?

Answer → Once a transaction is committed on the Hard-disk, it shall be safe from power-loss, OS-failure OR Hardware-Failure as well.

Question → What happens whenever say any particular row is being updated ?

Answer → Once the row is updated, the corresponding block in which this row is present shall be updated.

Question → What happens whenever say any particular row is being updated by the user ?

Answer → Once the row is updated, the corresponding block in which this row is present shall be updated.

The place/file where these changes are being logged is called as WAL i.e. Write-Ahead-Logging.

This WAL is then synched to the Database periodically.

Question → What are the Advantages of WAL ?

Advantages → WAL is one way of improvising the performance of the database in a massive way. Note the following aspects :-

  • Only update-statements are being appended into this Log-File and then periodically, these changes are being sent to the Table.
  • Basically the motivation is NOT to write every transaction/update to the database synchronously but rather buffer these changes In-Memory (let’s say every one second) and then flush it onto the Disk.
  • Here we are slashing/reducing a lot of disk-writes, but we are not running away from the Guarantees that RDBMS have given to the user.
  • Typically we gain an immense set of performance by avoiding a lot of disk-writes.

For example :-

1.) With MySql → The innodb_flush_log_at_timeout variable controls the frequency of InnoDB log file flushes into the MySQL. The frequency can range from 1 to 2700 seconds, with the default value being 1 second.

2.) With PostgreSQL → The term flush is often used when the Write Ahead Logs (WAL) are written to their final location. The WAL buffers can then be reused for incoming transactions.

Question → How does WAL ensure the Data-Integrity ?

Answer → Below is how the WAL ensures the Consistency of the data as that promised by RDBMS :-

  • Whenever any particular update is being written to the WAL, first the HASH for that row is being computed using CRC-32 algorithm (i.e. Cyclic Redundancy Check).
  • This Hash is first written to the WAL file and then contents are being written there in WAL file.
  • Now, while reading the contents from WAL, we can use this hash to check if the record is correct & it has not been messed.

Question → How is this WAL structured ?

Answer → Below is how the WAL files are structured :-

  • WAL is nothing, but it is composition of some files, where each file is of size 16 MB.
  • Each File consists of Pages, where each Page is of size 8 KB. Within this page, we add our “Write-Ahead-Log-Entry”.
  • As soon as we write something into this WAL, it gives an identifier to this entry. This identifier is known as “Log Sequence Number”. This LSN is Byte-Offset into the Log File.

That’s all in this blog. We shall see you in next blog.

References :-

--

--

aditya goel
aditya goel

Written by aditya goel

Software Engineer for Big Data distributed systems

No responses yet