Deep dive into Consistent Hashing
If you are landing on this page directly, it’s strongly recommended to first read through this link.
Question → What is the goal of Consistent Hashing ?
Answer → We want all objects to stay assigned to the same server, even as the number of server changes.
Question → What is the core insight of Consistent Hashing ?
Answer → In addition to hashing the objects like before, we also hash the server names.
- The Objects & The Servers are hashed with the same hashing-function.
- The hash-value of both of the above, would lie within a same range of values.
Question → Explain the Consistent Hashing with some example ?
Step #1.) We have range of x0 to xn. This range is called a Hash-Space.
Step #2.) Next, we connect both the ends of this hash-space, to form a Hash-Ring.
Step #3.) Using a Hashing-function, we hash each server by its name or IP and mapped the servers onto the ring based upon some angle. With our example earlier, we place our 4 servers onto the ring in below fashion :-
Step #4.) Next, we hash each object by it’s key, with same hashing-function.
- Note that, the range of hash-value would be in the same range.
- Unlike simple-hashing where we perform a modulo-operation on the hash, here we use the hash directly to map the object-key onto the ring.
Here is how the arrangement of servers and keys looks like on the ring :-
Question → How the Keys are allocated to the Servers in the Consistent Hashing world ?
Answer → To locate the server for a particular object-key, we can go :-
- Either Clockwise.
- OR Anti-Clockwise.
Let’s take the example of going in the clock-wise direction. We shall be :-
- Starting to move from the location of the object-key.
- And Moving onto the ring, until any Server is found.
With our logic, we end-up doing the following assignment :-
- Key K0 is assigned to Server S0.
- Key K1 is assigned to Server S1.
- Key K2 is assigned to Server S2.
- Key K3 is assigned to Server S3.
Question → Demonstrate the step by step process for addition of keys ?
Step 1: Setting Up the Hash Ring
Assume we have four servers in a distributed system:
- N1, N2, N3, and N4
Each server is assigned a position on the consistent hash ring using a hash function (e.g., SHA-1). The ring is a 0 to ²³² (if using a 32-bit hash) or 0 to ²¹⁶⁰ (for SHA-1).
Let’s assume the servers are assigned these positions based on their hashed values: ServerHash Position (Simplified) →
- N1 → 100
- N2 → 300
- N3 → 600
- N4 → 900
Step 2: Hashing the Key
Now, we need to insert a key-value pair (K1, V1) into the system.
- Compute hash(K1) using the same hashing function. Assume hash(K1) = 350
- Find the first node in a clockwise direction from 350 in the hash ring.
Step 3: Selecting the Primary Server
Looking at the ring:
- N1 (100) → N2 (300) → K1 (350) → N3 (600) → N4 (900)
- 350 is greater than N2 (300) but less than N3 (600)
- The first server clockwise from 350 is N3.
Thus, N3 is selected as the primary server for storing (K1, V1). ✅
Step 4: Adding More Keys
Let’s add another key K2:
- Compute hash(K2) = 750
- Find the first server clockwise from 750:
- N4 (900) is the first server after 750
- So, N4 stores (K2, V2).
Step 5: What Happens When a Node is Removed?
If N3 fails, the next available node (N4) will take over K1’s storage. This minimizes disruption.
Question → What Does “Ring is a 0 to ²³²” Mean in Consistent Hashing?
Answer → In consistent hashing, we imagine a hash space as a circular ring, where the values range from 0 to ²³² — 1 (if using a 32-bit hash function) or 0 to ²¹⁶⁰ — 1 (if using SHA-1, which produces a 160-bit hash). This defines the range of possible hash values.
1. Understanding the Hash Space as a Ring
A hash function (e.g., MD5, SHA-1, MurmurHash) converts keys (K) into numerical values within this range.
- If we use a 32-bit hash function (like MurmurHash), the maximum value is ²³² — 1 = 4,294,967,295.
- If we use SHA-1 (160-bit hash function), the maximum value is ²¹⁶⁰ — 1 ≈ 1.46 × 1⁰⁴⁸.
Since hash values wrap around when they reach the end of the range, we represent them in a circular structure (like a ring).
Why a Ring?
- Instead of treating the hash space as a linear range from 0 to ²³², we imagine it as a circular structure, where 0 follows ²³² — 1.
- This allows us to efficiently assign servers and distribute keys without major data movement when nodes are added or removed.
2. Placing Servers on the Hash Ring
Each server (node) is also assigned a position on this ring by hashing its unique identifier (e.g., IP address, hostname).
Example:
- Hash(N1) → 100
- Hash(N2) → 300
- Hash(N3) → 600
- Hash(N4) → 900
These servers are placed on the ring at their respective hash positions.
3. Mapping Keys to Servers → Each key (K) is also hashed to determine its position on the ring. Example:
- Hash(K1) → 350 → The first server clockwise after 350 is N3. So, N3 stores (K1, V1).
- Hash(K2) → 750 → The first server clockwise after 750 is N4. So, N4 stores (K2, V2).
4. Handling Node Additions and Failures
🔹 Adding a New Node
- Suppose we add N5 at hash = 400.
- Now, some keys originally assigned to N3 (600) will move to N5 (400).
- Only a small fraction of keys need to be reassigned, minimizing disruption.
🔹 Removing a Node (Failure)
- If N3 (600) fails, the next available node (N4 at 900) takes over its keys.
- This prevents massive data reshuffling, ensuring stability.
5. Why Not Use a Normal Hash Function Without a Ring?
If we used modulo-based hashing (hash(K) % N), adding or removing a server would change all key-to-server mappings. Consistent hashing ensures only a small portion of keys move, making it highly scalable and fault-tolerant.
Question → Demonstrate the process of adding a new server onto the Ring with Consistent Hashing and Simple Hashing process ?
Case #1.) Case of Consistent Hashing → Here, we have added a new 5th server i.e. S4 onto the ring.
- Earlier, Key K0 was allocated to the Server S0.
- In this new arrangement of servers, only key K0 needs to move from Server S0 to S4. This is because, in the clockwise direction, starting from the object-key K0’s location, the first server being encountered is S4.
- Other Keys K1, K2, K3 are not affected at all.
Therefore, With Consistent-Hashing, Adding a new server only requires redistribution of fraction of keys.
Case #2.) Case of Simple Hashing → With Simple-Hashing, whenever any new server is added, almost all the keys needs to be remapped.
Question → Demonstrate the process of removing a new server onto the Ring with Consistent Hashing ?
Answer → Here, we have removed a server S1 from the ring.
- Earlier, Key K1 was allocated to the Server S1.
- With the new arrangement (where S1 server has been removed), then only key K1 needs to move to S2. This is because, in the clockwise direction, starting from the object-key K1’s location, the first server being encountered is S2.
- Other Keys K0, K2, K3 are not affected at all.
Conclusions →
- We map both the servers and objects onto the same hash-ring, using a uniformly distributed hash-function.
- In order to locate the server for any object, we go clockwise on the ring from object’s location until a server is found.
Question → What are the issues with the Consistent Hashing ?
Answer → Following are some of the challenges with the afore-detailed approach :-
Challenge #1.) The distribution of the objects in the servers on the ring is likely to be un-even.
- Conceptually we pick “n” random points on the ring.
- We are highly unlikely to get a perfect partition of the ring into equally sized segments.
- For example :- Consider the below arrangement of servers & objects on the ring. Most of the objects shall be stored on the Server S2, as Server-S2 shall be nearest to many of the objects.
Challenge #2.) Even if the servers were originally evenly spaced on the ring, if the server S1 is removed, then the segment for server S2 shall be twice as large as the segment for server S0 and S3.
Question → How do we address this problem of Consistent Hashing ?
Answer → We use Virtual-Nodes to solve this problem. Here is how it’s done :-
1.) The idea is to have each server appear at multiple locations on the ring. Each location is a virtual node representing a server.
2.) For example, In the below example of HashRing, we have 2 servers with each having three virtual nodes :-
- For Server S0, it shall be represented by Virtual nodes : S0_1, S0_2 and S0_3.
- For Server S1, it shall be represented by→ Virtual nodes : S1_1, S1_2 and S1_3.
Thus, with virtual-servers in picture, here is how the diagram looks like :-
3.) With Virtual-Nodes, each server handles multiple segments on the ring. In our example above :-
- Segments labelled with S0, shall be handled by Server S0.
- Segments labelled with S1, shall be handled by Server S1.
Question → What is the count of Virtual-Nodes in the real world ?
Answer → In real-world, the number of virtual servers are usually much higher than 3.
- As the number of virtual-nodes increases, the distribution of objects becomes more balanced.
- Having more virtual-nodes means taking more space to store the metadata about the virtual-nodes.
This is a trade-off and we can tune the number of virtual-nodes to fit our system requirements.
Question → How Consistent-Hashing is used in real-world ?
Answer → This is how the consistent-hashing is used in real-world :-
- Usecase #1.) Some popular no-sql databases like Apache-Cassandra and Amazon-DynamoDB uses Consistent-Hashing, where it is used for Data-Partitioning. It helps these databases to minimise the data-movement during rebalancing.
- Usecase #2.) Content Delivery Network makes use of Consistent-Hashing, to help distribute web-contents equally among the edge-servers.
- Usecase #3.) Load-Balancers like Google-Load-Balancer uses Consistent-Hashing, to distribute persistent connections evenly across backend servers. This limits the number of connections, that needs to be re-established when a particular backend-server goes down.
That’s all in this blog. We shall see you in next document.