Infinite Horizontal Scalability with Redis Cluster

Welcome to this blog. If you have landed here directly, it is highly recommended to read through this story first. We shall be looking at the following topics in this blog :-

  • General Understanding of Scalability.

Question:- What’s the meaning of Scalability ?

Answer :- Scalability is the property of a system to handle a growing amount of work by adding resources to the system.

Question:- What are the scaling strategies available ?

Answer :- The two most common scaling strategies are :-

  • Vertical scaling.

Question:- What does Vertical-Scaling mean ?

Answer:- Vertical scaling, also called scaling up, means adding more resources like CPUs or memory to your server.

Question:- What does Horizontal-Scaling mean ?

Answer:- Horizontal scaling, or scaling out, implies adding more servers to your pool of resources.

Note:- The difference between Vertical & Horizontal Scaling is the difference between getting a single bigger server & deploying a whole fleet of servers, respectively.

Question:- Can you showcase a practical example for Vertical Vs Horizontal Scaling ?

Answer:- Suppose you have a server with 128 gigabytes of RAM, but you know that your database will need to store 300 gigabytes of data. In this case, you’ll have two choices :-

  • You can either add more RAM to your server, so it can fit the 300 gigabyte data set.
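
As a rough back-of-the-envelope sketch of the scale-out alternative, the number of 128 GB servers needed to hold a 300 GB data set can be estimated as below. The function name and the optional headroom parameter are illustrative assumptions, not part of the original setup.

```python
import math

def servers_needed(data_gb, ram_per_server_gb, headroom=0.0):
    """Estimate how many equally sized servers are needed to hold data_gb.

    headroom is a hypothetical fraction of RAM kept free on each server
    (0.0 means all of the RAM is treated as usable for data).
    """
    usable_gb = ram_per_server_gb * (1.0 - headroom)
    return math.ceil(data_gb / usable_gb)

print(servers_needed(300, 128))        # 3
print(servers_needed(300, 128, 0.25))  # 4
```

In practice some memory is always needed for things other than the data itself, which is why the second call leaves a quarter of each server free.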

Question:- What are the usual practical reasons for which we need Scaling ?

Answer:- There are usually two reasons for Scaling :-

  • Hitting your server’s RAM limit.

Question:- Why should we really go for Horizontal-Scaling (Sharding) and not for Vertical-Scaling ?

Answer:- Let’s see why we should prefer Horizontal-Scaling after a certain limit :-

  • Redis is mostly single-threaded, so a single Redis Server instance cannot make use of the multiple cores of your server’s CPU for command processing.

This pattern of splitting data between multiple servers for the purpose of scaling is called Sharding.

Question:- Can you explain, what is a Shard ?

Answer:- Whenever we split the overall data between multiple Redis-Servers for the purpose of scaling, these resulting servers or processes that hold these chunks of the data are called shards.

Question:- Demonstrate the process of launching a simple Redis Cluster.

Answer:- We shall be setting up the following Redis Cluster :-

Step #1.) Let’s first launch a fresh Redis Instance with the following configuration :-

# redis.conf file
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

Step #2.) Next, we shall launch 2nd Redis Instance now @ port no. 7001 :-

# redis.conf file
port 7001
cluster-enabled yes
cluster-config-file /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/7001/nodes.conf
cluster-node-timeout 5000
appendonly yes

Step #3.) Next, we shall launch 3rd Redis Instance now @ port no. 7002 :-

# redis.conf file
port 7002
cluster-enabled yes
cluster-config-file /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/7002/nodes.conf
cluster-node-timeout 5000
appendonly yes

Step #4.) Next, we shall launch 4th Redis Instance now @ port no. 7003 :-

# redis.conf file
port 7003
cluster-enabled yes
cluster-config-file /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/7003/nodes.conf
cluster-node-timeout 5000
appendonly yes

Step #5.) Next, we shall launch 5th Redis Instance now @ port no. 7004 :-

# redis.conf file
port 7004
cluster-enabled yes
cluster-config-file /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/7004/nodes.conf
cluster-node-timeout 5000
appendonly yes

Step #6.) Next, we shall launch 6th Redis Instance now @ port no. 7005 :-

# redis.conf file
port 7005
cluster-enabled yes
cluster-config-file /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/7005/nodes.conf
cluster-node-timeout 5000
appendonly yes
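
Since the six configuration files differ only in the port, they can also be generated by a small script. The following is a minimal sketch, assuming one directory per port and a relative nodes.conf (which gives each node its own file as long as each server is started from its own directory). The helper name write_node_configs and the temporary base directory are hypothetical.

```python
from pathlib import Path
import tempfile

# Same settings as in the steps above; only the port changes per node.
CONFIG_TEMPLATE = """# redis.conf file
port {port}
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
"""

def write_node_configs(base_dir, ports):
    """Create one directory per port under base_dir, each with its own redis.conf."""
    paths = []
    for port in ports:
        node_dir = Path(base_dir) / str(port)
        node_dir.mkdir(parents=True, exist_ok=True)
        conf_path = node_dir / "redis.conf"
        conf_path.write_text(CONFIG_TEMPLATE.format(port=port))
        paths.append(conf_path)
    return paths

if __name__ == "__main__":
    # A temporary directory is used so that the sketch does not touch a real install.
    demo_dir = tempfile.mkdtemp(prefix="redis-cluster-")
    for path in write_node_configs(demo_dir, range(7000, 7006)):
        print(path)
```

Each instance would then be started from inside its own directory, pointing the Redis server at the generated redis.conf.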

Step #7.) Next, we shall create a Redis-Cluster out of the six Redis nodes launched above :- Here we list the IP addresses and ports of all six servers and use the create sub-command of redis-cli to instruct Redis to join them into a cluster, with one replica for each primary (--cluster-replicas 1). Redis-cli will propose a configuration; accept it by typing yes. The cluster will then be configured and joined, which means that the instances are bootstrapped into talking to each other.

(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % cd /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/
(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % ./src/redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

Note: We shall see the concept of Hash-Slots very soon.

Question:- Can you kindly explain the meaning of the configuration that we used to launch the Redis-Cluster ?

Answer:- Recall the configuration that we used for each Redis instance :-

# redis.conf file
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
  • port 7000 → The first line specifies the port on which the server should run.

Question:- Now that we have set up a Redis-Cluster with multiple Redis-Shards, how will Redis know which particular Shard to look in for a given key ?

Answer:- We need to have a way to consistently map a key to a specific shard. There are multiple ways to do this. The one Redis uses is called “Algorithmic Sharding”.

Question:- How does “Algorithmic Sharding” work ?

Answer:- We basically define the shard for a given key i.e. map the key to the shard. We hash the key and then mod the result by the total number of shards.

  • Because we’re using a deterministic hash function, this function will always assign a given key to the same shard.
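
The "hash the key, then mod by the number of shards" idea can be sketched in a few lines. This is only an illustration: CRC32 from Python's zlib module is used here as a convenient deterministic hash, and shard_for is a hypothetical helper name.

```python
import zlib

def shard_for(key, num_shards):
    """Map a key to a shard index using hash(key) mod num_shards."""
    # zlib.crc32 gives the same value on every run, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_shards

for key in ("foo", "bar", "user:1001"):
    print(key, "-> shard", shard_for(key, 2))
```

Running it twice prints the same shard for each key, which is exactly the property that lets every client find a key without asking anyone where it lives.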

Question:- Now, say our business is growing, we have more data to handle, and we therefore want to increase our shard count (i.e. ReSharding). How is our data-querying affected now ?

Answer:- Let’s say we add one new shard so that our total number of shards is three. Now, when a client tries to read the key foo, it will run the hash function and mod by the number of shards as before. This time, the number of shards is THREE and therefore we’re modding by three instead of two. Understandably, the result may be different, pointing us to a different shard. For example, the data for key “foo” was originally housed in Shard-0, but after this fresh re-sharding we are re-directed to Shard-1. Thus, we may not get the data at all now.
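
To get a feel for how many keys are affected, the sketch below counts the keys whose shard index changes when the shard count goes from two to three. It is an illustration under assumptions: CRC32 stands in for the hash function, and the 1000 key names are made up.

```python
import zlib

def shard_for(key, num_shards):
    """Map a key to a shard index using hash(key) mod num_shards."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def count_moved(keys, old_count, new_count):
    """Count keys whose shard index changes when the shard count changes."""
    return sum(1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count))

keys = [f"key:{i}" for i in range(1000)]  # hypothetical key names
moved = count_moved(keys, 2, 3)
print(f"{moved} of {len(keys)} keys map to a different shard after going from 2 to 3 shards")
```

A key keeps its shard only when its hash gives the same remainder for both shard counts, so a large share of the keys ends up pointing at the wrong shard.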

Question:- How can the Resharding issue be solved manually with the Algorithmic Sharding strategy ?

Answer:- This can be solved by rehashing all the keys in the keyspace and moving them to the shard appropriate to the new shard count. This is not a trivial task, though: it can require a lot of time and resources, during which the database will not be able to reach its full performance or might even become unavailable.

Question:- Does there not exist some smarter approach for solving the problem of ReSharding with Algorithmic Sharding ?

Answer:- Redis uses a clever approach to solve this problem, using HashSlot. Using hashslots allows the cluster to scale through the addition of new shards without having to compute new hashes for each key.

Part #1.) Redis introduces a logical unit, called a hashslot, that sits between a key and a shard.

Part #2.) The total number of hashslots in a database is always 16,384, or 16K.

Refer to our latest snapshot of the Redis-Cluster creation, where a similar message was also shown :-

Part #3.) The hashslots are divided roughly evenly across the shards. For example :- Say we have 3 primary shards overall, then :-

  • Slots 0 through 5460 might be assigned to Primary-Shard-0.
(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % cd /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/
(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % ./src/redis-cli -p 7000 cluster slots
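
The roughly even division of hashslots from Part #3 can be sketched as follows. The rounding rule is an assumption, chosen so that three primaries get the ranges 0-5460, 5461-10922 and 10923-16383 that this cluster reports. The helper name split_slots is hypothetical, and the sketch is not taken from the Redis source.

```python
TOTAL_SLOTS = 16384

def split_slots(num_primaries):
    """Divide slots 0..16383 into contiguous, roughly equal ranges."""
    per_shard = TOTAL_SLOTS / num_primaries  # may be fractional
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(num_primaries):
        last = round(cursor + per_shard - 1)
        # The last primary always takes whatever remains up to slot 16383.
        if i == num_primaries - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_shard
    return ranges

print(split_slots(3))  # [(0, 5460), (5461, 10922), (10923, 16383)]
```

With four primaries the same function yields four ranges of exactly 4096 slots each.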

Part #4.) In a Redis cluster, we actually mod by the number of hashslots, not by the number of shards, so each key is always assigned to the same hashslot. For example, in the below diagram, even though the 3rd shard (i.e. Shard #2) was added recently, the result of our mod does not change, because the hashslot count stays fixed at 16,384. The key keeps its hashslot, and only the mapping of hashslots to shards changes. That is the smart solution provided by Redis-Cluster.
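
The key-to-hashslot step can be sketched in Python. Redis Cluster computes the slot as CRC16 of the key modulo 16384. The sketch below assumes that binascii.crc_hqx with an initial value of 0 is the same CRC16 variant, and it deliberately ignores hash tags (the {...} syntax), which this blog does not cover. The helper name key_slot is hypothetical.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384

def key_slot(key):
    """Return the hashslot for a key: CRC16(key) mod 16384.

    Simplification: hash tags (the {...} syntax) are ignored here.
    """
    return crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS

print(key_slot("foo"))  # 12182
```

Notice that the number of shards does not appear anywhere in this function, which is why adding a shard never changes the slot of a key.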

Question:- Demonstrate with an example where we have more data to handle and therefore want to add another shard (i.e. we now want to perform ReSharding). How would the HashSlots be moved to the newly added Shard ?

Answer:- In Redis, data sharding (partitioning) is the technique of splitting all data across multiple Redis instances, so that every instance contains only a subset of the keys.

  • This process allows us to cope with data growth by adding more and more instances and dividing the data into smaller parts (shards or partitions).

Step #1.) Now, say we wish to add another primary shard to our Redis-Cluster. With this additional shard, we would scale up both read & write throughput. Recall from our original demonstration that we had 3 primary shards overall, with 1 replica for each.

Now, let’s add a new master node to our Redis-Cluster :- We launch another Redis Instance, which shall run on port 7006 :-

Step #2.) Similarly, this newly added primary shard also needs a slave node, so let’s go ahead and launch another Redis Instance (which will eventually become the replica), which shall run on port 7007 :-

Step #3.) Let’s add the new Redis Instance running at port 7006 to our Redis Cluster, so that it becomes part of the cluster. Note that, after this step, we shall have 4 primary shards/nodes in our cluster :-

(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % cd /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/
(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % ./src/redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000

Note here in the command that :-

  • The first parameter is the address of the new shard (to be added).

Step #4.) Now, let’s investigate our Redis Cluster. We have got 4 primary shards/nodes in our Redis-Cluster as follows :-

  • Master Shard running on Redis Instance powered @ port 7000.

All Master Nodes have been highlighted with blue-color in below screenshot :-

Observe in the below output that every node in our Redis-Cluster has a unique ID. For example, our newly added Redis Instance (powered @ port 7006) has the ID : 4b99501f01c953ab9273185c6f878255befb8839

(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % ./src/redis-cli -p 7000 cluster nodes
12d3c2c84682271220e7297589c2a3d674931ff5 127.0.0.1:7005@17005 slave 170746a71e7e69c88afd27fd6e95a2c1e98bae54 0 1655623286392 3 connected
e04aee2f44b08f84e22ef3a0ed286e14fc44a50a 127.0.0.1:7001@17001 master - 0 1655623285379 2 connected 5461-10922
01dffb9f96867fe75c4d3d247aa84973c4d77013 127.0.0.1:7004@17004 slave e04aee2f44b08f84e22ef3a0ed286e14fc44a50a 0 1655623286000 2 connected
170746a71e7e69c88afd27fd6e95a2c1e98bae54 127.0.0.1:7002@17002 master - 0 1655623286696 3 connected 10923-16383
bca6906f860abf8096d26391f10984b93bc35b6c 127.0.0.1:7000@17000 myself,master - 0 1655623285000 1 connected 0-5460
fbc5f5e157d154cd41b606a9d88eb320c06ef9d4 127.0.0.1:7003@17003 slave bca6906f860abf8096d26391f10984b93bc35b6c 0 1655623286392 1 connected
4b99501f01c953ab9273185c6f878255befb8839 127.0.0.1:7006@17006 master - 0 1655623286000 0 connected
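
The cluster nodes output is plain text with space-separated fields, so it is easy to inspect from a script. Below is a minimal parsing sketch, assuming the field order seen in the output above (node ID, address, flags, primary ID, then any slot ranges from the ninth field onwards). The helper name parse_cluster_nodes is hypothetical, and the sample contains three lines copied from that output.

```python
SAMPLE = """\
e04aee2f44b08f84e22ef3a0ed286e14fc44a50a 127.0.0.1:7001@17001 master - 0 1655623285379 2 connected 5461-10922
fbc5f5e157d154cd41b606a9d88eb320c06ef9d4 127.0.0.1:7003@17003 slave bca6906f860abf8096d26391f10984b93bc35b6c 0 1655623286392 1 connected
4b99501f01c953ab9273185c6f878255befb8839 127.0.0.1:7006@17006 master - 0 1655623286000 0 connected
"""

def parse_cluster_nodes(text):
    """Parse node lines of `cluster nodes` output into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or incomplete lines
        node_id, address, flags, primary_id = fields[:4]
        nodes.append({
            "id": node_id,
            # The address looks like ip:port@cluster-bus-port; keep the client port.
            "port": int(address.split("@")[0].rsplit(":", 1)[1]),
            "flags": flags.split(","),
            "primary_id": None if primary_id == "-" else primary_id,
            "slots": fields[8:],  # empty for replicas and for slot-less primaries
        })
    return nodes

for node in parse_cluster_nodes(SAMPLE):
    role = "master" if "master" in node["flags"] else "replica"
    print(node["port"], role, node["slots"] or "no slots")
```

The node on port 7006 shows up as a master with an empty slot list, which matches the fact that it has only just joined the cluster.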

Step #5.) Next, let’s add our other Redis instance (running @ port 7007) to our Redis Cluster. Note that this Redis instance shall join the cluster as a Slave Node :-

(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % cd /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/
adityagoel@Adityas-MacBook-Pro redis-7.0.0 % ./src/redis-cli -p 7000 --cluster add-node 127.0.0.1:7007 127.0.0.1:7000 --cluster-slave --cluster-master-id 4b99501f01c953ab9273185c6f878255befb8839
  • We shall be using the same add-node command, and a few extra arguments indicating the shard is joining as a replica and what will be its primary shard. If we don’t specify a primary shard, Redis will assign one itself.

Step #6.) Now, let’s investigate our Redis Cluster again. We have got 4 primary shards/nodes in our Redis-Cluster as follows. All Master Nodes have been highlighted with blue-color in below screenshot :-

  • Master Shard running on Redis Instance powered @ port 7000.

We have also got 4 replica shards/nodes in our same Redis-Cluster :-

  • Replica Shard running on Redis Instance powered @ port 7003.
(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % cd /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/
(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % ./src/redis-cli -p 7000 cluster nodes

Step #7.) Now our cluster has eight shards (four primary and four replica), but if we run the cluster slots command, we’ll see that the newly added primary shard (running at port 7006) doesn’t host any hash slots yet, and therefore holds no data at all.

Step #8.) Let’s assign some hash slots to the newly added Shard (recall that the newly added shard is running at port 7006 and its corresponding replica is running at port 7007) :-

(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % cd /Users/adityagoel/Downloads/redis-cluster/redis-7.0.0/
(base) adityagoel@Adityas-MacBook-Pro redis-7.0.0 % ./src/redis-cli -p 7000 --cluster reshard 127.0.0.1:7000
  • The first question you’ll get is about the number of slots you want to move. Since we have 16384 slots in total and four primary shards, let’s move a quarter of all slots, so that the data is distributed equally. 16384 ÷ 4 is 4096, so let’s use that number.
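
The 4096 slots have to come from the three existing primaries. The sketch below shows one plausible way to share them out in proportion to what each primary currently holds. This proportional rule is an assumption made for illustration and is not necessarily what redis-cli does internally. The helper name reshard_plan is hypothetical, and the slot counts are derived from the ranges 0-5460, 5461-10922 and 10923-16383.

```python
import math

def reshard_plan(source_slot_counts, slots_to_move):
    """Share slots_to_move across the source shards, roughly in proportion
    to the number of slots each one currently owns."""
    total = sum(source_slot_counts.values())
    # Largest source first: its share is rounded up, the others are rounded down.
    ordered = sorted(source_slot_counts.items(), key=lambda kv: kv[1], reverse=True)
    plan = {}
    remaining = slots_to_move
    for index, (name, count) in enumerate(ordered):
        share = slots_to_move * count / total
        take = math.ceil(share) if index == 0 else math.floor(share)
        take = min(take, remaining)  # never plan more than was requested
        plan[name] = take
        remaining -= take
    return plan

# Keys are the ports of the three existing primaries, values are their slot counts.
current = {7000: 5461, 7001: 5462, 7002: 5461}
print(reshard_plan(current, 16384 // 4))  # {7001: 1366, 7000: 1365, 7002: 1365}
```

After such a move every primary, including the new one, ends up with 4096 slots.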

Step #9.) Once the command finishes we can run the cluster slots command again and we’ll see that our new primary and replica shards have been assigned some hash slots:

  • Slots 1365 through 5460 (i.e. a total of 4096 slots) are now assigned to Primary-Shard-0, which is running @ port 7000.

A net total of 4096 hash-slots have now been successfully assigned to the newly-added Primary-Shard-3 (this primary shard is running @ port 7006) :-

  • Slots 0 through 1364 i.e. total of 1365 slots.

Following is how the hash-slots allocation now looks like within our Redis Cluster :-

That’s how hash-slot reallocation is usually done within a Redis Cluster. Next, whenever we get a request to GET a particular key, the result of our mod always remains the same, because we always mod by the hashslot count and never by the shard count. Example :-

  • Before adding the additional shard in our above demonstration, the number of hashslots was 16384. Say a client tries to read the key foo: it runs the hash function and mods by the number of hashslots. The result comes out to be 12,182.
  • Note that hash-slot 12,182 was sitting on Shard #3. Then we added an additional shard, so that our total number of shards is now FOUR, while the number of hashslots is still 16384.
  • Hash-slot 12,182 is now sitting on Shard #4, but the key foo still maps to hash-slot 12,182. The client only needs to identify the correct hashslot, and the cluster knows which shard currently owns it.
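
The example above can be replayed in a short sketch. The slot of foo is computed once, and only the slot-to-shard table differs before and after resharding. The BEFORE table uses the ranges of the original three primaries. In the AFTER table only the move of slots 0-1364 to port 7006 is stated in this blog; the other two moved ranges are assumed for illustration. CRC16 via binascii.crc_hqx is likewise an assumption about the hash function, and the helper names are hypothetical.

```python
from binascii import crc_hqx

def key_slot(key):
    """CRC16(key) mod 16384, ignoring hash tags."""
    return crc_hqx(key.encode("utf-8"), 0) % 16384

def owner_of(slot, slot_ranges):
    """Return the port of the primary that owns the given slot."""
    for start, end, port in slot_ranges:
        if start <= slot <= end:
            return port
    raise ValueError(f"slot {slot} is not assigned to any shard")

# (first slot, last slot, port of the owning primary)
BEFORE = [(0, 5460, 7000), (5461, 10922, 7001), (10923, 16383, 7002)]
# Illustrative layout after moving 4096 slots to the new primary on port 7006.
AFTER = [
    (0, 1364, 7006), (1365, 5460, 7000),
    (5461, 6826, 7006), (6827, 10922, 7001),     # assumed range
    (10923, 12287, 7006), (12288, 16383, 7002),  # assumed range
]

slot = key_slot("foo")
print(slot, owner_of(slot, BEFORE), owner_of(slot, AFTER))  # 12182 7002 7006
```

The slot number is identical in both lookups; only the owner changes, from the third primary to the fourth.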

That’s all in this section. If you liked reading this blog, kindly press the clap button multiple times to show your appreciation. We will see you in the next part of this series with Hands-On with Redis-Cluster.

Software Engineer for Big Data distributed systems