High Availability with Redis Cluster
Welcome to this blog. If you are coming here directly, it is highly recommended that you read through this story first. We shall be looking at the following topics in this blog :-
- High-Availability with Redis Cluster.
- Automatic Failover with Redis Cluster.
- Split-Brain Problem with Redis Cluster and its solution.
Question:- How does Redis-Cluster provide High-Availability ?
Answer:- High availability refers to the cluster’s ability to remain operational even in the face of certain failures. For example, the cluster can detect when a primary shard fails and promote a replica to a primary without any manual intervention from the outside.
Question:- How does Redis-Cluster provide Automatic-Failover ?
Answer:- Redis-Cluster can quickly detect when a primary shard has failed and can promote its replica to become the new primary.
- Say we have one replica for every primary shard. If all our data is divided between three Redis servers, we would need a six-member cluster, with three primary shards and three replicas.
- All six shards are connected to each other over TCP and constantly ping each other and exchange messages. These messages allow the cluster to determine which shards are alive.
- When enough shards report that a given primary shard is not responding to them, they can agree to trigger a fail-over and promote the shard's replica to become the new primary. In open-source Redis Cluster, a majority of the primary shards must agree that a fellow shard is offline before fail-over is triggered. How long a shard may stay unreachable before it is reported as failing is configurable through the cluster-node-timeout setting.
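The agreement step above can be sketched in a few lines of Python. This is only an illustration, not Redis source code: the names `primaries_reporting_down` and `should_fail_over` are hypothetical, and the sketch assumes the rule is a strict majority of primary shards.

```python
from typing import Dict, Set


def primaries_reporting_down(reports: Dict[str, Set[str]], target: str) -> int:
    """Count the primaries (other than the target itself) that cannot reach `target`."""
    return sum(
        1
        for reporter, unreachable in reports.items()
        if reporter != target and target in unreachable
    )


def should_fail_over(reports: Dict[str, Set[str]], target: str, total_primaries: int) -> bool:
    """Assumption of this sketch: fail-over needs a strict majority of primaries to agree."""
    needed = total_primaries // 2 + 1
    return primaries_reporting_down(reports, target) >= needed


# Each primary lists the shards it currently cannot reach (hypothetical names).
reports = {
    "primary-1": set(),
    "primary-2": {"primary-1"},
    "primary-3": {"primary-1"},
}

print(should_fail_over(reports, "primary-1", total_primaries=3))  # True
```

With three primaries, two reports are enough to reach a majority, so a single primary that loses its own network link cannot force a fail-over on its own.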
Question:- How can the Split-Brain situation happen with Redis-Cluster ?
Answer:- Here is how the Split-Brain situation can arise :-
Step #1.) Imagine that we have a Redis-Cluster with THREE primary shards and one replica for every primary shard. Overall, our Redis cluster is a six-member cluster, with three primary shards and three replicas. Further imagine that a network partition has happened, i.e. the shards in the group on the left side are not able to talk to the shards in the group on the right side.
- Now, each group will think that the shards on the other side are offline, and each will trigger a fail-over for the primary shards it can no longer reach. As a result, the left side ends up with all the primary shards, and so does the right side.
Step #2.) Both sides, thinking they have all the primaries, will continue to receive client requests that modify data. That is a problem, because client A may set the key foo to bar on the left side, while client B sets the same key's value to baz on the right side.
Step #3.) When the network partition is removed and the shards try to rejoin, we will have a conflict, because we have two shards holding different data, claiming to be the primary, and we wouldn’t know which data is valid. This is called a split brain situation, and it’s a very common issue in the world of distributed systems.
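The three steps above can be replayed with a tiny simulation. This is a minimal sketch in which two plain dictionaries stand in for the two sides of the partition; `conflicting_keys` is a hypothetical helper, not a Redis API.

```python
from typing import Dict, Tuple


def conflicting_keys(left: Dict[str, str], right: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return every key that both sides hold with different values."""
    return {
        key: (left[key], right[key])
        for key in left.keys() & right.keys()
        if left[key] != right[key]
    }


# During the partition, each side believes it owns all the primaries.
left_side: Dict[str, str] = {}
right_side: Dict[str, str] = {}

left_side["foo"] = "bar"   # client A writes on the left side
right_side["foo"] = "baz"  # client B writes on the right side

# When the partition heals, the two sides disagree about foo.
print(conflicting_keys(left_side, right_side))  # {'foo': ('bar', 'baz')}
```

Neither value is obviously the right one, which is exactly why the cluster has to prevent both sides from accepting writes in the first place.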
Question:- What’s the solution to fix the Split-Brain situation ?
Answer:- Maintain an odd number of primary shards and two replicas per primary shard. Here is the detailed solution to this problem :-
- To prevent a split-brain situation in a Redis cluster, always keep an odd number of primary shards in your cluster.
- Now, when we get a network split, the left and right groups each do a count to see whether they are in the bigger group (majority) or the smaller group (minority).
- If a particular group is in Minority, it shall NOT try to trigger a fail-over and shall NOT accept any client write requests.
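The reason for the odd count can be checked with a little arithmetic. The sketch below assumes a strict-majority rule (more than half of the shards that vote); the function names are hypothetical and chosen only for illustration.

```python
def has_majority(group_size: int, total: int) -> bool:
    """A group is in the majority only if it holds more than half of the shards."""
    return 2 * group_size > total


def every_split_has_a_majority(total: int) -> bool:
    """Check all two-way splits of `total` shards for a side that holds the majority."""
    return all(
        has_majority(k, total) or has_majority(total - k, total)
        for k in range(total + 1)
    )


print(every_split_has_a_majority(3))  # True: any split of 3 is 3/0 or 2/1
print(every_split_has_a_majority(4))  # False: a 2/2 split leaves no majority
```

With an odd count, exactly one side of any two-way split can continue; with an even count, a tie can leave the whole cluster without a side that is allowed to act.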
Let’s take the cluster below :-
Now, Imagine a network-split happens like this :-
- Here, the left-side group (set of nodes) is in the minority, and therefore it shall NOT try to trigger a fail-over and shall STOP accepting any client write requests.
- The right-side group (set of nodes) is in the majority, and therefore it has the authority and capability to trigger a fail-over of any primary shards.
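The behaviour of the two groups can be sketched as a small write guard. Everything here is hypothetical: the `PartitionGroup` class, the `MinorityPartitionError` exception, and the assumed split of three primaries into one on the left and two on the right are illustrations, not the exact topology of the cluster above or the Redis implementation.

```python
class MinorityPartitionError(Exception):
    """Hypothetical error raised when a minority group is asked to accept a write."""


class PartitionGroup:
    def __init__(self, name: str, reachable_primaries: int, total_primaries: int) -> None:
        self.name = name
        self.reachable_primaries = reachable_primaries
        self.total_primaries = total_primaries
        self.data: dict = {}

    @property
    def in_majority(self) -> bool:
        # Strict majority: more than half of all primaries are reachable.
        return 2 * self.reachable_primaries > self.total_primaries

    def can_trigger_failover(self) -> bool:
        # Only the majority side has the authority to promote replicas.
        return self.in_majority

    def write(self, key: str, value: str) -> None:
        # The minority side refuses writes instead of diverging from the majority.
        if not self.in_majority:
            raise MinorityPartitionError(
                f"{self.name} group is in the minority; write to {key!r} refused"
            )
        self.data[key] = value


# Assumed split: 3 primaries in total, 1 reachable on the left, 2 on the right.
left = PartitionGroup("left", reachable_primaries=1, total_primaries=3)
right = PartitionGroup("right", reachable_primaries=2, total_primaries=3)

right.write("foo", "baz")
try:
    left.write("foo", "bar")
except MinorityPartitionError as exc:
    print(exc)

print(left.can_trigger_failover(), right.can_trigger_failover())  # False True
```

Because only the right side accepts the write to foo, there is a single value to keep once the partition heals.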
That’s all in this section. We will see you in the next part of this series, Hands-On with Redis-Cluster.
References :-
- https://redis.io/docs/manual/scaling/#redis-cluster-101
- https://adityagoel123.medium.com/scalability-ha-with-redis-cluster-3d6499084550
- https://adityagoel123.medium.com/high-availability-with-redis-replication-and-sentinel-af09141e7516
- https://adityagoel123.medium.com/beginners-guide-to-redis-756eeac7009
- https://adityagoel123.medium.com/hands-on-with-redis-part-2-476e91e5d949
- https://adityagoel123.medium.com/hands-on-with-redis-part-1-b24a9302f8c6