High Availability with Redis Replication and Sentinel

Welcome to this blog. If you are coming here directly, it is highly recommended that you read through this story first. We will look at the following topics in this blog :-

  • Overview of High-Availability.
  • Setting up Redis-replication.
  • Setting up security with Redis-Server, replica & Client.
  • Overview of Redis-Sentinel for Automatic-failover.
  • Setting up Redis-Sentinel for Automatic-failover.

Question:- What’s the meaning of High-Availability ?

Answer:- High availability is a computing concept describing systems that guarantee a high level of uptime. A highly available system is designed to have the following features :-

  • Fault-tolerant.
  • Highly dependable.
  • Operates continuously without intervention.
  • Has no single point of failure.

Question:- What does High Availability mean for Redis, specifically?

Answer:- It means that if your primary Redis server fails, a backup will kick in, and the customers / end-users of Redis will see little to no disruption in the service.

Question:- How can High Availability be achieved with Redis ?

Answer:- There are two components needed for this to be possible :-

  • Replication
  • Automatic failover.

Part #1 : Replication

Question:- Explain the concept of Replication ?

Answer:-

  • Replication is the continuous copying of data from a primary database to a backup, or replica, database.
  • The two databases are usually located on different physical servers, so that we still have a functional copy of our data if we lose the server where our primary database sits.

Question:- Can Replication happen in any direction with Redis ?

Answer:-

  • Replication in Redis follows a simple primary-replica model, where replication happens in one direction only: from the primary to one or more replicas.
  • Data is only written to the primary instance, and the replicas are kept in sync so that they are exact copies of the primary.

Question:- How do we configure a particular server as a replica of the Redis primary server ?

Answer:- To create a replica, we start a Redis server instance with the configuration directive “replicaof” set to the address and port of the primary instance.

For example, the line replicaof 192.168.1.1 6379 in the configuration file makes the current Redis instance a replica of the primary server at 192.168.1.1:6379.

Question:- How does replication actually happen with Redis ?

Answer:- Once the replica instance is up and running, the replica will try to sync with the primary.

  • To transfer all of its data as efficiently as possible, the primary instance will produce a compacted version of the data as a snapshot in an RDB file and send it to the replica.
  • The replica will then read the snapshot file and load all of its data into memory, which will bring it to the same state the primary instance had at the moment of creating the RDB file.
  • When the loading stage is done, the primary instance will send the backlog of any write commands run since the snapshot was made.
  • Finally, the primary instance will send the replica a live stream of all subsequent write commands.
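
The four stages above can be mimicked with a toy model. The sketch below is only an illustration in plain Python, not how Redis is implemented: the Primary and Replica classes, the in-memory log and the offset counter are all assumptions made for this example, and a copied dict stands in for the RDB file.

```python
class Primary:
    """Toy primary: a dict plus an ordered log of every write it has applied."""

    def __init__(self):
        self.data = {}
        self.log = []  # every write applied so far, in order

    def set(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self):
        # Stand-in for the RDB file: a copy of the data plus the log position it reflects.
        return dict(self.data), len(self.log)


class Replica:
    """Toy replica: its own dict plus a count of primary writes already applied."""

    def __init__(self):
        self.data = {}
        self.offset = 0

    def load_snapshot(self, snapshot_data, snapshot_offset):
        self.data = dict(snapshot_data)
        self.offset = snapshot_offset

    def apply(self, writes):
        for key, value in writes:
            self.data[key] = value
            self.offset += 1


primary, replica = Primary(), Replica()
primary.set("a", 1)

snap, snap_offset = primary.snapshot()        # 1) primary creates the snapshot
primary.set("b", 2)                           #    a write arrives during the transfer
replica.load_snapshot(snap, snap_offset)      # 2) replica loads the snapshot into memory
replica.apply(primary.log[replica.offset:])   # 3) primary sends the backlog since the snapshot

primary.set("c", 3)                           # 4) later writes are streamed as they happen
replica.apply(primary.log[replica.offset:])

print(replica.data == primary.data)  # True
```

The point to notice is the offset: it lets the primary send only what the replica has not yet seen, first as a backlog and then as a continuous stream.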

Question:- What’s the nature of Redis Replication ?

Answer:- By default, replication is asynchronous in nature. This means that if you send a write command to Redis, you will receive the acknowledgement first, and only then will the command be propagated to the replicas.

Question:- What if the primary server goes down, after acknowledging a write but before the write can be replicated ?

Answer:- In this case, the acknowledged write is lost, because no replica ever received it.

Question:- Is there any way to prevent this situation ?

Answer:- To reduce this risk, the client can use the WAIT command. This command blocks the current client until all of its previous write commands have been transferred to, and acknowledged by, at least the specified number of replicas, or until the timeout expires. For example, if we send the command WAIT 2 0 :-

  • The client will block (i.e. the primary server will not return a response to the client) until all the previous write commands issued on that connection have been acknowledged by at least two replicas.
  • The second argument, 0, instructs the server to block indefinitely. We can instead set it to a number of milliseconds, so that the command times out after a while. In either case, WAIT returns the number of replicas that acknowledged the commands.
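
One detail worth stressing is that WAIT does not fail when the timeout expires; it simply reports how many replicas acknowledged, and the client has to compare that number with the one it asked for. The following sketch models that check in plain Python. The function name, the replica names and the offsets are hypothetical, and this is only a simplified picture of the bookkeeping, not Redis code.

```python
def acked_replicas(write_offset, replica_ack_offsets):
    """Count the replicas that have acknowledged everything up to write_offset."""
    return sum(1 for ack in replica_ack_offsets.values() if ack >= write_offset)


# Hypothetical state: the client's last write ended at replication offset 120.
acks = {"replica-1": 120, "replica-2": 95}

reached = acked_replicas(120, acks)
print(reached)       # 1 -> what WAIT would report if its timeout expired now
print(reached >= 2)  # False -> the check a client makes after "WAIT 2 <timeout>"
```

Here only one replica has caught up, so a client that asked for two replicas should treat the write as not yet safely replicated.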

Question:- Can Redis clients also write directly to the replica nodes ?

Answer:- By default, replicas are read-only. This means that you can configure your clients to read from them, but you cannot write data to them.

Note that our replica runs at port 6380, as demonstrated in the setup further below.

Question:- Can Redis-Clients read from Redis-Primary Nodes as well ?

Answer:- Yes. If you need additional read throughput, you can configure your Redis client to read from the replicas as well as from the primary node. Note that this scales reads only; all writes still go to the primary.

Question:- Let’s now proceed with the setup of replication in Redis.

Answer:- We set up replication for high availability of our data.

Step #1.) We’ll make a second copy of the configuration file. We shall then launch another redis-server on a different port, using this new configuration file.

  • Note that our original redis-server runs @ port 6379.
  • Our replica redis-server will run @ port 6380.

Step #2.) Next, in the new configuration file, we set this new server as the replica of the original redis-server, using the replicaof directive.

Step #3.) Next, we give the replica’s AOF file a different name (the appendfilename directive), so that it does not clash with the primary’s AOF file :-
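
Steps #1 to #3 can also be scripted. The sketch below is a minimal helper, assuming a base configuration that is plain text with one directive per line. The function name make_replica_conf, the sample base text and the file name appendonly-6380.aof are hypothetical; port, replicaof and appendfilename are the actual Redis directives.

```python
def make_replica_conf(base_text, port, primary_host, primary_port):
    """Return base_text with the replica-specific directives overridden."""
    overrides = {
        "port": str(port),
        "replicaof": f"{primary_host} {primary_port}",
        "appendfilename": f'"appendonly-{port}.aof"',
    }
    # Drop any existing line that sets one of the directives we override.
    kept = [
        line for line in base_text.splitlines()
        if not (line.split() and line.split()[0] in overrides)
    ]
    kept += [f"{name} {value}" for name, value in overrides.items()]
    return "\n".join(kept) + "\n"


# Hypothetical excerpt of the primary's configuration file.
base = 'port 6379\nappendonly yes\nappendfilename "appendonly.aof"\n'

replica_conf = make_replica_conf(base, 6380, "127.0.0.1", 6379)
print(replica_conf)
```

The printed text can be saved as the replica's configuration file and then passed to redis-server in the steps that follow.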

Step #4.) Let’s now start our primary redis-server at port 6379, by passing the original Redis configuration file to redis-server :-

Step #5.) We also start our replica redis-server at port 6380, by passing the new Redis configuration file to redis-server :-

  • The log of the process at port 6380 shows the line MASTER <-> SLAVE sync started, which confirms that this instance is acting as the replica (slave).
  • Eventually, we shall see that both the primary and the replica processes are running and synchronised.

Question:- Let’s now proceed with setting up security in Redis.

Answer:- We now set up security, so that every replica has to present credentials in order to connect to the primary server :-

Step #1.) In the configuration file of the primary redis-server, set the password with the “requirepass” directive, for example requirepass a_strong_password :-

Step #2.) As soon as we restart the primary redis-server, we observe that replication fails on the replica side, because the replica now also needs the password in order to connect to the primary.

Step #3.) In the configuration file of the replica redis-server, set the password with which the replica connects to the primary, using the “masterauth” directive (with the same value as requirepass on the primary) :-

Step #4.) Next, we restart the replica redis-server with this configuration change :-

Step #5.) Since the primary redis-server now requires a password, commands sent through an unauthenticated Redis-CLI session also start failing :-

Step #6.) Let’s now restart our Redis-CLI with the password supplied to it (for example through its -a option) :-

Part #2 : Automatic Failover

Question:- Is having only backup of our data sufficient for High Availability?

Answer:- Having a backup of our data is NOT enough for high availability. We also need a mechanism that automatically kicks in and redirects all requests to the replica in the event that the primary fails. This mechanism is called automatic failover.

Question:- How is automatic failover achieved with Redis ?

Answer:- Through Sentinel, a tool that provides automatic failover.

Question:- Can you explain a bit about Sentinel ?

Answer:- Redis Sentinel is a distributed system consisting of multiple Redis instances started in sentinel mode. We call these instances Sentinels.

  • The group of Sentinels monitors a primary Redis instance and its replicas.
  • If the sentinels detect that the primary instance has failed, the sentinel processes will look for the replica that has the latest data and will promote that replica to be the new primary.
  • This way, the clients talking to the Redis deployment will be able to reconnect to the new primary and continue functioning as usual, with minimal disruption to the users.
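
How is "the replica that has the latest data" picked when there are several replicas? According to the Redis Sentinel documentation, after discarding replicas that were disconnected from the primary for too long, the candidates are ordered by replica priority (lower wins, 0 means never promote), then by replication offset (higher wins), then by run ID. The sketch below is a simplified model of that ordering only. The dictionaries, the second replica at port 6381 and all the values are hypothetical.

```python
def pick_promotion_candidate(replicas):
    """Order candidates by priority, then replication offset, then run ID."""
    eligible = [r for r in replicas if r["priority"] > 0]  # priority 0: never promote
    return min(
        eligible,
        key=lambda r: (r["priority"], -r["offset"], r["run_id"]),
        default=None,
    )


# Hypothetical replicas with equal priority but different replication offsets.
replicas = [
    {"addr": "127.0.0.1:6380", "priority": 100, "offset": 5000, "run_id": "b1f3"},
    {"addr": "127.0.0.1:6381", "priority": 100, "offset": 5200, "run_id": "a7c9"},
]

chosen = pick_promotion_candidate(replicas)
print(chosen["addr"])  # 127.0.0.1:6381
```

With equal priorities, the replica that has processed more of the replication stream wins, which is what "latest data" means here.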

Question:- How does Sentinel decide that a primary Redis instance is down ?

Answer:- In order for the Sentinels to be able to decide that a primary instance is down :-

  • We need to have enough Sentinels agree that the server is unreachable from their point of view.
  • Having a number of Sentinels agreeing that they need to take an action is called reaching a quorum. If the Sentinels can’t reach quorum, they cannot decide that the primary has failed. The exact number of Sentinels needed for quorum is configurable.

Question:- How does Sentinel kick off the failover ?

Answer:- Once the Sentinels have decided that a primary instance is down, they need to elect a leader (a Sentinel instance) that will do the failover.

  • A leader can only be chosen if the majority of the Sentinels agree on it.
  • In the final step, the leader will reconfigure the chosen replica to become a primary by sending the command REPLICAOF NO ONE and it will reconfigure the other replicas to follow the newly promoted primary.
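
Quorum and majority are easy to mix up, so here is a small sketch of the two checks. It is a simplified model based on the Redis Sentinel documentation, not Sentinel's actual code: the quorum decides whether the failure is detected, while the leader needs votes from the majority of all known Sentinels (or from quorum-many Sentinels, if the quorum is larger than the majority). The function and parameter names are made up for this example.

```python
def can_failover(total_sentinels, agree_down, votes_for_leader, quorum):
    """Return True if a failover could both be triggered and be authorized."""
    majority = total_sentinels // 2 + 1
    detected = agree_down >= quorum                         # enough Sentinels see the primary as down
    authorized = votes_for_leader >= max(majority, quorum)  # leader election succeeded
    return detected and authorized


# 3 Sentinels, quorum 2: two agree and both vote for the same leader.
print(can_failover(3, 2, 2, 2))  # True

# 5 Sentinels, quorum 2, but only 2 can talk to each other:
# the failure is detected, yet no leader can get 3 votes.
print(can_failover(5, 2, 2, 2))  # False
```

The second case shows why a low quorum alone never lets a minority of Sentinels perform a failover.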

Question:- What essential thing should we take care of before planning to use Sentinel ?

Answer:- If you have a system that uses Sentinel for high availability, then you need to have a client that supports Sentinel too. Not all libraries have this feature, but most of the popular ones do, so make sure you add it to your list of requirements when choosing your library.

Question:- Let’s now proceed with the setup of Sentinel with Redis.

Answer:- Here are the steps to set up a Sentinel-based system with Redis :-

Step #1.) First, perform the Redis replication setup as explained in the steps above. This gives us :-

  • One primary Redis instance, running @ port : 6379.
  • One replica Redis instance, running @ port : 6380.

Step #2.) To initialise a Redis Sentinel, we need to provide a configuration file, so let’s go ahead and create one :

$ touch sentinel1.conf

Open the file and paste in the following settings:

port 5000
sentinel monitor myprimary 127.0.0.1 6379 2
sentinel down-after-milliseconds myprimary 5000
sentinel failover-timeout myprimary 60000
sentinel auth-pass myprimary a_strong_password

Below is the break-down of terms :-

  • port → The port on which the Sentinel would run.
  • sentinel monitor → Monitors the primary Redis instance at a specific IP address and port. Given the address of the primary Redis instance, the Sentinels are able to discover all the replicas on their own. The last argument on this line is the number of Sentinels needed for quorum. In our example, the number is 2.

Recall that we need enough Sentinels to agree that a particular server is unreachable from their point of view. The situation where enough Sentinels agree that they need to take action is known as reaching a quorum. Once the Sentinels reach quorum, they have decided that the primary has failed.

  • sentinel down-after-milliseconds → How many milliseconds an instance must be unreachable before it is considered down.
  • sentinel failover-timeout → If a Sentinel voted for another Sentinel to carry out the failover of a given master, it will wait this many milliseconds before trying to fail over the same master again.
  • sentinel auth-pass → The password that the Sentinels use to connect to Redis server instances. This directive must be included whenever the instances are configured with requirepass.

Step #3.) Make two more copies of this file, sentinel2.conf and sentinel3.conf, and edit them so that the port directive is set to 5001 and 5002, respectively.
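
If you prefer not to copy and edit the files by hand, Step #3 can be scripted. The sketch below reuses the exact settings shown above and only varies the port. The helper name render_sentinel_conf is an assumption for this example. Note that it writes all three files (including sentinel1.conf) into the current directory.

```python
from pathlib import Path

# Same settings as sentinel1.conf above; only the port changes per Sentinel.
TEMPLATE = """port {port}
sentinel monitor myprimary 127.0.0.1 6379 2
sentinel down-after-milliseconds myprimary 5000
sentinel failover-timeout myprimary 60000
sentinel auth-pass myprimary a_strong_password
"""


def render_sentinel_conf(port):
    """Return the Sentinel configuration text for the given port."""
    return TEMPLATE.format(port=port)


# sentinel1.conf -> 5000, sentinel2.conf -> 5001, sentinel3.conf -> 5002
confs = {f"sentinel{i + 1}.conf": render_sentinel_conf(5000 + i) for i in range(3)}

for name, text in confs.items():
    Path(name).write_text(text)
    print(name, "->", text.splitlines()[0])
```

Keeping the settings in one template also makes it harder for the three files to drift apart, for example with different quorum values.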

Step #4.) Let’s initialise the three Sentinels in three different terminal tabs:

# Tab 1
$ redis-server ./sentinel1.conf --sentinel
# Tab 2
$ redis-server ./sentinel2.conf --sentinel
# Tab 3
$ redis-server ./sentinel3.conf --sentinel

Step #5.) Now the Sentinels are all set. We can connect to any one of them (for example with redis-cli -p 5000) and run some commands to get useful information :-

# Provides information about the Primary
SENTINEL master myprimary
# Gives you information about the replicas connected to the Primary
SENTINEL replicas myprimary
# Provides information on the other Sentinels
SENTINEL sentinels myprimary

Question:- Let’s now demonstrate automatic failover with Redis Sentinel.

Step #1.) First, observe which instance is the current primary (master) :-

# Provides the IP address of the current Primary
SENTINEL get-master-addr-by-name myprimary

Step #2.) Let’s now take the primary Redis instance down, either by pressing Ctrl+C in its terminal or by making it unresponsive for 30 seconds with the redis-cli -p 6379 DEBUG sleep 30 command :-

Step #3.) In the Sentinels’ logs, we can observe that the failover process starts after about 5 seconds, which matches the down-after-milliseconds value of 5000 that we configured :-

Step #4.) If we now run the command that returns the IP address of the primary again, we shall see that the replica has been promoted to primary:

redis> SENTINEL get-master-addr-by-name myprimary
1) "127.0.0.1"
2) "6380"

This demonstrates that the Sentinels have handled the automatic failover for us. In this way, Redis Sentinel provides the mechanism of automatic failover.

That’s all for this section. If you liked reading this blog, kindly press the clap button multiple times to indicate your appreciation. We will see you in the next part of this series.

aditya goel

Software Engineer for Big Data distributed systems
