Deep dive into AWS for developers | Part 3: RDS & ElastiCache

aditya goel
14 min read | Apr 25, 2021

If you are landing here directly, it is recommended to visit this page first.

In this section, we take a deep dive into AWS RDS and ElastiCache.

AWS RDS stands for Relational Database Service. It’s a managed database service: it allows us to create databases in the cloud that are managed by AWS. The following database engines are supported by AWS RDS :-

  • PostgreSQL
  • MySQL
  • MariaDB
  • Oracle
  • SQL Server
  • Amazon Aurora (AWS proprietary database) → This is not eligible for the Free Tier.

Following are the advantages of using AWS RDS versus deploying our own database on EC2 :-

  • AWS-managed RDS takes care of automatic provisioning of the database.
  • It also takes care of patching the underlying OS. Please note that we, as end users, cannot log in or SSH to the underlying instance at all, which is why it is termed a managed service.
  • It takes continuous backups, which can be restored to a specific timestamp, i.e. Point-in-Time Recovery.
  • We get performance dashboards to monitor the performance of the database.
  • It supports horizontal scalability, i.e. we get Read-Replicas for improved read performance. We can also scale vertically on AWS RDS.
  • We can also configure a Multi-AZ setup for a DR (Disaster Recovery) database.
  • Storage is backed by EBS, i.e. gp2 or io1 volumes.
  • This is suitable for unpredictable workloads.

RDS Backups :-

RDS Storage Auto-Scaling :- It helps us to automatically increase the storage on our RDS DB instance. Say we had originally provisioned 20 GB of storage and are now close to exhausting it; RDS then scales the storage for us, up to the configured limit (i.e. the Maximum Storage Threshold). It automatically modifies the storage if all of the following conditions are met :-

  • Free storage is less than 10% of the allocated storage.
  • The low-storage condition lasts for at least 5 minutes.
  • At least 6 hours have passed since the last storage modification.
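
The three conditions above can be expressed as a small predicate. This is only an illustrative sketch of the decision rule as described here, not AWS's actual implementation; the `StorageState` type, its field names and the exact boundary handling (`<` versus `<=`) are assumptions made for the example.

```python
from dataclasses import dataclass

# Thresholds taken from the three conditions listed above.
FREE_SPACE_RATIO = 0.10
LOW_STORAGE_MINUTES = 5
COOLDOWN_HOURS = 6


@dataclass
class StorageState:
    """Hypothetical snapshot of an instance's storage metrics."""
    allocated_gb: float             # currently allocated storage
    free_gb: float                  # free space left
    low_storage_minutes: float      # how long free space has been low
    hours_since_last_change: float  # time since the last storage modification
    max_allocated_gb: float         # the Maximum Storage Threshold


def should_autoscale(state: StorageState) -> bool:
    """Return True only when all conditions hold and there is headroom left."""
    low_on_space = state.free_gb < FREE_SPACE_RATIO * state.allocated_gb
    sustained = state.low_storage_minutes >= LOW_STORAGE_MINUTES
    cooled_down = state.hours_since_last_change >= COOLDOWN_HOURS
    has_headroom = state.allocated_gb < state.max_allocated_gb
    return low_on_space and sustained and cooled_down and has_headroom
```

For instance, a 20 GB instance with 1.5 GB free for 10 minutes, last modified 7 hours ago and capped at 100 GB, satisfies the rule, whereas the same instance modified only 2 hours ago does not.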

By contrast, in the traditional world, if we were managing the DB on a physical server, we would have to log in to the server manually and add another disk ourselves.

RDS Read-Replicas :- They help us scale our reads when the read workload is too heavy. Please note that replication to the read-replica instances happens asynchronously, and therefore reads served by replicas are eventually consistent. Applications must use a different DB endpoint in order to use the Read-Replicas.

We can use Read-Replicas for use cases such as a reporting workload. This “Reporting Application” can query the replica DB directly, and the production application is unaffected by it. Needless to say, only “SELECT” queries count as read workload; other types of queries such as “INSERT”, “DELETE” & “UPDATE” do not.

Using Read-Replicas implies that we need to reference each replica node individually in our application, as each read replica has its own DNS name.
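
Because every replica has its own DNS name, the application has to decide which endpoint each query goes to. Below is a minimal sketch of such routing, assuming a simple round-robin over replicas and a naive "starts with SELECT" check; the `EndpointRouter` class and the hostnames are hypothetical and only serve the illustration.

```python
import itertools


class EndpointRouter:
    """Sends reads to replicas (round-robin) and everything else to the primary."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # cycle() walks through the replica endpoints forever, in order.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: only SELECT statements count as reads.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


# Hypothetical endpoints, for illustration only.
router = EndpointRouter(
    primary="primary.db.example.internal",
    replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
```

Since replication is asynchronous, a read routed to a replica immediately after a write may not see that write yet; reads that need the latest data should go to the primary endpoint.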

RDS Read-Replicas Costing :- Please note that, in AWS, there is usually a cost involved whenever data travels from one AZ to another. In the case of AWS-managed RDS, however, there is no charge when data is replicated from one AZ to another within the same region, but there is definitely a cost when data is replicated from one region to another.

Please note that Multi-AZ keeps the same connection string regardless of which database instance is currently active.

Boosting Availability | RDS Multi Availability-Zone (Disaster Recovery) :- We can set up a standby RDS DB instance in another AZ, kept in sync through synchronous replication. This means that whenever a change is written to the master DB, it must also be written to the standby DB instance before the write is accepted.

So, basically, we get a single DNS name with automatic failover to the standby. No manual intervention is needed in our applications to achieve this failover. This type of setup is helpful in the following scenarios :-

  • Failover in case of a complete AZ failure / disaster.
  • Loss of network connectivity to the master database.
  • Storage failure on the master RDS instance.

Please note that this is merely a standby database and cannot be used as a read replica. It comes into action only when something goes wrong with the master DB.

Although setting up a standby database (i.e. converting RDS from Single-AZ to Multi-AZ) doesn’t require any downtime, the following things happen under the hood :-

  • First, a snapshot is taken internally.
  • A new DB is restored from this snapshot in a new AZ.
  • Finally, synchronisation is established between the two databases.

Installation of AWS RDS :- Let’s first set up an Amazon RDS DB instance, using the Standard template :-

We will be using the MySQL engine for this demonstration :-

Next, we select the “Free Tier” template for this setup :-

Next, let’s configure the DB instance by providing the instance identifier, username and password :-

Next, let’s select the instance class on which this DB will run. In the free tier, we can only go with the “db.t2.micro” instance type, which comes with 1 GB of RAM and 1 vCPU. Please note that we can’t SSH/log in to this instance.

Next, let’s select the storage for this DB instance. The default storage is 20 GB, and we can enable the storage auto-scaling option as well :-

Next, please note that we can’t create the Multi-AZ standby setup in the Free Tier, but it can be done for production :-

Next, we select the networking properties, i.e. the VPC, subnets & security groups with which our AWS RDS DB instance will be launched.

Next, we specify the database name :-

Next, we choose the authentication method. For now, we select only password-based authentication.

Next, we specify the backup options. The default backup retention is 7 days, and we can also specify the window during which the backup should be performed.

Next, we have the liberty to send monitoring data, logs, etc. to CloudWatch.

Next, RDS also gives us the option to apply version upgrades automatically.

Finally, we hit the create button in order to create the RDS DB instance.

Here is our newly created RDS instance :-
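
The same choices can also be captured as request parameters instead of console clicks. The sketch below only builds the parameter dictionary and makes no AWS call. It assumes the MySQL engine, and the `MaxAllocatedStorage` value of 100 GB is an arbitrary assumption for the auto-scaling ceiling. The helper `build_db_instance_params` is hypothetical; the keys are parameter names of the RDS `CreateDBInstance` API.

```python
def build_db_instance_params(identifier: str, username: str,
                             password: str, db_name: str) -> dict:
    """Collect the walkthrough's console choices as CreateDBInstance parameters."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",                 # assumption: MySQL engine
        "DBInstanceClass": "db.t2.micro",  # the free-tier instance class
        "AllocatedStorage": 20,            # default storage, in GB
        "MaxAllocatedStorage": 100,        # assumed ceiling for storage auto-scaling
        "MasterUsername": username,
        "MasterUserPassword": password,
        "DBName": db_name,
        "MultiAZ": False,                  # no standby in the Free Tier
        "BackupRetentionPeriod": 7,        # days
        "AutoMinorVersionUpgrade": True,
    }


# Hypothetical values, for illustration only.
params = build_db_instance_params("demo-db", "admin", "change-me", "demo")

# With boto3 installed and credentials configured, these could be passed as:
#   boto3.client("rds").create_db_instance(**params)
```

Networking settings (VPC security groups, DB subnet group) are left out here because they depend entirely on the account the instance is launched in.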

RDS Data Security :- AWS RDS provides the following types of security :-

  • At-rest data encryption :- This protects data that is not in movement. We can encrypt the master database with AWS KMS, using AES-256 encryption. Encryption has to be defined at launch time. Also, if the master is not encrypted, then the read replicas cannot be encrypted either.
  • In-flight data encryption :- This protects data in movement, i.e. when data is travelling from applications/clients to the database. We use SSL certificates to encrypt the data in flight.

If the master RDS database is unencrypted, its snapshots/backups will also be unencrypted. Similarly, the snapshots/backups of an encrypted RDS database will also be encrypted.

Now, let’s talk about AWS RDS security from the network and IAM perspective :-

Let’s see how IAM-based authentication works :-

  • Our EC2 instance has an IAM role attached to it.
  • Using this IAM role, the EC2 instance issues an API call to the RDS service to get an “Auth Token”. This auth token is a short-lived credential.
  • Using this “Auth Token”, we connect to the AWS RDS database instance. It’s always recommended to encrypt the network traffic between the application & the DB, in both directions, through SSL.
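
Since the auth token is short-lived (RDS IAM tokens are valid for 15 minutes), an application typically reuses a token until it is close to expiry and then requests a fresh one. Below is a minimal sketch of that pattern. The `AuthTokenCache` class and the 60-second refresh margin are assumptions made for illustration, and `fetch_token` is a hypothetical callable standing in for the real API call (for example boto3's `generate_db_auth_token`).

```python
import time
from typing import Callable, Optional

TOKEN_LIFETIME_SECONDS = 15 * 60  # RDS IAM auth tokens are valid for 15 minutes


class AuthTokenCache:
    """Reuses a short-lived token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic,
                 refresh_margin: float = 60.0):
        self._fetch = fetch_token      # hypothetical stand-in for the RDS API call
        self._clock = clock            # injectable so the logic can be tested
        self._margin = refresh_margin  # assumed safety margin, in seconds
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch a new token on first use, or when the current one is about to expire.
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + TOKEN_LIFETIME_SECONDS
        return self._token
```

The token returned by `get()` is then used in place of a password when opening the (SSL-encrypted) database connection.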

AWS AURORA :- This is a proprietary technology from Amazon that is said to be highly cloud-optimised: it claims 5 times the performance of MySQL on RDS and 3 times the performance of Postgres on RDS. Aurora supports up to 15 replicas, while MySQL supports only up to 5. It provides auto-expanding storage up to 64 TB. Its cost is a little higher (around 20%) than RDS. Below is how Aurora supports High-Availability and Self-Healing :-

  • One Aurora instance (the master) takes writes.
  • Automated failover for the master happens in less than 30 seconds.
  • There are 6 copies of our data across 3 Availability Zones, and storage is striped across a shared storage volume.

Please note that, during failover, the master instance can change automatically, while our applications keep using the same “writer endpoint” and need no change. Similarly, we also have a “reader endpoint”, which is load-balanced over the pool of all read replicas. Our applications/clients connect only to this reader endpoint and, under the hood, get connected to one of the read-replica nodes. This load balancing happens at the connection level and not at the statement/query level.
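
To make the difference between connection-level and statement-level balancing concrete, here is a toy model. It is not how Aurora is implemented: the `ReaderEndpoint` and `Connection` classes are hypothetical, and the round-robin choice of replica is an assumption made purely for illustration.

```python
import itertools


class Connection:
    """A connection is pinned to the replica chosen when it was opened."""

    def __init__(self, replica: str):
        self.replica = replica

    def execute(self, sql: str) -> tuple:
        # Every statement on this connection runs on the same replica.
        return (self.replica, sql)


class ReaderEndpoint:
    """Toy reader endpoint: picks a replica once per connection, not per query."""

    def __init__(self, replicas: list):
        self._next_replica = itertools.cycle(replicas)

    def connect(self) -> Connection:
        return Connection(next(self._next_replica))


endpoint = ReaderEndpoint(["replica-1", "replica-2"])
first = endpoint.connect()   # pinned to replica-1
second = endpoint.connect()  # pinned to replica-2
```

A practical consequence is that a client holding one long-lived connection sends all of its queries to a single replica, so read load spreads across replicas only when clients open multiple connections.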

Introduction to AWS ElastiCache :- Just as we have RDS as the managed relational database, we have ElastiCache as the managed cache service. The supported cache engines are Redis & Memcached. All the headache of OS maintenance, patching, optimisation, backup, setup, failure recovery and monitoring is taken care of by AWS. Please note that here, too, we don’t have access to the underlying OS/server and we can’t log in/SSH to it. In general, ElastiCache can be leveraged for the following purposes :-

  • To make our application stateless, i.e. the state of the application is stored in ElastiCache. Say a user logs into the application; the application then writes the session data to ElastiCache. When another request from the same user lands on a different instance, that user’s session data can be retrieved from ElastiCache, and if it is found, the user is deemed to be already logged in. That’s how we make our application stateless. Below is how the architecture for this looks :-
  • To reduce the load on the databases for read-intensive workloads. The idea here is that the results of common queries get cached, so the database is not queried again and the results are served directly from the cache. Along with this, we need an invalidation strategy to make sure that only the most recent data lives in the cache. Below is how the architecture for this looks :-
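
The first use case (a shared session store) can be sketched in a few lines. This is a simplified illustration: `SessionStore` is an in-memory dictionary standing in for ElastiCache, and the `AppInstance` class, its methods and the response strings are hypothetical.

```python
import uuid
from typing import Optional


class SessionStore:
    """In-memory stand-in for ElastiCache, shared by all application instances."""

    def __init__(self):
        self._sessions = {}

    def save(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = data

    def load(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)


class AppInstance:
    """A stateless application instance: all session state lives in the store."""

    def __init__(self, name: str, store: SessionStore):
        self.name = name
        self.store = store

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user})
        return session_id

    def handle(self, session_id: str) -> str:
        session = self.store.load(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {session['user']}"


store = SessionStore()
instance_a = AppInstance("a", store)
instance_b = AppInstance("b", store)
session_id = instance_a.login("alice")
reply = instance_b.handle(session_id)  # instance b recognises alice's session
```

Because neither instance keeps the session in its own memory, a load balancer can send each request to any instance.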

Comparison of ElastiCache engines :- Let’s look at the following types of caches :-

  • Memcached on ElastiCache → It supports a multi-node cluster for partitioning of data, thus providing data sharding. In this mode there is no replication and hence no high availability. There are no backup and restore features. It’s also not a persistent cache. It has a multi-threaded architecture.
  • Redis on ElastiCache → Just like an RDS database, it supports high availability and read replicas to scale reads. It supports Multi-AZ with auto-failover. It also provides data durability using AOF persistence, and therefore Redis can also be used as a database. Backup and restore features are available as well. Redis can also be used as a pub-sub message broker.

Redis ElastiCache modes :- Two modes are supported :-

  • Cluster mode disabled :- In this mode there is a single shard, and all nodes are part of it. Inside this shard we have ONE master and up to 5 replica nodes. If the master node fails, one of the replicas can take over. Replication from the master to the replica nodes happens asynchronously. The primary node serves both read & write operations, whereas the replica nodes serve read operations only. Here, every node in the cluster holds all of the data, which safeguards us against data loss in case of a node failure. It’s quite helpful for scaling read operations. It also supports the Multi-AZ setup.
  • Cluster mode enabled :- In this mode, data is partitioned across multiple shards. Each shard has a primary node and up to 5 replica nodes. It also supports the Multi-AZ setup. It’s quite helpful for scaling writes. We can have up to 500 nodes per cluster. For example, if we don’t set up any replica nodes, there can be up to 500 shards, each with a single master. Similarly, if we set up each shard with 1 master and 1 replica, there can be at most 250 shards in total.
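
The shard arithmetic in the cluster-mode-enabled case follows directly from the 500-node limit: each shard uses one primary plus its replicas. A small sketch of the calculation (the function name `max_shards` is hypothetical):

```python
MAX_NODES_PER_CLUSTER = 500
MAX_REPLICAS_PER_SHARD = 5


def max_shards(replicas_per_shard: int) -> int:
    """Largest shard count that fits within the per-cluster node limit."""
    if not 0 <= replicas_per_shard <= MAX_REPLICAS_PER_SHARD:
        raise ValueError("replicas_per_shard must be between 0 and 5")
    nodes_per_shard = 1 + replicas_per_shard  # one primary plus its replicas
    return MAX_NODES_PER_CLUSTER // nodes_per_shard


no_replicas = max_shards(0)     # 500 shards
one_replica = max_shards(1)     # 250 shards
five_replicas = max_shards(5)   # 83 shards (500 // 6)
```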

Demonstration of ElastiCache (Redis) :- First, let’s set up the ElastiCache cluster :-

Next, let’s define the version, port and underlying node type on which the ElastiCache cluster will run.

Please note that with ZERO replicas the Multi-AZ setup cannot be established. At least ONE replica is needed in order to enable Multi-AZ.

Next, let’s set up the Subnet Group in which this ElastiCache cluster will be placed.

Next, we can define “encryption-at-rest” through AWS KMS. We can also enable “encryption-in-transit” and, with it, a Redis AUTH token. Applications must pass this token in order to connect to Redis. If we disable “encryption-in-transit”, there is no way for us to configure Redis AUTH.

Next, we can define the backup retention period and window, along with some other configs :-

At last, we can define the maintenance window and additional tags. Finally, we hit the create button.

Caching implementation concerns :-

  • Usually it’s safe to cache data, but cached data may be out of date, which makes reads eventually consistent.
  • Caching works well when the data changes slowly and a few keys are needed frequently. If the data changes too frequently and the whole of a large key space is needed frequently, caching is considered an anti-pattern.
  • Caching is advisable when the data to be stored has an appropriate structure, for example key-value data or the results of aggregations.

Lazy Loading (Cache-Aside) approach to cache population :- Only the data that is actually requested (from the database) is cached, so the cache is not filled with just everything. Below is how it works :-

  • Whenever the application needs some data, it queries the cache first. If the cache has the requested data, this is called a “Cache Hit”.
  • If the data is not present in the cache, this is termed a “Cache Miss”. In this scenario, the application sends the request to the database and then writes the data back to the cache, so that subsequent requests (from the application) can find the data in the cache. Please note that a “Cache Miss” carries a penalty, as there are 3 round trips involved in total (cache read, database read, cache write). This adds latency and may make for a bad user experience.

Below is how the pseudo code for this looks :-
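
A minimal Python sketch of lazy loading, assuming plain dictionaries as stand-ins for ElastiCache and the database; the `CacheAsideReader` class and its hit/miss counters are hypothetical and exist only to make the flow visible.

```python
class CacheAsideReader:
    """Reads through a cache, loading from the database only on a miss."""

    def __init__(self, cache: dict, database: dict):
        self.cache = cache        # stand-in for ElastiCache
        self.database = database  # stand-in for the RDS database
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Round trip 1: ask the cache first.
        if key in self.cache:
            self.hits += 1        # cache hit
            return self.cache[key]
        self.misses += 1          # cache miss
        # Round trip 2: read from the database.
        value = self.database.get(key)
        # Round trip 3: write the value back so later requests hit the cache.
        if value is not None:
            self.cache[key] = value
        return value


database = {"user:1": "alice"}
reader = CacheAsideReader(cache={}, database=database)
first_read = reader.get("user:1")   # miss: loaded from the database
second_read = reader.get("user:1")  # hit: served from the cache
```

Note that nothing here updates the cache when the database changes, so a later update to `database["user:1"]` would leave a stale value in the cache until it is evicted or expires.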

If the data in the cache gets cleaned up, the cache needs some “warm-up” time, i.e. the duration in which records are read from the database and then written to the cache. Also, data once written to the cache may become stale if it is later updated in the database.

Write-Through approach to cache population :- Everything written to the database is simply written to ElastiCache as well. This may result in cache churn, as not all of this data will ever be read.

  • Whenever the application needs some data, it queries the cache first. If the cache has the requested data, this is called a “Cache Hit”.
  • Whenever the application sends a write request to the database, the same data is also written to ElastiCache.

With this approach, the data in the cache is never stale and reads are quick. However, there is a write penalty involved, i.e. every write actually involves 2 calls.

Below is how the pseudo code for this looks :-
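
A minimal Python sketch of write-through, again assuming plain dictionaries as stand-ins for ElastiCache and the database; the `WriteThroughStore` class and its call counter are hypothetical.

```python
class WriteThroughStore:
    """Writes go to the database and the cache together; reads try the cache first."""

    def __init__(self, cache: dict, database: dict):
        self.cache = cache        # stand-in for ElastiCache
        self.database = database  # stand-in for the RDS database
        self.write_calls = 0      # counts individual calls made on writes

    def write(self, key, value) -> None:
        # Call 1: persist to the database.
        self.database[key] = value
        self.write_calls += 1
        # Call 2: update the cache so it never holds a stale value.
        self.cache[key] = value
        self.write_calls += 1

    def read(self, key):
        if key in self.cache:      # cache hit
            return self.cache[key]
        # Data written before caching was introduced is still only in the database.
        return self.database.get(key)


store = WriteThroughStore(cache={}, database={"legacy": "old"})
store.write("user:1", "alice")
store.write("user:1", "bob")
latest = store.read("user:1")  # served from the cache, never stale
```

In practice the two approaches are often combined: write-through keeps cached entries fresh, while lazy loading fills in keys that are missing from the cache.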

Cache Eviction Policy :- Eviction can occur in three ways :-

  • We delete items explicitly from the cache.
  • Items are evicted because the memory is full; as per the LRU (Least Recently Used) principle, the least recently used entries are deleted.
  • Items are deleted because a TTL (time-to-live) is set on the entry. A TTL can range from a few seconds to hours. TTLs are helpful for data such as comments, leaderboards & activity streams.

If too many evictions happen, it’s an indication that we should scale out our cache.
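
The three eviction paths can be seen together in a toy cache. This sketch is not how Redis or Memcached implement eviction; the `LruTtlCache` class, its item-count capacity (instead of a memory limit) and the eviction counter are assumptions made for illustration.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Toy cache with explicit delete, LRU eviction and per-entry TTL."""

    def __init__(self, capacity: int, clock=time.monotonic):
        self.capacity = capacity     # max number of entries (stand-in for memory)
        self._clock = clock          # injectable so expiry can be tested
        self._items = OrderedDict()  # key -> (value, expires_at or None)
        self.evictions = 0           # LRU evictions; a high count suggests scaling out

    def set(self, key, value, ttl=None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)
        self._items.move_to_end(key)         # mark as most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]             # TTL expired
            return None
        self._items.move_to_end(key)         # a read counts as a recent use
        return value

    def delete(self, key) -> None:
        self._items.pop(key, None)           # explicit deletion
```

With a capacity of 2, setting `a` and `b`, reading `a`, and then setting `c` evicts `b`, because `b` is the least recently used entry at that point.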
