In case you are landing here directly, it is recommended to first visit this page.
In this section, we shall deep-dive into AWS ELB & Scalability aspects.
AWS Scalability: It means that a software system can handle higher loads by self-adaptation. We can do either Vertical OR Horizontal scaling.
- Vertical-Scalability → It means increasing the size of the given instance. For e.g., say we have a system with a CPU of 1 GHz clock-rate and 2 GB RAM and it is able to handle a load of 50 TPS; if we increase its capacity to 4 GB RAM and a 2 GHz clock-rate, it might be able to handle a load of 80 TPS. However, there is always a limit to how much we can vertically scale a particular application, i.e. the hardware limit. Analogous to this: say we currently have a "t2.micro" instance and we replace it with a "t2.large" instance, that shall be termed Vertical scaling. We use vertical scaling when we have non-distributed systems like a database server. AWS services like RDS and ElastiCache can be vertically scaled by upgrading the underlying instance-types. As of this writing, the smallest AWS instance is "t2.nano" with 0.5 GB of RAM and 1 vCPU, and the largest is "u-12tb1.metal" with 12.3 TB of RAM and 448 vCPUs.
- Horizontal-Scalability → It means increasing the number of instances for our application. E.g., say we have a system with a CPU of 1 GHz clock-rate and 2 GB RAM and it is able to handle a load of 50 TPS; if we add another instance with the same capacity, the pair might be able to handle a load of around 100 TPS. Analogous to this: say we currently have a "t2.micro" instance and we add another identical "t2.micro" instance, that shall be termed Horizontal scaling. Generally, horizontal scaling implies that we have a distributed system in place. This is very usual for web-applications, but not every application/software-system is a distributed system. It's very easy to scale horizontally using EC2 instances, and this is exactly what happens behind an Auto-Scaling-Group and a Load-Balancer.
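As a quick mental model, the arithmetic above can be sketched in a few lines of Python (a toy illustration, not AWS code; the numbers and instance names are our own assumptions):

```python
# Toy sketch (not AWS code): how horizontal scaling multiplies capacity.
# Each instance handles up to 50 TPS; a round-robin load balancer spreads
# requests evenly, so N instances handle roughly N * 50 TPS in the ideal case.

from itertools import cycle

def total_capacity(per_instance_tps, instance_count):
    """Ideal aggregate throughput after scaling out."""
    return per_instance_tps * instance_count

def round_robin_assign(instances, requests):
    """Assign each request to the next instance in turn."""
    rr = cycle(instances)
    return [(req, next(rr)) for req in requests]

print(total_capacity(50, 1))  # capacity with one instance
print(total_capacity(50, 2))  # capacity after adding one more instance
print(round_robin_assign(["i-a", "i-b"], ["r1", "r2", "r3", "r4"]))
```

In practice, coordination overheads keep real systems somewhat below this linear ideal, but the model captures why adding instances raises throughput.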
AWS High-Availability: It means running our software-system/application in at least 2 data-centres, i.e. at least 2 different Availability-Zones. The intention of HA is to survive a data-centre loss.
- High-Availability can be in passive mode, meaning that the application in the other data-centre only gets activated if the primary application goes down, e.g. an RDS Multi-AZ setup.
- High-Availability can be in active mode, meaning that the application has been horizontally scaled and all of the instances receive the live traffic.
This is also what lies behind a Multi-AZ Auto-Scaling-Group and a Multi-AZ Load-Balancer.
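The difference between the two modes can be sketched as follows (a toy Python illustration; the function and AZ names are our own, not an AWS API):

```python
# Toy sketch (not AWS code): passive vs. active high-availability.
# In passive mode the standby AZ only receives traffic when the primary
# fails; in active mode every AZ serves live traffic all the time.

def route_passive(primary_healthy, primary, standby):
    """RDS-Multi-AZ-style failover: standby is used only on failure."""
    return primary if primary_healthy else standby

def route_active(azs, request_id):
    """Active-active: spread requests across all healthy AZs."""
    return azs[request_id % len(azs)]

print(route_passive(True, "az-1", "az-2"))   # primary serves traffic
print(route_passive(False, "az-1", "az-2"))  # failover to the standby
print(route_active(["az-1", "az-2"], 0))
print(route_active(["az-1", "az-2"], 1))
```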
AWS Load-Balancing: Load-Balancers are servers that forward Internet traffic to multiple servers (i.e. EC2 instances) downstream. Following is a demonstration of how an LB generally works :-
Following are the advantages of having an LB in place :-
- LBs help in spreading the load across multiple downstream EC2 instances, typically in a round-robin fashion.
- LBs are very helpful in exposing a single entry/access-point (i.e. through DNS) to the application.
- LBs also provide SSL-termination (https) for connections to our sites/software.
- LBs can also enforce stickiness with the help of cookies.
- LBs provide high-availability across multiple Availability-Zones, i.e. an LB can itself be spread across multiple AZs.
- LBs can also differentiate between public & private traffic.
- LBs also handle failures of downstream EC2 instances by constantly doing health-checks, e.g. every 5 seconds; this interval is very much configurable. A health-check is usually done on a port & end-point (e.g. /health). If the response is 200 OK, the instance is considered healthy; otherwise the LB stops routing traffic to that instance.
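The health-check behaviour described above can be sketched like this (a toy Python illustration; the names are our own, not an AWS API):

```python
# Toy sketch (not AWS code): how an LB decides instance health from a
# periodic GET on a configured port + endpoint (e.g. /health).

HEALTHY_STATUS = 200

def is_healthy(status_code):
    """An instance is healthy only if its /health check returned 200 OK."""
    return status_code == HEALTHY_STATUS

def route_targets(instances):
    """Keep only instances whose last health-check passed."""
    return [name for name, status in instances if is_healthy(status)]

# Each tuple: (instance-id, status code from its last health-check).
fleet = [("i-a", 200), ("i-b", 500), ("i-c", 200)]
print(route_targets(fleet))  # the unhealthy i-b is excluded from routing
```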
Following are the advantages of having an Amazon ELB in place :-
- AWS guarantees that the ELB shall always be working and available.
- AWS takes care of high-availability, upgrades and maintenance of the ELB.
- AWS also provides configuration-handles for ELB.
- It’s integrated with many AWS offerings and services.
- An AWS LB can scale virtually without limit, but not instantaneously; for sudden huge spikes, we might need to contact AWS support for the same.
Types of Load-Balancers on AWS :-
- Internal Private LB → This LB is private within an AWS account; we can't access it from the public web.
- External Public LB → This LB is publicly available on the web and users can very well access it.
Use-case : Application behind the Load-Balancer :-
- The Load-Balancer has its own security-group. Take, for example, the rules shown in the picture above: they allow http incoming traffic on port 80 from anywhere on the web, and https incoming traffic on port 443 from anywhere on the web.
- The EC2 instance also has its own security-group. As per the rules shown in the above picture, the only source from which incoming traffic is allowed at the EC2 instance is the security-group belonging to the LB, i.e. the EC2 security-group references the security-group of the Load-Balancer.
Following are the ways in which we can easily troubleshoot issues associated with a Load-Balancer :-
Demonstration for Classic-Load-Balancer : This can accept http-based traffic on port no. 80.
Next, we would be assigning the new security-group (with incoming http traffic allowed on port no. 80 from anywhere in the world) to this CLB.
Next, we shall configure the health-check from the CLB to the backend EC2 instance. As per the below rule, the health-check shall be performed on port no. 80.
Next, we shall add our first EC2 instance behind this Classic-Load-Balancer.
Finally, the Load-Balancer has now been created.
Now, we can verify whether our Classic-Load-Balancer is working as expected (i.e. whether this CLB is passing the traffic well to its registered EC2 instances). The below response is actually being returned from the EC2 instance behind this CLB.
Please also note from "Part-1" of this tutorial that we had set up the Apache webserver on the EC2 instance; the EC2 instance is still directly accessible from the public web as well, as shown below.
We shall stop this direct access to our EC2 instance, because as per good design principles, the outer world should access our application only via the Load-Balancer. Let's go ahead and modify the security-group of the EC2 instance: incoming traffic shall now be allowed onto the EC2 instance only from within the security-group of the CLB. Below is how the modified security-group for the application looks :-
Thus, the EC2 instance shall no longer be directly accessible and can be reached only via the CLB. This is a much better security arrangement.
Introduction to Application-Load-Balancer : This is an LB which works at Layer-7, i.e. 'http'. This allows for :-
- Load-Balancing to HTTP applications across multiple machines (target-groups).
- Load-Balancing to multiple HTTP applications on the same machine (e.g. containers).
- ALB also supports redirects from http to https.
- ALB supports http/2 and web-sockets.
- ALB can route to multiple target-groups.
Following are some of the properties for the ALB :-
As demonstrated below, same ALB is capable of routing the requests for different end-points to different target-groups :-
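Conceptually, the ALB's listener rules behave like an ordered path-prefix lookup; below is a toy Python sketch (the rule and target-group names are hypothetical, not from the demo):

```python
# Toy sketch (not AWS code): ALB listener rules mapping request paths to
# target-groups, with a default rule as the fallback. The first matching
# path-prefix rule wins, mirroring ALB rule priorities.

RULES = [
    ("/user", "user-target-group"),
    ("/search", "search-target-group"),
]
DEFAULT_TG = "default-target-group"

def resolve_target_group(path):
    """Return the target-group for a request path."""
    for prefix, target_group in RULES:
        if path.startswith(prefix):
            return target_group
    return DEFAULT_TG

print(resolve_target_group("/user/42"))      # routed to user-target-group
print(resolve_target_group("/search?q=elb")) # routed to search-target-group
print(resolve_target_group("/health"))       # falls through to the default
```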
Introduction to Target-Group :- Any target-group can have behind it a group of targets; for an ALB these can be EC2 instances, ECS tasks, Lambda functions, or private IP addresses.
Demonstration of ALB :- First, let's create the ALB :-
Next, we shall use the existing security-group, which allows http traffic on port no. 80 :-
Next, we would be creating our first target-group; this target-group shall have EC2 instances behind it.
Next, we shall be configuring the health-checks in a similar way we did for Classic-Load-Balancer.
Next, Let’s register the targets for this newly created target-group :-
Next, we finally land up at this screen, where we have successfully created our first ALB.
Now, we can verify whether our Application-Load-Balancer is working as expected (i.e. whether this ALB is passing the traffic well to its registered target-groups). The below response is actually being returned from the EC2 instances in the target-group behind this ALB.
Let's now see the listener-based rules at the ALB level, which we have already configured :-
The default rule says that all the 'http' traffic landing on port 80 would end up being forwarded to the target-group 'aditya-first-TG'.
Let's add another rule here at the listener level. This would make sure that anyone who tries to access an endpoint ending with '/admin' gets an http 503 status code returned.
Now, we can verify whether this new end-point behaviour works through our Application-Load-Balancer :-
Note that we can also create different rules; for example, a rule that routes endpoints containing '/test' only to 'my-second-target-group'.
Introduction to Network-Load-Balancer : This is an LB which works at Layer-4, i.e. 'tcp' & 'udp'.
The CLB & ALB don't have any static IPs; rather, they have public DNS-names. An NLB, on the other hand, exposes one static IP per AZ (and supports assigning Elastic IPs). NLBs are used for those use-cases which have extreme performance requirements for TCP/UDP traffic. NLBs are not included in the free-tier. Below is how the NLB can be depicted.
Demonstration of NLB :- First, let's create the NLB :-
Next, let's add a target-group to this NLB and register our EC2 instance with it.
Next, we can configure the health-checks for this NLB :-
Finally, we are done with our NLB creation step.
Please note that our application shall still not be accessible through this NLB because, remember, we have not yet associated any security-group with this NLB, whereas we very well did associate security-groups while creating the CLB & ALB. Note that, in this case, we shall be editing the Inbound Rules of the original (EC2) security-group itself. With the below rule added, the EC2 instance can now very well receive traffic from the outside world as well :-
Introduction to Load-Balancer-Stickiness :- It means that the same client is always redirected to the same instance behind a load-balancer. This works for both CLB & ALB. Generally, this is achieved by using a "Cookie" which has an expiration time set. With session-stickiness in place, there may be an imbalance in the traffic being routed to the backend EC2 instances. We might require stickiness for scenarios where session data can't be afforded to be lost. Below is how the mental model can be conceptualised :-
Please note that stickiness is a property that can be configured at the target-group level. Behind our ALB, we have the below target-group configured, and we can enable/disable the 'Stickiness' property.
In AWS parlance, stickiness is usually configured for some fixed duration of time. Say we configure it to 240 seconds; it would then mean that if client-A's call lands at EC2-instance-1 at t=0, all subsequent calls from client-A shall land at EC2-instance-1 for the next 4 minutes. After that, a request may be routed to another instance and, again due to the stickiness property, that instance would then serve client-A for the next 240 seconds.
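The cookie-based mental model above can be sketched in Python (a toy illustration; the shape of the cookie store is our own assumption, not how the ALB stores cookies):

```python
# Toy sketch (not AWS code): duration-based stickiness. A cookie pins a
# client to one instance until it expires; afterwards the LB may pick a
# new instance and a fresh cookie pins the client to that one.

STICKINESS_SECONDS = 240

def pick_instance(cookies, client, now, next_instance):
    """Return the sticky instance, or bind a new one if the cookie expired."""
    entry = cookies.get(client)
    if entry and now < entry["expires"]:
        return entry["instance"]
    cookies[client] = {"instance": next_instance,
                       "expires": now + STICKINESS_SECONDS}
    return next_instance

cookies = {}
print(pick_instance(cookies, "client-A", 0, "i-1"))    # binds to i-1
print(pick_instance(cookies, "client-A", 120, "i-2"))  # still sticky: i-1
print(pick_instance(cookies, "client-A", 300, "i-2"))  # expired: rebinds to i-2
```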
AWS Cross-Zone Load-Balancing :- All the incoming traffic is divided evenly amongst all the instances, across all the Availability-Zones. Below is the diagram depicting the same :-
Without cross-zone load-balancing, incoming traffic is divided evenly amongst the instances lying in one AZ only. For example, in the below picture, 3 AZs are shown and each AZ's LB node distributes its load only to the instances lying in that very AZ :-
- For the Classic-LB, cross-zone load-balancing is disabled by default. There are no extra charges, even if we enable inter-AZ load-balancing.
- For the Application-LB, cross-zone load-balancing is enabled by default and it can't be disabled. There are no extra charges for inter-AZ load-balancing.
- For the Network-LB, cross-zone load-balancing is disabled by default. There are extra charges we shall have to pay in order to enable inter-AZ load-balancing through the NLB.
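The traffic-split difference can be verified with a little arithmetic; below is a toy Python sketch, assuming 2 AZs with 2 and 8 instances respectively and each AZ's LB node receiving 50% of the total traffic:

```python
# Toy sketch (not AWS code): per-instance traffic share (in % of total)
# with and without cross-zone load-balancing.

def share_without_cross_zone(az_instances):
    """Each AZ node's equal share is split only among that AZ's instances."""
    per_az = 100 / len(az_instances)
    return {az: per_az / count for az, count in az_instances.items()}

def share_with_cross_zone(az_instances):
    """All traffic is split evenly across every instance in every AZ."""
    total = sum(az_instances.values())
    return 100 / total

fleet = {"az-1": 2, "az-2": 8}
print(share_without_cross_zone(fleet))  # az-1 instances get 25% each, az-2 only 6.25%
print(share_with_cross_zone(fleet))     # every instance gets an even 10%
```

The uneven split in the first case is exactly why cross-zone load-balancing matters when AZs hold different numbers of instances.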
AWS Elastic Load Balancer Certificates :-
SSL certificates allow the traffic between our clients and the Load-Balancer to be encrypted in transit. This is also called In-Flight Encryption. We basically attach these SSL certificates to our Load-Balancer, which in turn encrypts the connection between our clients (i.e. the users of our application) and the Load-Balancer.
- SSL refers to ‘Secure Socket Layer’, used to encrypt the connections.
- TLS refers to ‘Transport Layer Security’, which is a newer version.
Public SSL certificates are issued by Certificate-Authorities (CA) like Comodo, Symantec, GoDaddy, GlobalSign, Letsencrypt, etc. These SSL certificates also have an expiration and need to be renewed regularly.
SSL termination happens at the LB level. Usually, traffic between the Load-Balancer and the EC2 instances travels over plain 'http' within the private VPC.
Note that SNI (Server-Name-Indication) is used to load multiple SSL certificates onto one web-server (to serve multiple websites). SNI works with the newer-generation load-balancers (ALB & NLB), but not with the CLB.
Below is how we can configure an HTTPS-based listener on the Application-Load-Balancer. Note that we can have multiple SSL-certificates serving different target-groups. Observe that we don't have any SSL certificate as of now :-
AWS ELB Connection-Draining / De-registration-Delay :- This is the time given to complete "in-flight requests" while an EC2 instance is de-registering or unhealthy. The default time is 300 seconds. The idea here is: the ELB stops sending new requests to the underlying EC2 instance as soon as the EC2 instance starts de-registering itself or becomes unhealthy, while the in-flight requests are allowed to complete.
- If our connections are shorter, we can set this value to be smaller.
- This value can be set between 0 and 3600 seconds.
AWS Auto-Scaling-Groups :- In the real world, the traffic on our software-system/website keeps changing. For example, during day-time the traffic can be at its peak, whereas during night-time the traffic can be at an all-time low. With an Auto-Scaling-Group in place, we can achieve the following very easily :-
- Scaling-out i.e. Addition of an EC2 instance, in case load increases.
- Scaling-in i.e. Removal of an EC2 instance, in case load decreases.
- Make sure that we have a minimum no. of EC2 instances running.
- Automatically register/de-register the instances with the LB as well.
We can set up multiple auto-scaling rules/policies through the CloudWatch alarming system. For example :-
- Target average CPU usage. For ex., if the average CPU usage is more than 40%, it's an alarm for us and we would add another EC2 instance.
- Average Network-In and Average Network-Out.
- No. of requests on the ELB per instance.
- Pre-scheduled time, if we know the visitor-patterns in advance.
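A target-tracking-style rule like the CPU example above boils down to a small decision function (a toy Python sketch; the 40% target comes from the example, while the scale-in band is our own assumption):

```python
# Toy sketch (not AWS code): a target-tracking-style scaling decision.
# Above the CPU target we scale out; well below it we scale in; otherwise
# we hold the current capacity.

TARGET_CPU = 40.0

def scaling_decision(avg_cpu, target=TARGET_CPU, slack=10.0):
    """Return the desired capacity change for a given average CPU reading."""
    if avg_cpu > target:
        return +1   # scale out: add one EC2 instance
    if avg_cpu < target - slack:
        return -1   # scale in: remove one EC2 instance
    return 0        # within the band: do nothing

print(scaling_decision(55.0))  # above target -> +1
print(scaling_decision(20.0))  # far below target -> -1
print(scaling_decision(35.0))  # inside the band -> 0
```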
Below is the process of how auto-scaling can work on the basis of a custom metric, like the no. of connected users :-
- We first send the custom metric from our application (running on an EC2 instance) to CloudWatch with the help of the 'PutMetricData' API.
- We then set up CloudWatch alarms to react to low/high values of this metric.
- We then use those CloudWatch alarms as the scaling policy for the ASG.
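The first step above, shaping a custom metric for CloudWatch, can be sketched as follows. This only builds the payload our application would pass to the PutMetricData call (e.g. boto3's `cloudwatch.put_metric_data(**payload)`); no AWS call is made here, and the namespace/metric names are our own assumptions:

```python
# Sketch (names are hypothetical): building the payload for a custom
# 'ConnectedUsers' metric, shaped for CloudWatch's PutMetricData API.
# We only construct and inspect the payload; sending it would require
# AWS credentials and a CloudWatch client.

def build_metric_payload(connected_users):
    """Shape a custom gauge metric for a PutMetricData call."""
    return {
        "Namespace": "MyApp",
        "MetricData": [{
            "MetricName": "ConnectedUsers",
            "Value": float(connected_users),
            "Unit": "Count",
        }],
    }

payload = build_metric_payload(137)
print(payload["Namespace"])
print(payload["MetricData"][0]["Value"])
```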
Demonstration for Auto-Scaling-Group : Let's set up our first ASG :-
As of now, we have an ALB set up which is sending traffic to an EC2 instance, on which we have installed the Apache web-server :-
First, let's terminate all of our EC2 instances and observe that there are no instances behind our target-groups :-
Next, we observe that, since there is no underlying EC2 instance, accessing the public URL of the Load-Balancer gives us a 503 error.
Next, let's create an Auto-Scaling-Group. We shall use a Launch-Template rather than a Launch-Configuration: a Launch-Configuration allows configuring only one instance-type, whereas through a Launch-Template, multiple instance-types and even spot instances can be used. Thus, first, we shall create the Launch-Template :-
Next, during “Launch-Template” creation, let’s select the Amazon Linux AMI.
Next, we would select our first key-pair and first security-group to link with this Launch-Template.
Next, we would go with free-tier configuration for storage-volume :-
Next, under the advanced section, we would be adding the "User-data" script, using which we shall install the 'httpd' web-server and serve a sample page from it :-
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Hello world from $(hostname -f)</h1>" > /var/www/html/index.html
Here, we have our Launch-Template created now :-
Now, we continue to create our ASG. In the first step, we have to specify its name :-
We also need to specify the Launch-Template (the one we created in the previous step). Please note that this launch-template would end up launching a "t2.micro" EC2 instance type with the pre-specified security-group.
In Step-2, we specify the purchase-options configuration settings as per the launch-template itself. We also specify the network & subnets here :-
In Step-3, we specify the Load-Balancer to which we shall be attaching our ASG. Here, we choose the Application LB that we created earlier :-
In this step, we also specify the health-checks that we want to perform. The EC2 health-check is pre-configured by default; we can explicitly enable the ELB health-check as well :-
In Step-4, we specify group-size and scaling policies :-
In Step-5, we specify tags, if we want to and finally hit create button :-
Finally, we now have an "Auto-Scaling-Group" created :-
We can go under the "Activity" tab in order to see the series of activities associated with our ASG; we can see here our EC2 instance getting created :-
We can go under the "Instance-Management" tab in order to see what's going on with our EC2 instances. The below-shown EC2 instance was created by our ASG.
We can go under the "Instances" service to view the EC2 instance newly launched through our ASG. We can play with it: on stopping this instance, the ASG would again re-launch a new instance :-
Please note that before we began to set up the ASG, we had ZERO EC2 instances overall. Because we had linked our ASG with a target-group, and that target-group was in turn linked to our ALB, the EC2 instance auto-launched by the ASG got placed under the same target-group. Hence, hitting the ALB now shows the page from the web-server (the content we specified in the user-data while creating the Launch-Template).
Auto-Scaling-Group Policies: Policies sit at the core of the ASG. Through policies, we can control the scale-out and scale-in behaviour. Examples :-
- Target Tracking Scaling :- Say we want the average CPU usage across our ASG to stay at 40%; this policy would then automatically add an EC2 instance whenever the average CPU usage goes above 40%. This policy is the one most used in daily life. Another example of this policy's usage: say an application is deployed with an Application Load Balancer and an Auto Scaling Group, and currently the scaling of the Auto Scaling Group is done manually; we would like to define a scaling policy that will ensure the average number of connections to our EC2 instances averages around 1000.
- Simple / Step Scaling :- We can set up this policy using CloudWatch. For e.g., when a CloudWatch alarm gets triggered (CPU usage goes above 70%), then add 2 EC2 instances to the ASG. Another example can be: when a CloudWatch alarm gets triggered (CPU usage goes below 30%), then remove 1 EC2 instance from the ASG.
- Scheduled Actions :- Anticipate scaling based on known usage patterns. Example :- increase the capacity by 10 more EC2 instances at 9 pm next Friday.
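The two Step-Scaling examples above translate to a small decision function (a toy Python sketch; the thresholds are the ones from the examples):

```python
# Toy sketch (not AWS code) of the step-scaling examples: add 2 instances
# when CPU goes above 70%, remove 1 when it drops below 30%.

def step_scaling_adjustment(avg_cpu):
    """Instance-count change for a given average CPU reading."""
    if avg_cpu > 70:
        return +2  # high-CPU alarm fired: add 2 EC2 instances
    if avg_cpu < 30:
        return -1  # low-CPU alarm fired: remove 1 EC2 instance
    return 0       # no alarm: keep the current capacity

print(step_scaling_adjustment(85))  # -> +2
print(step_scaling_adjustment(20))  # -> -1
print(step_scaling_adjustment(50))  # -> 0
```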
Auto-Scaling-Group : Scaling Cool-downs : The cooldown period helps ensure that our ASG doesn't launch or terminate additional instances before the previous scaling activity takes effect (the default cooldown is 300 seconds).
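The cooldown guard can be sketched as a simple time check (a toy Python illustration; 300 seconds is assumed as the cooldown window):

```python
# Toy sketch (not AWS code): the cooldown guard. A new scaling action is
# allowed only once the cooldown window since the previous action has
# fully elapsed.

COOLDOWN_SECONDS = 300

def can_scale(now, last_scaling_time, cooldown=COOLDOWN_SECONDS):
    """True only if the cooldown since the last scaling activity has elapsed."""
    return (now - last_scaling_time) >= cooldown

print(can_scale(now=100, last_scaling_time=0))  # still cooling down
print(can_scale(now=400, last_scaling_time=0))  # cooldown elapsed
```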
Let's create our first Step-Scaling policy. Navigate to the "Automatic Scaling" tab under the ASG :-
Let's now create our first CloudWatch alarm. First, let's select a metric :-
Select “EC2” from the below metrics :-
Select “By Auto Scaling Group” option from below metrics :-
Specify the metrics and conditions onto our “Auto-Scaling-Group” :-
Specify the condition as ‘Static’ CPU value greater/equal to 50% :-
Specify the action as "triggering an email-notification". For the same, we first need to create an SNS topic :-
Here is our first SNS topic being created :-
The rest of the options shall remain as they are :-
Finally, specify the name of this CloudWatch Alarm and hit the create button :-
Now, using this aforementioned CloudWatch Alarm, we shall create our first scaling-policy of type "Step-Scaling" :-
Below is how we specify the action. In our case, we want to add 1 capacity-unit (i.e. 1 EC2 instance), in case the CPU utilisation goes above 50%.
Finally, we have our Scale-Out policy ready now.
Let's put some load on our EC2 instance, so that its CPU usage goes above 50%. For the same, we need to install a YUM-based package :-
Let’s now start the load on our instance :-
Through CloudWatch, we can now see that the load on our EC2 instance is very high :-
With this, it's understood that our CloudWatch metric has already breached its threshold, and now our auto-scaling policy comes into action, i.e. it launches an extra EC2 instance :-
Please note that by now we had set the max-capacity of our ASG to 2 :-
We can observe that finally we have 2 EC2 instances running now :-