Real-time data pipelines with Kafka :- Let’s first see how Kafka is used to set up real-time data pipelines.

  • First, we have multiple data sources that we want to onboard to Kafka. We have two options here: either Kafka producers or Kafka source connectors (a minimal producer sketch follows this list).
  • Next, we might have to perform some processing on the data arriving in Kafka topics. We could use a stream-processing application that consumes from Kafka topics, performs the processing, and writes the results back into Kafka topics.
  • Finally, after processing, we might want to dump the processed data back onto…
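
To make the first step concrete, here is a minimal sketch of publishing a record with a Kafka producer, using the kafka-python client; the broker address and the “orders” topic are illustrative assumptions, not from the blog:

    from kafka import KafkaProducer
    import json

    # Hypothetical broker address; adjust to your cluster.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Each source record is published onto a Kafka topic.
    producer.send("orders", {"order_id": 1, "amount": 250.0})
    producer.flush()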


Welcome, readers. In case you have landed here directly, I strongly suggest you go back and read through this link first.

Introduction to the problem :- In this blog, I would like to help you build a machine-learning model based on the decision-tree algorithm. Here, we shall work on a smaller dataset (taken from archive). We shall first train our model on the given data and then perform multi-class classification using the built model.

Let’s begin by exploring the dataset first. Please note that there are 4 independent variables and…
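
As a quick preview of the workflow, here is a minimal multi-class decision-tree sketch with scikit-learn; the Iris dataset is used purely as a stand-in small dataset with 4 independent variables, not necessarily the one the blog uses:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Stand-in dataset: 4 features, 3 classes.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Train the decision tree on the given data.
    model = DecisionTreeClassifier()
    model.fit(X_train, y_train)

    # Multi-class prediction on unseen samples.
    print(accuracy_score(y_test, model.predict(X_test)))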


Welcome, readers.

Introduction to the problem :- In this blog, I would like to help you build a machine-learning model based on the decision-tree algorithm. Here, we shall work on a smaller dataset of diabetic patients. We shall first train our model on the given data and then perform binary classification using the built model.

Fundamentals :- Here our main agenda is to identify which attribute is going to be the root node and what our splitting criteria for the lower-level nodes would be. We can use the thus-formed decision tree (an if-else based rule engine) in…
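
Since the splitting criterion drives the choice of root node, here is a minimal sketch of computing Gini impurity, one common such criterion; the helper function below is illustrative, not taken from the blog:

    import numpy as np

    def gini_impurity(labels):
        # Gini = 1 - sum(p_k^2); lower values mean purer nodes.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - float(np.sum(p ** 2))

    # A pure node has impurity 0; a 50/50 split has impurity 0.5.
    print(gini_impurity([0, 0, 0, 0]))   # 0.0
    print(gini_impurity([0, 0, 1, 1]))   # 0.5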


Introduction to the problem :-

In this blog, we shall work with one of the popular datasets, that of LendingClub. It’s a US peer-to-peer lending company headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world’s largest peer-to-peer lending platform.

Objective of the Blog :-

In this blog, given historical data on loans given out, with information on whether or not the borrower defaulted (charge-off), we shall build a model that can…
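
A minimal sketch of that modelling setup is shown below; the file name and the “loan_status” / “Charged Off” labels are hypothetical placeholders for whatever the actual dataset uses:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # File path and column names are hypothetical placeholders.
    df = pd.read_csv("lending_club_loans.csv")
    X = df.drop(columns=["loan_status"])         # feature columns (assumed numeric here)
    y = (df["loan_status"] == "Charged Off")     # True = borrower defaulted (charge-off)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    model = DecisionTreeClassifier().fit(X_train, y_train)
    print(model.score(X_test, y_test))           # accuracy on held-out loans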


In case you are landing here directly, it is recommended to visit this page first. In this section, we shall deep-dive into AWS S3.

Amazon S3 is “infinitely scaling” storage, so we don’t need to plan its storage size in advance. It is one of the building blocks of the AWS cloud. Many websites use Amazon S3 as their backbone, and many other AWS services use Amazon S3 as an integration component. Amazon S3 allows us to store objects (files) in S3 buckets (directories). Buckets must have globally unique names (across all accounts around the globe), but…
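
As a small illustration of the bucket/object model, here is a sketch using boto3 (the AWS SDK for Python); the bucket name and file are placeholders, and credentials are assumed to come from the environment:

    import boto3

    s3 = boto3.client("s3")

    # Bucket names must be globally unique across all AWS accounts.
    # (Outside us-east-1, a CreateBucketConfiguration with a LocationConstraint is also required.)
    s3.create_bucket(Bucket="my-globally-unique-bucket-12345")

    # Store a local file as an object inside the bucket.
    s3.upload_file("report.csv", "my-globally-unique-bucket-12345", "reports/report.csv")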


In case you are landing here directly, it is recommended to visit this page first.

In this section, we shall deep-dive into AWS RDS & ElastiCache.

AWS RDS stands for Relational Database Service. It’s a managed database service that allows us to create databases in the cloud which are managed by AWS. The following database engines are supported by AWS RDS (a provisioning sketch follows this list) :-

  • PostgreSQL
  • MySQL
  • MariaDB
  • Oracle
  • SQL Server
  • Amazon Aurora (AWS proprietary database) → This is not available under the Free Tier.
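
As a quick illustration, here is a sketch of provisioning one of the above engines (MySQL) through boto3; the identifier, size, and credentials are placeholder values:

    import boto3

    rds = boto3.client("rds")

    # Provision a small managed MySQL instance; all values are illustrative only.
    rds.create_db_instance(
        DBInstanceIdentifier="demo-mysql",
        Engine="mysql",
        DBInstanceClass="db.t3.micro",   # a small (typically Free-Tier-eligible) class
        AllocatedStorage=20,             # storage size in GiB
        MasterUsername="admin",
        MasterUserPassword="change-me-please",
    )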

Following are the advantages of using managed AWS RDS vs. deploying a database on EC2 ourselves :-

  • AWS Managed RDS takes care…

In case you are landing here directly, it is recommended to visit this page first.

In this section, we shall deep-dive into AWS ELB & scalability aspects.

AWS Scalability: It means that a software system can handle higher load by adapting its capacity. We can scale either vertically or horizontally.

  • Vertical scalability → It means increasing the size of the given instance. For example, say we have a system with a 1 GHz CPU and 2 GB RAM that is able to handle a load of 50 TPS; if we increase its capacity to 4 GB RAM and 2…
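
To make vertical scaling concrete in AWS terms, here is a sketch of resizing an EC2 instance to a larger type with boto3; the instance id and target type are hypothetical:

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"   # hypothetical instance id

    # An instance must be stopped before its type can be changed.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Vertical scaling: move to a larger instance type.
    ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": "t3.large"})

    ec2.start_instances(InstanceIds=[instance_id])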


AWS has been revolutionary in powering IT companies with the least possible workforce. AWS is a cloud offering from Amazon that provides its users with servers and services on demand.

AWS Regions : AWS has regions all around the world, and many more are upcoming. A region is nothing but a cluster of data centres. Many AWS services are region-scoped, i.e. if we use the same service in more than one region, our data may not be replicated across them. Region names follow conventions like “us-east-2”, “ap-east-1”, “eu-central-1”, etc.
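
For reference, the available regions (with names following the conventions above) can also be listed programmatically; a minimal boto3 sketch:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Print the name of every region enabled for this account.
    for region in ec2.describe_regions()["Regions"]:
        print(region["RegionName"])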


First step to Pandas :- Anaconda comes with a lot of the packages you need for data science. In this course, we will also use Conda, a package manager. In case you already have a version of pip or Conda installed, you don’t need to reinstall everything. For a brand-new installation of Anaconda, head over to www.continuum.io/downloads. Once the installation is done, Anaconda can be opened through the explorer.
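
Once Anaconda is set up, a minimal first step with Pandas looks like the sketch below; “data.csv” is a placeholder for any dataset you have at hand:

    import pandas as pd

    # Load a CSV file into a DataFrame and take a first look at it.
    df = pd.read_csv("data.csv")
    print(df.head())       # first five rows
    print(df.describe())   # basic statistics of the numeric columns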


Kafka Streams Fundamentals :-

Kafka Streams is built on top of the Kafka client APIs. It leverages Kafka’s native capabilities to offer data parallelism, distributed coordination, and fault tolerance. By default, a Streams application runs as a single-threaded application.
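
Kafka Streams itself is a JVM library; as a language-neutral illustration only, here is a sketch of the consume-process-produce loop that a Streams application automates, written with the kafka-python client. The broker address and topic names are assumptions:

    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer("input-topic", bootstrap_servers="localhost:9092")
    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    # Single-threaded loop: read from one topic, transform, write back to Kafka.
    for record in consumer:
        transformed = record.value.upper()   # stand-in for real processing logic
        producer.send("output-topic", transformed)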
