Live Streaming System Design || Part2

9 min read · Feb 29, 2024

If you are landing here directly, please go through this Part-1 blog first.

Question → What’s the process of Live streaming ?

Answer → Below is the process of Live Streaming :-

Step #1.) Say someone wants to start a live stream from their phone. The phone creates an RTMPS stream and connects to a POP, i.e. a Point Of Presence server.

Step #2.) The POP would then forward the connection to a full data centre, where encoding can happen, in order to create different bitrates and different resolutions for a given stream.

Step #3.) From there, the encoded stream goes to different POPs, so that it can reach the playback clients. It also goes to various 3rd party CDNs, in order to keep the overall latency low.

Question → What are various resources required in case of Live streaming ?

Question → What are various challenges in case of Live streaming ?

Answer → Usually the protocols would be different at Ingestion side and delivery side. Following are the main challenges, that can come in Live Streaming design :-

Question → How Live streaming is different from the Youtube Videos OR Netflix Videos ?

Question → What are the various choices of Protocol for Broadcasting ?

Answer → Following are the protocols for broadcasting :-

1.) WebRTC → This is based on the UDP transport-layer protocol, which is lossy, and hence it may not be the right choice for broadcasting :-

Loss of Data → UDP packets can be lost or dropped during transmission, especially in congested or unreliable network conditions. Unlike TCP, which guarantees delivery and retransmits lost packets, UDP does not provide built-in reliability mechanisms. This can result in lost audio or video frames, leading to degraded quality or interruptions in the media stream.
Limited Browser Support → While most modern web browsers support WebRTC, there may still be some inconsistencies in implementation and feature support across different browsers and versions. Older browsers or devices may not support WebRTC at all, limiting the reach of WebRTC-based applications.
Network Limitations → WebRTC relies on peer-to-peer connections for communication, which may not be suitable for all network environments. Network address translation (NAT) and firewall configurations can interfere with the establishment of peer-to-peer connections, leading to connectivity issues for users behind restrictive network setups. WebRTC does include mechanisms for NAT traversal (ICE, which uses STUN and TURN servers), but these mechanisms may not be effective in every network scenario.
Resource Consumption → WebRTC applications can consume significant system resources, especially when handling multiple concurrent connections or high-resolution media streams. Streaming and processing audio and video in real-time can put a strain on CPU, memory, and bandwidth resources, particularly on mobile devices or older hardware.

2.) RTMP → This protocol is built for video streaming and therefore has the right latency characteristics. It is widely used in the industry, so there are client and server libraries which we can reuse.

Origin and History → RTMP was developed by Adobe for real-time streaming of audio, video, and data over the internet. While Adobe has deprecated RTMP in favor of newer protocols like HTTP-based streaming, some proprietary implementations of RTMP may still be used in specific applications or platforms.
Under the hood → RTMP works on top of TCP, a reliable protocol that retransmits lost packets, so no data is lost and the video quality is maintained. The RTMP library is also very small, around 100 KB.

Question → What are the various Encoding Properties ?

Answer → Following are the encoding properties :-

1.) Aspect-Ratio → Aspect ratio in video streaming refers to the proportional relationship between the width and height of a video frame. It’s an important consideration in video streaming because it affects how the video content is displayed on screens or devices.

Content creators and streaming platforms need to ensure that their videos are encoded with the correct aspect ratio to avoid distortion or letterboxing (black bars) when viewed on different devices with varying screen sizes and aspect ratios.
It defines the shape of the video frame.
Aspect ratio is typically expressed as a ratio of width to height, such as 16:9, 4:3, or 1:1.
An aspect ratio of 1:1 means that we create a square video frame, where the width is equal to the height. It is commonly used on social media platforms like Instagram and Facebook for square-format video posts.
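
To make the letterboxing point concrete, here is a minimal sketch of how a player could fit a video into a screen of a different aspect ratio without distortion. The function name `fit_inside` and the pixel sizes are assumptions made for illustration only.

```python
from fractions import Fraction


def fit_inside(video_w, video_h, screen_w, screen_h):
    """Scale a video to fit a screen without distortion.

    Returns (width, height, side_bar, top_bar), where side_bar is the width
    of each black bar on the left/right and top_bar is the height of each
    black bar on the top/bottom.
    """
    video_ratio = Fraction(video_w, video_h)
    screen_ratio = Fraction(screen_w, screen_h)
    if video_ratio > screen_ratio:
        # Video is wider than the screen: fill the width, bars on top/bottom.
        out_w = screen_w
        out_h = int(screen_w / video_ratio)
    else:
        # Video is narrower (or equal): fill the height, bars on the sides.
        out_h = screen_h
        out_w = int(screen_h * video_ratio)
    return out_w, out_h, (screen_w - out_w) // 2, (screen_h - out_h) // 2


# A square (1:1) video shown on a 16:9 screen gets bars on both sides.
square_on_wide = fit_inside(1080, 1080, 1920, 1080)
# A 16:9 video shown on a square screen gets bars on top and bottom.
wide_on_square = fit_inside(1920, 1080, 1080, 1080)
```

Here `square_on_wide` describes a 1080x1080 picture with a 420-pixel bar on each side.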

2.) Video Codec → H.264, also known as Advanced Video Coding (AVC), is one of the most widely used video codecs. It provides high-quality compression for digital video and is used in a wide range of applications, including streaming, broadcasting, Blu-ray discs, and video conferencing.

3.) Audio Codec → AAC (Advanced Audio Coding) is a widely used audio codec known for its high-quality compression and versatility. It is commonly used in applications like streaming, broadcasting, digital audio players, and mobile devices.

Question → Let’s talk about Video-Streaming-Ingestion-Service ?

Answer → Here is the process of Video-Streaming-Ingestion-Service :-

Step #1.) When a client wants to stream a video live, it first connects to the API-Server to get the following three things :-

StreamId → Used in consistent hashing, so that a given client is always connected to the same POP-server, even if they switch from cellular to Wi-Fi or some other network.
Security-Tokens
URI → This tells the client which POP/DC to talk to.
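
The role of the StreamId in consistent hashing can be sketched as below. This is only an illustration under assumptions: the class name `HashRing`, the POP names, the use of SHA-256 and the number of virtual nodes are all made up here, and the blog does not say how the real system implements its ring.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping a stream id to a POP server."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times (virtual nodes) so that
        # streams spread evenly across nodes.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._keys = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, stream_id):
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._keys, self._hash(stream_id)) % len(self._keys)
        return self._ring[idx][1]


ring = HashRing(["pop-a", "pop-b", "pop-c"])  # hypothetical POP names
first = ring.node_for("stream-42")
# The same StreamId maps to the same POP, whatever network the phone is on.
second = ring.node_for("stream-42")
```

Because the lookup depends only on the StreamId, and not on the client's IP address, a switch from cellular to Wi-Fi does not change the chosen POP. A useful side effect of the ring is that removing one POP only remaps the streams that were on that POP.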

Step #2.) The broadcasting client streams the video, and the video stream reaches the Point Of Presence Server. POPs are simple racks of servers whose primary jobs are as follows :-

2.1) Terminate Incoming Connections from Clients → The POP terminates the client connection and then connects to the Data-Centres.

2.2) Caching of streams for the purpose of playback → On Ingestion side, Caching is not relevant.

Question → How do we solve the problem of Network-Connectivity at POP-Ingestion-side ?

Step #3.) The video stream reaches the Data-Centre Servers. These are much bigger than POPs, and their primary jobs are as follows :-

3.1) Terminate Incoming Connections from POPs → It terminates the connections.

3.2) Encoding Hosts → On the Ingestion side, Encoding hosts perform the following tasks :-

Authentication → They make sure the stream is valid and formatted correctly.
Association → They associate a host with the stream.
Generate Encodings → Whatever quality the sender sends us, we generate multiple encodings of lower quality with lower bitrates, so that we can show video without jitter to people who don’t have good connectivity.
Generate Playback Output → These are the manifest files which point to the video segments.
Store Media for VOD → Encoding hosts are responsible for converting the streams into videos and storing them in our long-term storage.
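
The "Generate Encodings" task can be pictured as picking rungs from a bitrate ladder. The sketch below is an assumption for illustration: the ladder values and the names `Rendition`, `LADDER` and `renditions_for` are hypothetical, and a real encoder would tune these numbers per platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bitrate


# Hypothetical ladder, ordered from highest to lowest quality.
LADDER = [
    Rendition("720p", 720, 2500),
    Rendition("480p", 480, 1200),
    Rendition("360p", 360, 700),
    Rendition("240p", 240, 400),
]


def renditions_for(source_height, source_bitrate_kbps):
    """Pick only the rungs that do not exceed what the sender gave us.

    Upscaling a 480p source to 720p would cost bandwidth without adding
    quality, so we only produce equal or lower renditions.
    """
    return [
        r for r in LADDER
        if r.height <= source_height and r.bitrate_kbps <= source_bitrate_kbps
    ]


# A phone sending 480p at 1500 kbps yields three lower-or-equal encodings.
names = [r.name for r in renditions_for(480, 1500)]
```

For this input, `names` is `["480p", "360p", "240p"]`; a viewer on a weak connection would be served one of the lower rungs.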

Question → How does Playback of Videos happen ?

Answer → It happens through MPEG-DASH protocol.

DASH is a streaming protocol over HTTP. It consists of two kinds of files, i.e. a manifest file and media files.
The manifest file is a table of contents which points to the media files.
As the stream gets created, the server creates 1-second segments and updates the manifest file to point to these new segments.
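
The way a live manifest keeps pointing to fresh segments can be sketched as follows. This is a deliberately simplified model, not the real MPD XML format: the class `LiveManifest`, the segment file naming and the sliding window size are assumptions made for illustration.

```python
from collections import deque


class LiveManifest:
    """Toy model of a live manifest: a table of contents for recent segments."""

    def __init__(self, stream_id, window=5, segment_seconds=1):
        self.stream_id = stream_id
        self.segment_seconds = segment_seconds
        # Assumption: only the most recent `window` segments are advertised.
        self._segments = deque(maxlen=window)
        self._next_seq = 0

    def add_segment(self):
        """Called by the encoder each time a new 1-second segment is ready."""
        name = f"{self.stream_id}/seg-{self._next_seq}.m4s"
        self._segments.append(name)
        self._next_seq += 1
        return name

    def render(self):
        """What a playback client would receive when it fetches the manifest."""
        return {
            "stream_id": self.stream_id,
            "segment_seconds": self.segment_seconds,
            "segments": list(self._segments),
        }


manifest = LiveManifest("stream-42", window=3)
for _ in range(5):  # five seconds of live video
    manifest.add_segment()
latest = manifest.render()["segments"]
```

After five seconds, `latest` lists only segments 2, 3 and 4, which is why playback clients have to re-fetch the manifest regularly during a live stream.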

Question → Explain the process of playback of a video for a client based in India, step by step ?

Answer → The entire process at the Playback side, whenever some client requests for the first time is as follows :-

Step #1.) The Playback-Client first connects to the closest POP, in order to get the DASH manifest file for a particular stream.

Step #2.) The POP-Server checks its local cache to see whether the manifest is available.

Step #3.) Since this is the first request, the cache is empty, so the POP-Server requests the DASH manifest file from the Data-Centre.

Step #4.) The Data-Centre-Server asks the Encoding-Hosts for the manifest file of the given streamId. It also populates its own cache and returns the manifest file to the POP-Server.

Step #5.) The POP-Server populates its own cache with the manifest file and then returns it to the client.

Question → What happens in the case where a new playback-client comes-in from India ?

Answer → Now imagine that some other playback-client comes in and happens to connect to the same POP server.

In that case, the client is served the manifest file from the POP-Server’s cache itself.
The call doesn’t go to the Data-Centre at all. With this approach, we can solve the problem of scalability reasonably well: everyone who connects to the same POP (i.e. everyone in the same geographical area) doesn’t need to go to the Data-Centre.

Question → What happens in the case where a new playback-client comes-in from USA ?

Answer → Now imagine that some other playback-client comes in and connects to a different POP server, because this client is based in a different geographical location :-

Step #1.) The Playback-Client first connects to the closest POP, in order to get the DASH manifest file for a particular stream.

Step #2.) The POP-Server checks its local cache to see whether the manifest is available.

Step #3.) On a cache miss, the POP-Server requests the DASH manifest file from the Data-Centre.

Step #4.) The Data-Centre-Server now finds the manifest file for the given streamId in its own local cache and returns it to the POP-Server, without going to the Encoding-Hosts.

Step #5.) Once the POP-Server receives the manifest file, it puts it into its own cache as well and returns it to the client.
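
The India and USA walkthroughs above can be sketched as a two-tier cache. This is a simplified model under assumptions: the class names `EncodingHost`, `DataCentre` and `Pop` are hypothetical, and the sketch ignores manifest refreshes and cache expiry.

```python
class EncodingHost:
    def __init__(self):
        self.calls = 0

    def manifest(self, stream_id):
        self.calls += 1
        return f"manifest-for-{stream_id}"


class DataCentre:
    def __init__(self, encoder):
        self.encoder = encoder
        self.cache = {}
        self.calls = 0  # how many requests arrived from POPs

    def get_manifest(self, stream_id):
        self.calls += 1
        if stream_id not in self.cache:
            # First request ever: ask the encoding host, then cache the answer.
            self.cache[stream_id] = self.encoder.manifest(stream_id)
        return self.cache[stream_id]


class Pop:
    def __init__(self, dc):
        self.dc = dc
        self.cache = {}

    def get_manifest(self, stream_id):
        if stream_id not in self.cache:
            # Cache miss at the POP: one call goes to the data centre.
            self.cache[stream_id] = self.dc.get_manifest(stream_id)
        return self.cache[stream_id]


encoder = EncodingHost()
dc = DataCentre(encoder)
india_pop, usa_pop = Pop(dc), Pop(dc)

for _ in range(3):  # three viewers in India
    india_pop.get_manifest("stream-42")
for _ in range(2):  # two viewers in the USA
    usa_pop.get_manifest("stream-42")

dc_calls, encoder_calls = dc.calls, encoder.calls
```

Five viewers produce only two calls to the Data-Centre (one per POP) and a single call to the Encoding-Host.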

Important Notes :-

Roughly, the number of calls that a Data-Centre receives for a given manifest is equal to the number of POP-Servers that we have, not the number of viewers.
Another important aspect is that we need to periodically update the manifest file at the DC-Servers as well as the POP-Servers, because in the case of a live stream the video is being generated then and there. This can happen via HTTP push.

Question → Explain the problem of Thundering Herd ?

Answer → Let’s say that a particular stream is popular and a bunch of clients ask their closest POP-Server for it.

Now, this POP-Server would in turn connect to its BigCache, to check for the Manifest file.
Since the Cache doesn’t contain anything the first time, only one call goes to the DC. With the help of a Cache-Wait-Time, we do not allow all the calls to land on the DC. During this time, all the other clients wait for the Cache to be populated.
Now suppose the first call to the DC didn’t return the Manifest file. By then, all the clients which were waiting for the Cache to be populated end up connecting to the DC, and the DC gets overloaded. This situation is called the Thundering-Herd Problem.

Therefore, we need to tune/optimise the Cache-Wait-Time, so that it’s neither too high nor too low.
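
The effect of the Cache-Wait-Time can be sketched with a small simulation. This is a rough model under assumptions: the class `CoalescingCache`, the `simulate` helper and the latency numbers are hypothetical, and threads stand in for concurrent playback clients hitting one POP.

```python
import threading
import time


class CoalescingCache:
    """POP-side cache where only the first caller fetches; others wait."""

    def __init__(self, fetch, wait_seconds):
        self._fetch = fetch
        self._wait_seconds = wait_seconds  # the Cache-Wait-Time
        self._lock = threading.Lock()
        self._values = {}
        self._inflight = {}  # key -> Event set when the leader finishes

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._inflight.get(key)
            is_leader = event is None
            if is_leader:
                event = self._inflight[key] = threading.Event()
        if not is_leader:
            # Wait for the leader to populate the cache, but not forever.
            event.wait(self._wait_seconds)
            with self._lock:
                if key in self._values:
                    return self._values[key]
            # Wait time expired with an empty cache: this client also goes
            # to the DC. Many clients doing this at once is the herd.
        try:
            value = self._fetch(key)
            with self._lock:
                self._values[key] = value
            return value
        finally:
            if is_leader:
                with self._lock:
                    self._inflight.pop(key, None)
                event.set()


def simulate(dc_latency, wait_seconds, clients=10):
    """Return how many calls reached the simulated DC."""
    dc_calls = []

    def fetch_from_dc(stream_id):
        dc_calls.append(stream_id)
        time.sleep(dc_latency)  # pretend the DC takes this long to answer
        return f"manifest-for-{stream_id}"

    cache = CoalescingCache(fetch_from_dc, wait_seconds)
    threads = [
        threading.Thread(target=cache.get, args=("stream-42",))
        for _ in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(dc_calls)


patient = simulate(dc_latency=0.05, wait_seconds=2.0)    # wait covers DC latency
impatient = simulate(dc_latency=0.3, wait_seconds=0.01)  # wait is far too short
```

With a wait time that comfortably covers the DC latency, only one call reaches the DC. With a wait time that is too short, the waiting clients give up and hit the DC themselves. A wait time that is too high has the opposite cost: if the first call fails, every client stays blocked for that long before retrying.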

That’s all in this blog. We shall see you in the next blog.


Written by aditya goel