
While service disruptions and downtime are inevitable when transmitting data over the Internet, some sectors are hit much harder than others and can't afford extended outages. Along with service providers in the security and financial industries, video-on-demand (VOD) or streaming services like Netflix, Showmax and Discovery+ have to be up and running 24/7. Any downtime can mean serious inconvenience for users and significant losses for the business.

 

As the popularity of these services continues to grow across the globe, ever-increasing user numbers constantly pose new challenges – or "opportunities for improvement", as we prefer to call them. So, when Discovery+ enlisted our help in 2020 to optimise their VOD platform and ensure an all-around smoother user experience, we were happy to take on the challenge.

 

How We Work

As external experts, our primary goal is to support our clients and provide them with expertise on specific problems. So when Discovery+ identified the need for improvement in their streaming service, we got to work figuring out where and how to improve it, and when those improvements could be ready to deliver the most value to them and their users.

 

In investigating, designing and implementing the changes required to ensure better functionality, fewer issues and happier viewers, we followed these basic steps:

  1. Identifying the problem
  2. Proposing a solution to the relevant business units
  3. Developing the solution
  4. Implementing the solution (and rolling it out to other regions or areas of the business)

 

Below, we briefly highlight our key findings and show how we improved the experience of Discovery+ users in the European, Asian, Middle Eastern and African markets.

 

The Problem: Delayed and Impaired Observability

The amount of data generated and captured by streaming platforms like Discovery+ is mind-boggling. Various user experience metrics are generated in real time for every user, resulting in thousands of metric records being ingested into the system. These records allow us to view things like the rebuffer rate for a region at a given point in time. Since data availability was not a problem, we decided to explore how we could turn that data into a valuable asset that supports the business and improves the experience for the user.
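To make the idea of "rebuffer rate for a region at a point in time" concrete, here is a minimal Python sketch of how such a figure could be derived from raw metric records. The record fields (region, timestamp, rebuffer_ms, watch_ms) and the definition of rebuffer rate as rebuffering time divided by watch time are assumptions made for illustration; they are not taken from the Discovery+ or MUX data model.

```python
from collections import defaultdict


def rebuffer_rate_by_region(records, window_seconds=60):
    """Group records into fixed time windows per region and return
    {(region, window_start): rebuffer_ms / watch_ms}.

    The field names used here are hypothetical.
    """
    totals = defaultdict(lambda: [0, 0])  # [rebuffer_ms, watch_ms]
    for rec in records:
        # Round the timestamp down to the start of its window.
        window_start = rec["timestamp"] - rec["timestamp"] % window_seconds
        bucket = totals[(rec["region"], window_start)]
        bucket[0] += rec["rebuffer_ms"]
        bucket[1] += rec["watch_ms"]
    # Skip windows with no watch time to avoid dividing by zero.
    return {
        key: rebuffer / watch
        for key, (rebuffer, watch) in totals.items()
        if watch > 0
    }


sample = [
    {"region": "EU", "timestamp": 120, "rebuffer_ms": 500, "watch_ms": 10000},
    {"region": "EU", "timestamp": 150, "rebuffer_ms": 1500, "watch_ms": 10000},
    {"region": "APAC", "timestamp": 130, "rebuffer_ms": 0, "watch_ms": 20000},
]
rates = rebuffer_rate_by_region(sample)
```

In this sample, both EU records fall into the same one-minute window, so their rebuffering and watch times are summed before the ratio is taken.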

 

Data that is shelved is ultimately a waste, so when exploring our options, we discovered two hurdles that prevented us from using the data to optimise the user experience:

 

 

 

The Significance of Observability for VOD

Before moving on to how we solved these issues, it's worth explaining observability: the ability of teams to understand what is happening inside a system based on the external data that system exposes. Observability is crucial for VOD services, as any disruption can have far-reaching consequences.

For example, in 2020, Discovery+ was granted the licence to stream the Tokyo Olympics live – a right that came with stringent conditions, including fines for streaming issues or service interruptions. In addition to the pressure of these strict SLAs, live events like the Olympics often see a sudden spike in viewers at a specific time or for a specific event, such as the Men's 100 m sprint. As DevOps engineers, we had to ensure the system could handle such increases as and when they occurred.

 

Even when fines aren't being imposed on service providers, regular outages or excessive buffering will inevitably cause frustrated customers to seek alternative VOD providers, once again leading to a loss in revenue for the current provider.

 

The Solution: Evolving from API to Real-Time Monitoring

Having established that the goal would be to ensure Discovery+ collects the right data in the right format at the right time, we got the go-ahead to find a solution. Discovery+ is an AWS organisation, so AWS is used internally, with additional third-party tools to assist with various functions. Given the number of deployment, monitoring and reporting tools available, it was up to us to match the right tools to the task at hand.

 

Based on our analysis and in-depth investigation of the previous and current systems used by Discovery+, the following tools were selected:

AWS: Kinesis streams, Kinesis Delivery streams, Lambda, DynamoDB

Datadog: visualisation and monitoring of data

 

By collaborating closely with third-party provider MUX, we moved away from the previous method of gathering data via API and implemented a new solution that pushes the data into Amazon Kinesis streams for real-time monitoring.
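As a rough illustration of the push-based approach, the sketch below shows how an AWS Lambda function might decode a batch of records delivered from a Kinesis stream. The event layout (Records, kinesis, base64-encoded data) is the standard shape Lambda receives from Kinesis; the JSON payload inside each record and the handler's return value are assumptions, and forwarding the metrics to Datadog or DynamoDB is deliberately left out.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads found in a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis record data arrives base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            payloads.append(json.loads(raw))
        except ValueError:
            # Skip malformed records instead of failing the whole batch.
            continue
    return payloads


def handler(event, context):
    """Hypothetical Lambda entry point: decode the batch and report its size."""
    metrics = decode_kinesis_records(event)
    # Forwarding to a monitoring backend would happen here (omitted).
    return {"processed": len(metrics)}


def _encode(text):
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# A hand-built sample event for local experimentation.
sample_event = {
    "Records": [
        {"kinesis": {"data": _encode('{"region": "EU", "rebuffer_ms": 500}')}},
        {"kinesis": {"data": _encode("not json")}},
    ]
}
result = handler(sample_event, None)
```

Skipping malformed records rather than raising is a design choice made for this sketch; a production pipeline might instead route them to a dead-letter destination.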

 

View a high-level design of the system below:

As part of our goal to improve observability, we also configured monitors on Datadog to help the system pick up sudden changes, such as a significant drop in users, and send an alert to the relevant business or support team that can assist with it.
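The kind of check such a monitor performs can be sketched in a few lines of Python. This is only an illustration of the idea, not Datadog's detection logic or the actual monitor configuration used on the project; the function name, the 30% threshold and the sample numbers are all hypothetical.

```python
def is_sudden_drop(history, current, max_drop_ratio=0.3):
    """Return True when `current` has fallen more than `max_drop_ratio`
    below the average of the recent `history` values."""
    if not history:
        return False  # no baseline to compare against
    baseline = sum(history) / len(history)
    if baseline <= 0:
        return False
    return (baseline - current) / baseline > max_drop_ratio


recent_users = [1000, 1020, 980]  # concurrent users in earlier intervals
alert_on_big_drop = is_sudden_drop(recent_users, 600)
alert_on_small_dip = is_sudden_drop(recent_users, 950)
```

Comparing against a short rolling average, rather than a fixed number, lets the same rule work for both quiet periods and peak events.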

 

 

The Result: More Advanced and Realistic Monitoring

The benefits of a fully automated observability platform include significantly shorter reaction times when things go wrong and the ability to anticipate issues so they can be fixed proactively.

 

Another major success of automating many of our processes was our ability to move from a development environment to production more quickly and safely. Because the major components of the system were defined in an automated way using CloudFormation and serverless tools, a process that usually takes several days was cut down to a single day, allowing us to add environments for monitoring quickly. Speeding up this process enables us to get new products and features to market faster, which is a win for the user, the business and ultimately for us, the developers.
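To show why infrastructure defined as code makes new environments cheap to add, here is a small Python sketch that generates a CloudFormation template for a per-environment Kinesis stream. The resource type AWS::Kinesis::Stream and its Name and ShardCount properties are standard CloudFormation; the function name, logical ID, stream naming scheme and shard count are hypothetical and do not describe the templates used on this project.

```python
import json


def monitoring_stream_template(environment, shard_count=1):
    """Build a minimal CloudFormation template (as a dict) for one environment."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Metrics stream for the {environment} environment",
        "Resources": {
            "MetricsStream": {
                "Type": "AWS::Kinesis::Stream",
                "Properties": {
                    # Hypothetical naming scheme: one stream per environment.
                    "Name": f"playback-metrics-{environment}",
                    "ShardCount": shard_count,
                },
            }
        },
    }


# Adding an environment becomes a matter of rendering the template again.
staging = monitoring_stream_template("staging")
staging_json = json.dumps(staging, indent=2)
```

The rendered JSON could then be deployed with the usual CloudFormation tooling; only the environment name changes between deployments.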

 

The Bottom Line

In the highly competitive VOD market, unnecessary (and entirely avoidable) downtime means losing customers, which is why automation and system optimisation are essential in this industry.

 

We initially set out to optimise the Discovery+ platform to ensure an all-around improved user experience. In the end, we were able to do this by modernising the monitoring application and redesigning it to reflect the real world in real time – what we like to call an absolute monitoring service.

 

The business feedback on our efforts and the implementation of automated systems across Europe, Asia, the Middle East and Africa has been exceedingly positive. As a result, our improvements are being rolled out to other teams globally, starting with the US.

 

For more information on the successes of our High-Performance OTT DevOps Teams, check out our Discovery+ Case Study here.
