Q&A: Hulu’s chief architect talks scaling the platform

Hulu is growing fast and bringing in lots of subscribers, making it critical that the streaming video platform can effectively scale to meet the increasing demand.

Hulu has more than 30 million subscribers, with 27.2 million subscribers on the SVOD service and another 3.2 million subscribers on the SVOD plus live TV service. Disney, which last year assumed full operational control of the company, expects Hulu to grow exponentially over the next four years.

Hulu is also arriving on new platforms where it can potentially attract new users. Most recently, the Hulu app showed up on Comcast’s Xfinity X1 and Flex platforms, and the PlayStation 4.

Gert Drapers, chief architect at Hulu, is in charge of making sure the platform can reliably deliver the experience and content customers want as the audience grows. He recently told FierceVideo about the team’s microservices approach and what needs to happen to ensure Hulu’s platform can welcome the next 30 million subscribers.

FierceVideo: How is Hulu focusing on its engineering platform and solutions to make sure they can scale to meet the needs of the business?

Gert Drapers: Hulu has grown rapidly, with 30.7 million subscribers as last reported in February. With more consumers than ever turning to streaming, scaling our systems is critical to ensure our subscribers are able to reliably access and watch content on our platform.

In order to achieve this, our team has been focused on standardizing our architecture to meet business needs. We standardize to improve operational excellence, challenge teams to validate their designs against standards, and enforce simplicity to further increase resilience and scale of the services operating closest to users.

Our team adopted a microservices philosophy so that we can iterate faster and prevent engineers from worrying about dependencies. Services must be isolated from each other through well-defined and documented service interfaces and APIs in order to enable the independent evolution of services. Without this isolation, an update to one service can affect another, so you need to either consult several teams or update other services each time you want to deploy a change. Ultimately, by practicing service isolation, we are limiting the cascading effect of one service on other dependent services.
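As an illustration of service isolation, the following is a minimal Python sketch, not Hulu's code. The names `RecommendationService`, `HomePage`, and `titles_for` are hypothetical, and the fallback-on-failure behavior is an assumed way of containing a failure; the interview does not say how Hulu handles it. The caller depends only on a documented interface and keeps working when the dependency fails.

```python
from typing import Protocol


class RecommendationService(Protocol):
    """Hypothetical documented interface; callers depend only on this."""

    def recommend(self, user_id: str) -> list[str]:
        ...


class HomePage:
    """Hypothetical caller that is isolated from the recommender's internals."""

    def __init__(self, recommender: RecommendationService, fallback: list[str]):
        self._recommender = recommender
        self._fallback = fallback

    def titles_for(self, user_id: str) -> list[str]:
        try:
            return self._recommender.recommend(user_id)
        except Exception:
            # Contain the failure here so it does not cascade to the
            # page as a whole (assumed behavior, for illustration only).
            return list(self._fallback)
```

Because `HomePage` knows only the interface, the recommender can be rewritten or redeployed without a matching change on the caller's side, as long as `recommend` keeps its contract.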

The scaling of our engineering platform and solutions requires the ability to iterate quickly, which is aided by isolating services. We have the lowest cycle times, which is enabled by our build and deployment process.

Standard build and deployment not only guarantees the lowest possible cycle time for developers, with the highest level of guaranteed quality gates, but also allows our teams to grow faster. It standardizes the entire flow, providing a safety net for developers. We also have a set of standard metrics that we use for each service, allowing us to determine the number and types of errors as well as how fast our team is responding.
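A standard per-service metric set of the kind described could look roughly like the Python sketch below. The class `ServiceMetrics`, its field names, and the `"timeout"` error label are hypothetical. The interview does not specify whether "how fast" refers to service latency or to the team's response to incidents, so the sketch simply records durations in seconds.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ServiceMetrics:
    """Hypothetical metric set recorded the same way for every service."""

    service: str
    events: int = 0
    errors: Counter = field(default_factory=Counter)  # error type -> count
    response_times_s: list = field(default_factory=list)

    def record(self, response_time_s, error_type=None):
        """Record one event with its duration and an optional error type."""
        self.events += 1
        self.response_times_s.append(response_time_s)
        if error_type is not None:
            self.errors[error_type] += 1

    def error_rate(self):
        """Fraction of recorded events that ended in any error."""
        if self.events == 0:
            return 0.0
        return sum(self.errors.values()) / self.events

    def mean_response_s(self):
        """Average recorded duration in seconds, or 0.0 if nothing recorded."""
        if not self.response_times_s:
            return 0.0
        return sum(self.response_times_s) / len(self.response_times_s)
```

Using one shape for every service means dashboards and alerts can be built once and compared across teams.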

FierceVideo: What are Hulu’s priorities in terms of renewing its technology stack?

Drapers: With regard to renewing our technology stack, we have two main priorities that we take into consideration: keeping our technology stack current and measuring the long-term viability of a new stack.

The key is to ensure our systems are maintainable and operable. We are constantly renewing because we must keep the automated builds running at all times, or else our teams can’t work on them.

If a service becomes stale and is not actively deprecated, the chance of security risks increases exponentially. A basic reason for staying current with your builds is so that you're able to address any security vulnerabilities and are always ready to deploy.

We do acknowledge that there are new technologies that might advance or provide answers to some existing problems. My team approaches these new developments on an as-needed basis, evaluating for performance, scale, challenges, and ability for growth. If new functionality doesn't fit within our current system, we redesign existing code in order to have a successful implementation. Adoption of new technologies depends on what fits within our overall system and whether the transition appears to be feasible long-term with minimal disruption.

Our second priority is evaluating whether new technology stacks will remain maintainable and operable over the long term. For example, if we're bringing in new programming languages, we have to ask ourselves whether there is a deep enough talent pool to implement them and continue to grow.

FierceVideo: Disney expects Hulu will grow to between 40 million and 60 million subscribers by 2024. What needs to happen to ensure the platform can handle that potential volume?

Drapers: The three significant workflows on our platform that our team must understand and prioritize for this growth are: browsing for content, watching either on-demand or live video, and playing ads.

These activities are largely why my team has adopted the microservices philosophy — the resulting service isolation enables us to respond and adjust systems without causing a huge ripple effect throughout services that are not part of these workflows.

As we continue adding new subscribers, user interactions, such as browsing and discovering content, become uniquely varied, thus increasing load. We intentionally designed the system to absorb this load as close to the user as possible to prevent it from getting amplified downwards into the system.
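One common way to absorb load near the user is a short-lived cache in front of a backend call. The interview does not say which technique Hulu uses, so the Python sketch below is an assumption for illustration only; `EdgeCache`, `fetch`, and `ttl_s` are hypothetical names.

```python
import time


class EdgeCache:
    """Hypothetical TTL cache placed in front of a backend fetch function."""

    def __init__(self, fetch, ttl_s=30.0, clock=time.monotonic):
        self._fetch = fetch      # function that calls deeper into the system
        self._ttl_s = ttl_s      # how long a cached value stays fresh
        self._clock = clock      # injectable clock, useful for testing
        self._entries = {}       # key -> (expires_at, value)
        self.backend_calls = 0   # how many requests reached the backend

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: the request is absorbed here.
            return entry[1]
        # Missing or expired: exactly one call goes down into the system.
        self.backend_calls += 1
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl_s, value)
        return value
```

Repeated requests for the same key within the TTL never reach the backend, which is the sense in which load is kept from being amplified further down. Highly personalized requests share fewer keys, so the benefit depends on how much of the response can be shared.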

With Live TV, there are spikes you need to be prepared for, which is why it's vital for our team to stay elastic. For larger events where we know viewership will significantly increase, such as the Super Bowl, we are able to observe our scale model to understand and determine our threshold. It's important to build that elasticity into your architecture, while walking the thin line of ensuring you're neither over- nor under-prepared.
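The trade-off between over- and under-preparing can be shown with a small capacity calculation. This Python sketch is not Hulu's scale model; the function `instances_needed`, its parameters, and the 25% default headroom are hypothetical values chosen for illustration.

```python
import math


def instances_needed(expected_peak_viewers, viewers_per_instance, headroom=0.25):
    """Instances required to serve a forecast peak plus a safety margin.

    All inputs are assumed to come from a forecast or a load test.
    """
    if viewers_per_instance <= 0:
        raise ValueError("viewers_per_instance must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    # Scale the forecast up by the headroom, then round up to whole instances.
    target = expected_peak_viewers * (1 + headroom)
    return math.ceil(target / viewers_per_instance)


# Example: a forecast peak of 1,000,000 viewers at 5,000 viewers per instance.
print(instances_needed(1_000_000, 5_000))
```

Raising `headroom` buys safety against an unexpected spike at the cost of idle capacity; lowering it does the reverse.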

This requires us to fully understand the call-flow of the system and its dependencies. Our team relies on the health of each of our services in order to properly scale and respond to any anomalies that may pop up.

FierceVideo: Is Hulu’s live TV service expected to grow at a parallel rate? If so, what’s the plan to scale that service and ensure that it can deal with the more unpredictable spikes in usage?

Drapers: While we consider on-demand video and live streaming as two distinct workloads with overlapping capabilities, we still need to scale the workloads separately to accommodate different system behaviors.

With on-demand, spikes come from an increase in subscribers as that increases load. With live video, we have the challenge of needing to be prepared for unpredictable events such as breaking news.

Our engineers have been preparing through live-load tests and war rooms, as well as finding small time windows to deploy changes.

FierceVideo: As Hulu’s business evolves, how does the company’s thinking about streaming platform architecture need to evolve with it?

Drapers: As Hulu and other streaming platforms continue to evolve, there are several aspects we have to focus on, such as video quality and the continued growth of mobile devices.

There is an ever-growing spectrum of standards for encoding and displaying video (e.g., 4K and Dolby), which makes this a moving target for streaming platforms such as ours. And as we look ahead towards international, we also need to restructure our focus on international audio streams and subtitles.

Our viewers spend an increasing amount of time watching content on their mobile phones. As more and more devices become mobile in the coming years, our team needs to be prepared to deal with the different dimensions of mobile networks, which differ from wireless networks in bandwidth, reliability, and behavior.

All of these developments underscore the importance of scaling things separately. As behavior patterns differ between on-demand video and live streaming, my team needs to be able to iterate faster and ensure that we're able to do so with minimal disruption to our viewers.