Netflix explains how it avoids outages after a major incident in 2019

Jon Fingas, Associate Editor

Have you noticed that Netflix hasn’t suffered catastrophic outages like those of years past, even with the surge in viewing during the pandemic? That’s no happy accident. Netflix has detailed a technique, priority-based progressive load shedding, that should keep streams flowing even when serious failures occur behind the scenes. It sounds complex, but it ultimately comes down to deciding what the service can afford to sacrifice without people noticing.

The service now prioritizes traffic based on how much you need it for playback. Requests fall into “non-critical” items you’ll never see, “degraded experience” items that won’t affect playback (think pause markers and viewing history) and, of course, the critical viewing experience itself. A gateway service, affectionately named Zuul, detects when a back-end service or even Zuul itself is in trouble and progressively drops traffic to keep Netflix running, starting with the lowest-priority items.
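To make the tiering concrete, here is a minimal sketch of how a gateway might classify requests and decide which ones to drop. It assumes a hypothetical `shed_level` signal (0 to 3) that rises as the system comes under stress, and the URL paths and function names are illustrative only. They are not Netflix’s or Zuul’s actual API.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Tier names follow the article; the numeric values are illustrative.
    # A lower value means the request is shed earlier.
    NON_CRITICAL = 0  # background traffic the viewer never sees
    DEGRADED = 1      # e.g. pause markers, viewing history
    CRITICAL = 2      # playback itself


def classify(path: str) -> Priority:
    # Hypothetical routing rules; a real gateway would inspect far more
    # than the request path (device type, request headers, and so on).
    if path.startswith("/play"):
        return Priority.CRITICAL
    if path.startswith(("/history", "/bookmarks")):
        return Priority.DEGRADED
    return Priority.NON_CRITICAL


def admit(priority: Priority, shed_level: int) -> bool:
    """Admit a request only if its priority is at or above the shed level.

    shed_level 0 admits everything, 1 drops non-critical traffic,
    2 keeps only critical traffic, and 3 drops everything (last resort).
    """
    return priority >= shed_level
```

Because the tiers are ordered, raising `shed_level` by one step sacrifices exactly one more category, which mirrors the article’s point that the lowest-priority items go first.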

The approach scales throttling so that you shouldn’t notice much, if anything, unless the situation is truly dire. At that point, Zuul is likely throttling heavily to protect both the service and itself, since Netflix suffers a total outage if Zuul stops working.
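One way to picture “scaling” the throttling is a cutoff that rises smoothly with load instead of flipping on all at once. The sketch below assumes a 0 to 100 priority scale, a load measure such as CPU utilization, and a simple linear ramp between a hypothetical `target` and `limit`. The article doesn’t describe Netflix’s actual formula, so treat these parameters as placeholders.

```python
def shed_cutoff(utilization: float, target: float = 0.6, limit: float = 0.9,
                max_priority: int = 100) -> int:
    """Map current load to a minimum priority required for admission.

    Below `target`, nothing is shed (cutoff 0). Between `target` and
    `limit`, the cutoff climbs linearly, so the lowest-priority requests
    are dropped first. At or above `limit`, only requests at
    `max_priority` still get through.
    """
    if utilization <= target:
        return 0
    if utilization >= limit:
        return max_priority
    fraction = (utilization - target) / (limit - target)
    return int(fraction * max_priority)


def should_admit(request_priority: int, utilization: float) -> bool:
    # Admit when the request's priority meets or exceeds the current cutoff.
    return request_priority >= shed_cutoff(utilization)
```

Under light load every request passes. As utilization climbs, the cutoff sweeps upward through the priority range, so viewers lose background features long before playback is touched.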

You’ve already witnessed this system in action without realizing it. In 2020, Netflix encountered a problem similar to the one behind a 2019 outage, but load shedding worked well enough that playback continued even while the system was recovering from the failure. The company plans to fine-tune its approach further, for instance with dynamic thresholds for when throttling kicks in.

This doesn’t guarantee that Netflix will avoid all outages or other degradations going forward. After all, the company did reduce bitrates to avoid clogging networks when pandemic lockdowns began. Failures should be less likely, though, and might not last as long when they do occur. That’s particularly crucial when there’s an abundance of rivals: if Netflix ran into too many outages, it would risk losing subscribers willing to give up a few exclusive shows in exchange for more reliable streams.