11:40 - 12:40 (UTC+02)
Talk (60 min)
Building reliable services at NRK TV
NRK TV is a TV streaming service for the Norwegian public, serving hundreds of thousands of users every day. This service needs to be reliable, but what does that mean in practice? Best effort is not enough: we must define what reliability means in our context and measure how well we are doing. This creates a crucial feedback loop, because we can see how the changes we make affect our services. That in turn lets us invest strategically in reliability in the parts of the service that need it, and make informed trade-offs between reliability work and new functionality.

In practice, this means defining and monitoring service level objectives (SLOs). The actual numbers we choose are not the most important part of this process. The main value comes from being explicit and making choices, such as distinguishing between critical and less critical services, defining what uptime means for a given service, and deciding which adverse events the service should be robust against.

In this talk, we'll look at some practical examples of using SLOs, fault injection and load testing to improve the reliability of services at NRK TV.