SLO vs SLA vs SLI

Natarajan Santhosh
Jan 6, 2021

The 4 golden signals of monitoring

Set the SLA at the minimum level of service below which customers would be driven away.
Set the SLO tighter than the SLA so there is a safety margin: if the SLA states that the service will respond in under 300 ms, the SLO should target a response in under 200 ms.
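
As a minimal sketch of the 300 ms and 200 ms example above, the snippet below classifies response times against both thresholds. The function name and the sample latencies are hypothetical, chosen only for illustration.

```python
SLA_MS = 300  # externally promised limit (from the example above)
SLO_MS = 200  # stricter internal target (from the example above)


def classify_latency(latency_ms: float) -> str:
    """Classify one response time against the internal SLO and the external SLA."""
    if latency_ms < SLO_MS:
        return "within SLO"
    if latency_ms < SLA_MS:
        return "SLO missed, SLA still met"
    return "SLA breached"


# Hypothetical response times in milliseconds.
samples = [120, 250, 310]
results = [classify_latency(ms) for ms in samples]
print(results)
```

The gap between the two thresholds is what gives a team time to react: a response that misses the SLO is a warning, not yet a contractual breach.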

SLA — Service Level Agreement

Promises made to a customer and signed as an agreement. Only make promises that pass the happiness test: a service that barely meets them should still keep a typical customer happy.

Breaches are costly because they usually have a monetary impact, e.g. refunds and/or service credits.

SLO — Service Level Objectives

System reliability vs development velocity

Balance the risk that changing a system poses to its reliability against the need to build cool new features for that system.

Measuring SLO performance gives a real-time indication of the reliability cost of new features

If everyone agrees the SLO represents the point at which you are no longer meeting the expectations of your users, then broadly speaking, being well within SLO is a signal that you can move faster without causing those users pain.

Conversely, burning most of your error budget or, in the worst cases, multiples of it means you have to lift your foot off the accelerator. You can plan proactively by estimating risks to your reliability from the roll-out of new features in terms of time to detection, time to resolution, and impact percentage.
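
One way to make this planning concrete is sketched below. It assumes an availability-style SLO over a 30-day window and estimates the expected "bad minutes" from a risk as (time to detection + time to resolution) × impact fraction × expected incidents per window. The incident-frequency term, the function names, and all the numbers are assumptions for illustration, not something the text above prescribes.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unreliability the SLO allows within the window."""
    return (1.0 - slo_target) * window_days * 24 * 60


def expected_bad_minutes(
    ttd_min: float,
    ttr_min: float,
    impact_fraction: float,
    incidents_per_window: float,
) -> float:
    """Estimated bad minutes a risk costs per window (assumed formula)."""
    return (ttd_min + ttr_min) * impact_fraction * incidents_per_window


# Hypothetical inputs: a 99.9% SLO and a feature roll-out risk that takes
# 10 minutes to detect, 50 minutes to resolve, affects 25% of users,
# and is expected to happen twice per 30-day window.
budget = error_budget_minutes(0.999)
risk = expected_bad_minutes(
    ttd_min=10, ttr_min=50, impact_fraction=0.25, incidents_per_window=2
)
burn = risk / budget

print(f"budget: {budget:.1f} min, risk: {risk:.1f} min, burn: {burn:.0%}")
```

Under these assumed numbers, a single risk would consume roughly two thirds of the budget, which is the kind of signal that suggests slowing down or reducing detection and resolution times before shipping.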

SLI — Service Level Indicators

Properties of good SLI metrics

If we can find a way to quantify "the website does not load" or "this website is too slow" from our monitoring data, we can use that data to approximate how happy or unhappy our users are in aggregate. These quantifiable metrics become our SLIs. Ideally, you want to define SLIs that have a predictable, mostly linear relationship with the happiness of your users. The predictability of the relationship is crucial because you’ll be making important engineering decisions based on this data.
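
A common way to quantify "too slow" is to express the SLI as the fraction of good events out of all events. The sketch below assumes that a request counts as good when it completes in under a chosen latency threshold. The threshold, function names, and sample data are hypothetical.

```python
from typing import Sequence


def sli_ratio(good_events: int, total_events: int) -> float:
    """SLI as the proportion of good events, between 0.0 and 1.0."""
    if total_events == 0:
        # Assumption: with no traffic, no user was made unhappy.
        return 1.0
    return good_events / total_events


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests that completed faster than the threshold."""
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return sli_ratio(good, len(latencies_ms))


# Hypothetical request latencies in milliseconds.
latencies = [80, 120, 150, 400, 90]
sli = latency_sli(latencies, threshold_ms=200)
print(f"latency SLI: {sli:.0%}")
```

Expressing the SLI as a ratio keeps it comparable across services and makes it straightforward to compare against an SLO target such as 99%.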

top one: false positives
Ways to measure SLI
