
DevOps and Service Level Objectives

Anyone who has worked in the IT or ICT industry knows about, or has at least heard of, Service Level Agreements (SLAs). A service level agreement is a commitment between a service provider and a client. Particular aspects of the service, such as quality, availability and responsiveness, are agreed between the service provider and the service user. The user could be either an external customer or another internal team in the same organisation. While formal SLAs with penalties are more common for external customers, internal users also need to know what level of service they can expect from services provided by other teams.

Service provider teams must set clear Service Level Objectives (SLOs) before they can commit to SLAs based on them. An SLO is a target value or range of values for a service level that is measured by a Service Level Indicator (SLI). An SLI is a carefully defined quantitative measure of some aspect of the level of service that is provided. Some common examples of SLIs are ‘request latency’, ‘error rate’ and ‘system throughput’.
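To make the relationship between SLIs and SLOs concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a recommended setup: the `Request` record, the 300 ms latency threshold, the two target values and the sample data are all made up. It computes two SLIs (error rate and the fraction of fast requests) from a batch of requests and checks each one against an SLO target.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one served request."""
    latency_ms: float
    ok: bool


def error_rate(requests: List[Request]) -> float:
    """SLI: fraction of requests that failed."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if not r.ok) / len(requests)


def fast_fraction(requests: List[Request], threshold_ms: float) -> float:
    """SLI: fraction of requests served within threshold_ms."""
    if not requests:
        return 1.0
    return sum(1 for r in requests if r.latency_ms <= threshold_ms) / len(requests)


# Made-up sample: 93 fast successes, 5 slow successes, 2 fast failures.
sample = (
    [Request(120.0, True)] * 93
    + [Request(450.0, True)] * 5
    + [Request(130.0, False)] * 2
)

# Assumed SLO targets: at most 1% errors, at least 95% of requests under 300 ms.
MAX_ERROR_RATE = 0.01
MIN_FAST_FRACTION = 0.95

error_slo_met = error_rate(sample) <= MAX_ERROR_RATE
latency_slo_met = fast_fraction(sample, 300.0) >= MIN_FAST_FRACTION

print(f"error rate    = {error_rate(sample):.2%}, SLO met: {error_slo_met}")
print(f"fast fraction = {fast_fraction(sample, 300.0):.2%}, SLO met: {latency_slo_met}")
```

In this sample the latency objective is met while the error-rate objective is not, which shows why a single SLI rarely tells the whole story about a service.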

Setting the right SLO can be complicated. While picking a specific SLI or set of SLIs to measure might be easy, deciding which one, or which combination, best expresses the SLO is tricky. Remember that we set an SLO to be able to commit to a certain level of service, and the final judgement rests with the user of the service. No matter how we choose the SLIs for an SLO, if the user is not getting what they expect, the SLO is useless. So we need to look at SLOs from the user’s point of view.

SLAs are tied to business goals, so DevOps teams are not normally responsible for defining them. However, SLAs are built on SLOs, and SLOs are based on SLIs, which fall under the Monitoring and Alerting tasks of DevOps. As a result, DevOps teams usually get involved in helping to avoid the consequences of missed SLOs. They can also help to define the SLIs: there obviously needs to be an objective way to measure the SLOs in the agreement, or disagreements will arise.

But what about internal services? DevOps teams have to make sure they set accurate expectations for internal teams about the services they provide, by defining correct SLOs and by helping other service provider teams to do the same. After all, they are considered the Monitoring and Alerting experts, unless there is a specialist team for Monitoring and Alerting.

There are many resources on how to choose the best SLIs for different SLOs, so I will not go through the details of setting SLIs for any specific SLO. Just don’t forget to define each SLO as close as possible to the user’s perspective.

DevOps Culture and SRE Mindset

The definition of DevOps has changed since the term was coined a decade ago. It was originally defined by the famous clash between Dev teams and Ops teams, but these days most Ops teams seem to work in harmony with Dev teams thanks to DevOps culture. The reason behind the name is no longer that common. In a sense, DevOps has made its own name unfit.

Some suggest a different name, like Reliability Engineering, to move away from the Dev and Ops clash and to replace its cause with new concepts like the “Error Budget”. In Google’s SRE book, the error budget is one minus the availability target. A service that’s 99.99% available is 0.01% unavailable. That permitted 0.01% unavailability is the service’s error budget. An error budget can resolve the structural conflict between Dev and Ops because the two sides can discuss and agree on how to spend this budget.
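The error budget arithmetic can be sketched in a few lines of Python. The function names, the 30-day window and the request counts below are assumptions chosen for illustration; only the "one minus the availability target" definition comes from the text above.

```python
def error_budget(availability_target: float) -> float:
    """Error budget as a fraction: one minus the availability target."""
    return 1.0 - availability_target


def allowed_downtime_minutes(availability_target: float, window_days: int = 30) -> float:
    """Downtime the budget permits over a window (30 days is an assumed default)."""
    return error_budget(availability_target) * window_days * 24 * 60


def remaining_failures(availability_target: float, total_requests: int,
                       failed_requests: int) -> float:
    """Request-based view: failures still allowed before the budget is spent.

    A negative result means the budget is already overspent.
    """
    return error_budget(availability_target) * total_requests - failed_requests


target = 0.9999  # 99.99% availability, as in the example above

print(f"error budget: {error_budget(target):.4%}")
print(f"allowed downtime over 30 days: {allowed_downtime_minutes(target):.2f} minutes")
print(f"failures left after 40 out of 1,000,000 requests: "
      f"{remaining_failures(target, 1_000_000, 40):.0f}")
```

Under these assumptions, a 99.99% target leaves a little over four minutes of downtime per 30 days, or 100 failed requests per million. That is the budget Dev and Ops can negotiate over: a risky release spends it, a quiet month saves it.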

There are other changes too. For many companies, the definition of Ops has changed over the last few years. Many operational tasks have shifted toward cloud infrastructure and, as a result, part of the Ops team’s daily work has been offloaded to cloud infrastructure and service providers. Practising DevOps methods and Site Reliability Engineering (SRE) principles is becoming the daily job of any Ops team that is moving to the cloud. In many cases, team names have changed to Infrastructure Team, Cloud Services Team or simply DevOps Team.

So DevOps culture and the SRE mindset are not just cool ideas to check out but real concepts that teams need to work with every single day. But what is DevOps culture? How do we define the SRE mindset? And what has changed in DevOps practices in the last few years?

I believe the original ingredients of DevOps haven’t changed that much: Culture, Automation, Lean, Measurement and Sharing. While the other four pillars are all important, they would be useless without culture.

Culture is the set of common or accepted ideas, customs and social behaviour of a particular society or, in this case, an organisation. It acts like the glue that holds everything together. Without DevOps culture, you might have the bits and pieces, but they are not going to work together.

So let’s go back to the original question: what is DevOps culture anyway?

I believe the keyword for DevOps culture is collaboration. DevOps culture starts showing its effects as soon as different teams in the company, and each member inside a team, start collaborating with others, whether to improve automation or lean development, to suggest new measurement methods for monitoring and metrics analysis, or to share ideas, findings and experiences. Changing the approach to service availability from avoiding failures to concepts like the “Error Budget” can help companies accelerate the adoption of DevOps culture.

Culture is about people, and DevOps culture needs people to believe in it to work, from each team member to every level of organisational management.