Wednesday, August 18, 2021
Site Reliability Engineering and the DevOps movement share a similar set of challenges but addresses each in a different way. SRE got its start at Google in 2003 and according to Ben Treynor, VP of 24/7 Operations: ”SRE is what happens when you ask a software engineer to design an operations team”. In 2016, Google published a book about Site Reliability Engineering principles, practices and organizational constructs.
The practice of Site Reliability Engineering at Google encompasses more than just managing production systems and responding to emergencies. Applying software engineering in a principled way to operations allows SRE to holistically address the reliability of software applications across the product lifecycle.
Implementing SRE in an organization requires a commitment to supporting some core principles and a fundamental culture shift -SRE needs Service Level Objectives, with consequences.
-SREs have time to make tomorrow better than today.
-SRE teams have the ability to regulate their workload.
-SREs and the organization’s leaders remove the word ‘blame’ from their vocabulary.
This talk will highlight key SRE principles and how they map to recognized DevOps focus areas. We’ll also discuss how any organization can adopt SRE, and how our recent experience of working with our customers on implementing SRE practices has shown these principles will work across a range of organizations of different types and sizes.