PRO TALK: How to Run Smarter in Production: Getting Started with Site Reliability Engineering

Dev Innovation Summit Main Stage

Jennifer Petoff
Google Ireland, Director, Site Reliability Engineering Education

Jennifer Petoff is Google's Global Director of SRE Education and is based in Dublin, Ireland. She is a co-editor of the best-selling book "Site Reliability Engineering: How Google Runs Production Systems" and the lead author of "Training Site Reliability Engineers: What Your Organization Needs to Create a Learning Program".


Site Reliability Engineering (SRE) and the DevOps movement share a similar set of challenges but address each in a different way. SRE got its start at Google in 2003, and according to Ben Treynor, VP of 24/7 Operations, “SRE is what happens when you ask a software engineer to design an operations team.” In 2016, Google published a book about Site Reliability Engineering principles, practices, and organizational constructs.

The practice of Site Reliability Engineering at Google encompasses more than just managing production systems and responding to emergencies. Applying software engineering in a principled way to operations allows SRE to holistically address the reliability of software applications across the product lifecycle.

Implementing SRE in an organization requires a commitment to supporting some core principles and a fundamental culture shift:
-SRE needs Service Level Objectives, with consequences.
-SREs have time to make tomorrow better than today.
-SRE teams have the ability to regulate their workload.
-SREs and the organization’s leaders remove the word ‘blame’ from their vocabulary.

This talk will highlight key SRE principles and how they map to recognized DevOps focus areas. We’ll also discuss how any organization can adopt SRE, and how our recent experience of working with customers to implement SRE practices has shown that these principles work across organizations of different types and sizes.