Podcast: Play in new window | Download
Google’s site reliability engineers are responsible for maintaining the highly available services that power the Google software that we all use on a regular basis. O’Reilly recently published the book “Site Reliability Engineering: How Google Runs Production Systems”, and the book provides a comprehensive window into how the site reliability engineering role works.
Todd Underwood is a director of site reliability engineering. On today’s episode, Todd explains how the role of a SRE relates to devops. We discuss the relationship between the engineers who are developing Google services, and the SREs who are maintaining it. Google’s internal data center operating system “Borg” is also discussed.