Google’s Site Reliability Engineering with Todd Underwood

Google’s site reliability engineers are responsible for maintaining the highly available services that power the Google software that we all use on a regular basis. O’Reilly recently published the book “Site Reliability Engineering: How Google Runs Production Systems”, which provides a comprehensive window into how the site reliability engineering role works.
Todd Underwood is a director of site reliability engineering. On today’s episode, Todd explains how the role of an SRE relates to DevOps. We discuss the relationship between the engineers who develop Google services and the SREs who maintain them. We also discuss “Borg”, Google’s internal data center operating system.

Sponsors

Honeybadger notifies you about errors and outages in your web application, and gives you the information you need to fix them. If your app develops a problem, Honeybadger is always watching. SE Daily listeners can try Honeybadger free for 15 days.
Wealthfront is the automated investment service that manages your investments online. Check out wealthfront.com/sedaily to get your first $15,000 managed for free, as a listener of Software Engineering Daily.
