Infrastructure Monitoring with Mark Carter

At Google, the job of a site reliability engineer involves building tools to automate infrastructure operations. If a server crashes, there is automation in place to create a new server. If a service starts to receive a high load of traffic, there is automation in place to scale up the number of instances of that service.

To create an automated response to an infrastructure problem, a site reliability engineer needs insight into that infrastructure. Every service needs tools for monitoring, alerting, debugging, and distributed tracing.

One benefit of working at a large company like Google is that an engineer building a new product gets this kind of tooling by default. If I am hacking on a project at home, I have to set up all kinds of tools to help me diagnose and resolve problems. Setting up this tooling takes time and requires expertise.

Stackdriver is a set of tools and instrumentation that allows developers to monitor, debug, and inspect infrastructure. Stackdriver is based on the internal observability tools built for Google. Mark Carter is a group product manager at Google, and he joins the show to discuss site reliability engineering and the creation of Stackdriver.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.

Sponsors

Hired is a career marketplace that intelligently matches tech talent with the world’s most innovative companies. We combine cutting-edge technology with unbiased career coaching so both talent and employers can find the right fit, faster. We are on a mission to find everyone a job they love. Go to hired.com/sedaily and get $600 free if you find a job through Hired.

DoiT International helps startups optimize the costs of their workloads across Google Cloud and AWS, so that they can spend more time building new software and less time reducing cost. If your cloud bill is over $10,000 per month, you can get a free cost-optimization assessment from DoiT International by going to doit-intl.com/sedaily.

Azure Container Service simplifies the deployment, management, and operation of Kubernetes. Check out the Azure Container Service at aka.ms/sedaily.

GoCD is a continuous delivery tool created by ThoughtWorks. It’s great to see the continued progress on GoCD with the new Kubernetes integrations, and you can check it out for yourself at gocd.org/sedaily.