Monitoring Kubernetes with Ilan Rabinovitch

Monitoring a Kubernetes cluster allows operators to track the resource utilization of the containers within that cluster. In today’s episode, Ilan Rabinovitch joins the show to explore the different options for setting up monitoring, and some common design patterns around Kubernetes logging and metrics gathering.

Ilan is the VP of product and community at Datadog. Earlier in his career, Ilan spent much of his time working with Linux and taking part in the Linux community. We discussed the similarities and differences between the evolution of Linux and that of Kubernetes.

In previous episodes, we have explored some common open source solutions for monitoring Kubernetes, including Prometheus and the EFK stack. Since Ilan works at Datadog, we explored how hosted solutions compare to self-managed monitoring. We also talked about how to assess different hosted solutions, such as those from a large cloud provider like AWS versus vendors that are specifically focused on monitoring. Full disclosure: Datadog is a sponsor of Software Engineering Daily.

Show Notes

8 Surprising Facts About Real Docker Adoption – Datadog


Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Failure is unpredictable. You don’t know when your system will break, but you know it will happen. Gremlin prepares for these outages. We provide resilience as a service, using chaos engineering techniques pioneered at Netflix and Amazon. Prepare your team for disaster by proactively testing failure scenarios. Max out CPU, blackhole or slow down network traffic to a dependency, terminate processes and hosts. Each of these shows you how your system reacts, allowing you to harden things before a production incident. Check out Gremlin and get a free demo by going to

Datadog integrates seamlessly with container technologies like Docker and Kubernetes, so you can monitor your entire container cluster in real time. See across all your servers, containers, apps, and services in one place, with powerful visualizations, sophisticated alerting, distributed tracing and APM. And now, Datadog has Application Performance Monitoring for Java. Start monitoring your microservices today with a free trial! As a bonus, Datadog will send you a free T-shirt. Visit to get started.

Speaking of reliability, do you find yourself worrying about system downtime or missing an alert while you’re on-call? If so, VictorOps is THE incident management tool you need. VictorOps integrates with a large number of the monitoring, alerting, and messaging tools you already have in place to help your DevOps teams communicate better, diagnose incidents, and resolve any problems that come up. All in one place, on both your smartphone and your computer, you can view highly contextual, detailed alerts that will help your on-call engineers to understand and respond to incidents more quickly and effectively. Head to to see how VictorOps can help you. Be victorious with VictorOps!

GoCD is a continuous delivery tool created by ThoughtWorks. GoCD agents use Kubernetes to scale as needed. Check out and learn about how you can get started. GoCD was built with the learnings of the ThoughtWorks engineering team, who have talked about building the product in previous episodes of Software Engineering Daily. It’s great to see the continued progress on GoCD with the new Kubernetes integrations, and you can check it out for yourself at


