Netflix Observability with Kevin Lew

Netflix users stream terabytes of data from the cloud to their devices every day. During a high-bandwidth, long-lived connection, a lot can go wrong. Networks can drop packets, machines can run out of memory, and the Netflix app on a user’s device can have a bug. Any of these events can result in a bad user experience.

Other errors can occur that do not disrupt the user experience. Netflix runs thousands of machine learning jobs, logging servers, and other pieces of internal infrastructure. Customer service dashboards, CI/CD pipelines, and A/B testing frameworks are all software built by Netflix, and when an error occurs in any of these places, engineers need to be able to diagnose and debug that error.

Observability is the practice of using logs, monitoring, metrics, and distributed tracing to understand how a system is working. Kevin Lew is a senior software engineer at Netflix with the Edge Insights team. He joins the show to talk about adding observability across the microservices deployed at Netflix. We also talk about how to manage high volumes of logging data effectively using stream processing.


Transcript provided by We Edit Podcasts. Thanks to We Edit Podcasts for partnering with SE Daily.

