Notebooks at Netflix with Matthew Seal

Netflix has petabytes of data and thousands of workloads running across that data every day. These workloads generate movie recommendations for users, create dashboards for data analysts to study, and reshape data in ETL jobs, to make it more accessible across the organization.

Over the last ten years, data engineering has become a key component of what makes Netflix successful. There are many different engineering roles who interact with the data infrastructure–including data analyst, machine learning scientist, analytics engineer, and software engineer.

Data engineering at Netflix has come a long way from the days of Hadoop MapReduce jobs running nightly, and generating reports of the most popular movies.

As data engineering and data science has grown, the tooling has expanded. The people in different data roles at Netflix might use Apache Spark, Presto, Python, Scala, SQL, and many other applications to study data–but in recent years, there is one tool that has stood out for its ability to be distinctly useful: Jupyter Notebooks.

A Jupyter Notebook lets users create and share documents that contain live code, visualizations, documentation, and many other types of components. In some ways, it is like a shareable IDE, that allows other people to see how you are working with your code and why you are making certain decisions. It is also a tool for building interactive, user-friendly applications–you can embed videos and images in a Jupyter notebook.

A Jupyter Notebook stores both the code and the results together in one place. By combining code with results in one document, you can have context around why a certain result came out the way it did.

Matthew Seal is a senior software engineer at Netflix, where he builds infrastructure and internal tools around Jupyter Notebooks. He joins the show to explain what problems Jupyter Notebooks solve for Netflix, and why they have quickly grown in popularity within the company.

 

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

 

Sponsors

HPE OneView integrates compute, storage, and networking resources across your data center and leverages a unified API to enable IT to manage infrastructure as code. Deploy infrastructure faster; simplify life cycle maintenance for your servers; give IT the ability to deliver infrastructure to developers as a service like the public cloud. Go to softwareengineeringdaily.com/hpeand learn about how HPE OneView can improve your infrastructure operations.

 

OpenShift is a Kubernetes platform from Red Hat. OpenShift takes the Kubernetes container orchestration system and adds features that let you build software more quickly. OpenShift includes service discovery, CI/CD, built-in monitoring and health management, and scalability. With OpenShift, you avoid getting locked into any particular cloud provider. Check out OpenShift from RedHat, by going to softwareengineeringdaily.com/redhat.

 

IBM Developer is a community of developers learning how to build entire applications with AI, containers, blockchains, serverless functions, and anything else you might want to learn about. Go to softwareengineeringdaily.com/ibm, and join the IBM Developer community.

 

FullStory is offering a free 1 month trial at fullstory.com/sedaily to Software Engineering Daily listeners. This free trial doubles the regular 14-day trial available from fullstory.com, giving you time to test FullStory’s powerful search and session replay and even try out FullStory’s many integrations (Jira, Bugsnag, Trello, Intercom, and more).

Software Daily

Software Daily

 
Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.