Google Cluster Evolution with Brian Grant

Google’s central system for managing compute resources is called Borg. On Borg, millions of Linux containers process a wide variety of workloads. When a new application is spun up, Borg provides that application with the resources it needs.

Workloads at Google usually fall into one of two distinct categories: long-running application workloads (such as Gmail) and batch workloads (such as a MapReduce job). In the early days of Google, the long-lived workloads were scheduled onto a system called “BabySitter” and the batch workloads were scheduled onto a system called “Global Work Queue.”

Borg was the first cluster manager at Google designed to service both long-running and batch workloads from a single system. The second cluster manager at Google was Omega, a project created to improve the engineering behind Borg. Omega’s innovations improved the efficiency and architecture of Borg.

More recently, Kubernetes was created as an open source implementation of the ideas pioneered in Borg and Omega. Google has also built a Kubernetes-as-a-service offering that companies use to run their infrastructure in the same way that Google does.

Brian Grant is an engineer at Google who has seen the iteration of all three cluster management systems that have come out of Google. He joins the show to discuss how the workloads at Google have changed over time, and how his perspective on building and architecting distributed systems has evolved. Full disclosure: Google is a sponsor of Software Engineering Daily.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.

Sponsors


Datadog integrates seamlessly with more than 200 technologies, including Kubernetes and Docker, so you can monitor your entire container cluster in one place. Datadog’s new Live Container view provides insights into your containers’ health, resource consumption, and deployment in real time. Filter to a specific Docker image, or drill down by Kubernetes service to get fine-grained visibility into your container infrastructure. Start monitoring your container workload today with a 14-day free trial, and Datadog will send you a free T-shirt! softwareengineeringdaily.com/datadog


Airtable is hiring creative engineers who believe in the importance of open-ended platforms that empower human creativity. Airtable is a uniquely challenging product to build, and they are looking for creative frontend and backend engineers to design systems from first principles, like a realtime sync layer, collaborative undo model, formulas engine, visual revision history, and more. Check out jobs at Airtable by going to airtable.com/sedaily.



The octopus: a sea creature known for its intelligence and flexibility. Octopus Deploy: a friendly deployment automation tool for deploying applications like .NET apps, Java apps and more. Ask any developer and they’ll tell you it’s never fun pushing code at 5pm on a Friday then crossing your fingers hoping for the best. That’s where Octopus Deploy comes into the picture. Octopus Deploy is a friendly deployment automation tool, taking over where your build/CI server ends. Use Octopus to promote releases on-prem or to the cloud. Octopus integrates with your existing build pipeline: TFS and VSTS, Bamboo, TeamCity, and Jenkins. It integrates with AWS, Azure, and on-prem environments. Reliably and repeatedly deploy your .NET and Java apps and more. If you can package it, Octopus can deploy it! It’s quick and easy to install. Go to Octopus.com to trial Octopus free for 45 days. That’s Octopus.com.


There’s no need to reinvent the wheel when it comes to making your app “realtime.” PubNub makes it simple, enabling you to build immersive and interactive experiences on the web, on mobile phones, embedded into hardware, and any other device connected to the Internet. With powerful APIs, and a robust global infrastructure, you can stream geolocation data, send chat messages, turn on your sprinklers, or rock your baby’s crib when they start crying (PubNub literally powers IoT cribs). 70 SDKs for web, mobile, IoT, and more means you can start streaming data in realtime without a ton of compatibility headaches, and no need to build your own SDKs from scratch. Go to PubNub.com/sedaily to get started. They offer a generous sandbox tier that’s free forever (until your app takes off).