Scaling Lyft with Matt Klein
Matt Klein has worked for three rapidly growing Internet companies. At AWS, he worked on EC2, the compute-as-a-service product that powers a large percentage of the Internet. At Twitter, he helped scale the infrastructure in the chaotic days before Twitter’s IPO. Today he works at Lyft, building systems that help its ride-sharing infrastructure run more safely and reliably.
Hypergrowth Internet companies are faced with quickly growing demands on their software. The demands on the software expose problems with the core infrastructure. Simultaneously, the company tries to ramp up its hiring process. More engineers get hired, and the institutional knowledge within the company starts to weaken. Documentation gets out of date. Senior engineers burn out and leave the company.
When a company starts growing quickly, communications can break down. A hypergrowth company can suffer from a lack of “human scalability”. Matt Klein has observed these challenges at AWS, Twitter, and Lyft. In his article “The Human Scalability of ‘DevOps’”, he explains why these problems manifest and what can be done to alleviate them.
In a previous show, Matt discussed the engineering challenges at Lyft that led him to create Envoy, a service proxy. This episode covers some broad technical topics–DevOps, site reliability engineering, platform engineering–but the episode is mostly about how a hypergrowth company can manage culture, hiring, and engineering organization.
Matt is a very fun guest to have because he questions some of the strange practices that have been widely adopted by successful companies. Internet companies are a very new phenomenon, and the management tactics that they have adopted are not well proven–so it is great to have someone like Matt provide a fresh perspective on ways that companies can scale their technology and their organization more effectively.
- The human scalability of “DevOps” – Matt Klein – Medium
- How to scale DevOps: Recipes for larger organizations
- A Beginner’s Guide to Scaling DevOps – DZone DevOps
- DevOps vs. SRE: What’s the Difference Between Them, and Which One Are You? | OverOps Blog
- How do I do DevOps at Scale? – Plutora
- Five Top Tips for DevOps At Scale – DevOps.com
- Scaling DevOps at Pearson – DevOps.com
- In praise of fungible developers | Echo One
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
OpenShift is a Kubernetes platform from Red Hat. OpenShift takes the Kubernetes container orchestration system and adds features that let you build software more quickly. OpenShift includes service discovery, CI/CD, built-in monitoring and health management, and scalability. With OpenShift, you avoid getting locked into any particular cloud provider. Check out OpenShift from Red Hat by going to softwareengineeringdaily.com/redhat.
Datadog is a cloud-scale monitoring platform for infrastructure and applications. And with Datadog’s new Live Container view, you can see every container’s health, resource consumption, and running processes in real time. See for yourself by starting a free trial and get a free Datadog T-shirt! softwareengineeringdaily.com/
Manifold makes your life easier by providing a single workflow to organize your services, connect your integrations, and share with your team. While Manifold is completely free to use, if you head over to manifold.co/sedaily you’ll get a coupon code for $10 which you can use to try out any service on the Manifold marketplace.
Gremlin provides resilience as a service, using chaos engineering techniques pioneered at Netflix and Amazon. Prepare your team for disaster by proactively testing failure scenarios. Check out Gremlin and get a free demo by going to gremlin.com/sedaily.