Lyft’s Data Platform with Li Gao
FindCollabs Hackathon #1 has ended! Congrats to ARhythm, Kitspace, and Rivaly for winning 1st, 2nd, and 3rd place ($4,000, $1,000, and a set of SE Daily hoodies, respectively). The most valuable feedback award and the most helpful community member award both go to Vynce Montgomery, who will receive both the SE Daily Towel and the SE Daily Old School Bucket Hat.
Lyft generates petabytes of data. Driver and rider behavior, pricing information, and the movement of cars through space are all received by Lyft’s backend services, buffered into Kafka queues, and processed by various stream processing systems.
Lyft moves this high volume of data into a data lake so that users throughout the company can work with it offline. Machine learning jobs, batch jobs, streaming jobs, and materialized databases can be built on top of that data lake. Druid and Superset are used for operational analytics and dashboarding.
Li Gao is a data engineer at Lyft. He joins the show to explore the different aspects of Lyft’s data platform. We also talk about the tradeoffs of streaming frameworks and how to manage machine learning infrastructure. This episode is a great companion to our show about Uber’s data platform, and it illustrates some fundamental differences in how the two ridesharing companies operate.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
Netlify is a modern way to build and manage fast websites that run without the need for addressable web servers. Netlify is “serverless” and provides automatic forms, identity management, and tools to manage and transform large images and media. Learn more about Netlify’s powerful platform at netlify.com/sedaily.
Digital Ocean is the easiest cloud platform to run and scale your application. Try it out today and get a free $100 credit at do.co/sedaily. Digital Ocean is a complete cloud platform to help developers and teams save time when running and scaling their applications.
Datadog unites metrics, traces, and logs in one platform so you can get full visibility into your infrastructure and applications. Check out new features like Trace Search & Analytics for rapid insights into high-cardinality data, and Watchdog, an auto-detection engine that alerts you to performance anomalies across your applications. Datadog makes it easy for teams to monitor every layer of their stack in one place, but don’t take our word for it: start a free trial today, and Datadog will send you a T-shirt! softwareengineeringdaily.com/datadog
GoCD is a continuous delivery tool created by ThoughtWorks. It’s great to see the continued progress on GoCD with the new Kubernetes integrations, and you can check it out for yourself at gocd.org/sedaily.