Data Engineering at Airbnb with Maxime Beauchemin


“One big transformation we’re seeing right now is the slow agonizing death of MapReduce.”

When a company gets big enough, there is so much data to be processed that an entire data engineering team becomes responsible for managing this data and making it available to other teams. Airbnb is one such company.

Max Beauchemin works on the data engineering team at Airbnb, where he creates infrastructure and tooling for managing data. In this episode of Software Engineering Daily, we talk about Airflow, a workflow scheduler that assists in job processing. If you don’t know what a workflow or a job is, we will explain that in this episode. Max and I also talk about Panoramix, a data slicing and visualization tool that helps data scientists and business analysts understand large volumes of data. Max will also be presenting at Strata + Hadoop World in San Jose. We’re partnering with O’Reilly to support this conference – if you want to go to Strata, you can save 20% on a ticket with our code PCSED.

Questions

  • What does it mean to be a data engineer in January 2016?
  • What has changed the most about the Hadoop stack in the past decade?
  • What are some characteristics of Airbnb data engineering requirements?
  • How do you gather requirements around data and work with other departments within Airbnb?
  • Why is Druid so useful from the point of view of Panoramix?
  • What’s the future of data engineering?
  • What technologies will the big data ecosystem converge upon?
