Presto with Justin Borgman

A data platform contains all of the data that a company has accumulated over the years. Across a data platform, there is a multitude of data sources: databases, a data lake, data warehouses, a distributed queue like Kafka, and external data sources like Salesforce and Zendesk.

A user of the data platform often has a question that requires multiple data sources to answer. How does this user join two data sources from a data lake? How does this user join data across a transactional database and a data lake? How does the user join data from two different data warehouse technologies? 

Presto is an open source tool originally developed at Facebook. Presto allows a user to query a data platform with a SQL statement. That query gets parsed and executed across the data platform to read from any heterogeneous data source. For some use cases, Presto is replacing the technology Hadoop MapReduce-based technology Hive. For other use cases, Presto is solving a problem in a completely novel way.

Justin Borgman joins the show to discuss the motivation for Presto, the problems it solves, and the architecture of Presto. He also talks about the company he started, Starburst Data, which sells and supports technologies built around Presto.

If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.


Sponsors

MongoDB is the most popular document-based database built for modern application developers and the cloud era. Try MongoDB today with Atlas, the global cloud database service that runs on AWS, Azure, and Google Cloud. Configure, deploy, and connect to your database in just a few minutes. Check it out at mongodb.com/atlas.

With Triplebyte, you do one online interview, and then you get to go straight to final interviews at hundreds of companies (from tech giants like Dropbox to exciting startups). It’s like the Common App for software engineers. No resume needed. Apply now at triplebyte.com/sedaily. If you take a job through Triplebyte, you’ll get a $1000 signing bonus.

Heroku’s fully managed Postgres, Redis, and Apache Kafka data services help you get started faster, and be more productive, which means you can focus on building data-driven apps, not data infrastructure. Visit softwareengineeringdaily.com/herokudata to learn more about Heroku’s managed data services.

VictorOps is a collaborative incident response tool. VictorOps brings your monitoring data and your collaboration tools into one place–so that you can fix issues more quickly, and reduce the pain of on-call. If you want to hear about how VictorOps works, you can listen to our episode with Chris Riley. Learn more about it as well as get a free t-shirt when you check it out at victorops.com/sedaily.

Software Weekly

Software Weekly

Subscribe to Software Weekly, a curated weekly newsletter featuring the best and newest from the software engineering community.