Presto with Justin Borgman

Podcast Friday, February 7, 2020


A data platform contains all of the data that a company has accumulated over the years. Across a data platform, there is a multitude of data sources: databases, a data lake, data warehouses, a distributed queue like Kafka, and external data sources like Salesforce and Zendesk.

A user of the data platform often has a question that requires multiple data sources to answer. How does this user join two data sources from a data lake? How does this user join data across a transactional database and a data lake? How does the user join data from two different data warehouse technologies?

Presto is an open source tool originally developed at Facebook. Presto allows a user to query a data platform with a SQL statement. That query gets parsed and executed across the data platform, reading from any of its heterogeneous data sources. For some use cases, Presto is replacing Hive, the Hadoop MapReduce-based technology. For other use cases, Presto is solving a problem in a completely novel way.
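To make the idea of one SQL statement spanning several data sources concrete, here is a minimal sketch in Python that assembles such a query as a string. It relies on Presto's convention of addressing tables as catalog.schema.table, where each catalog maps to a configured data source. The catalog, schema, table, and column names (hive.web.clicks, mysql.crm.customers, customer_id) and both helper functions are hypothetical and chosen only for illustration; the episode does not describe a specific query.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_join(left: str, right: str, key: str) -> str:
    """Build a SQL join over two tables that may live in different catalogs."""
    return (
        f"SELECT l.{key}, count(*) AS events\n"
        f"FROM {left} AS l\n"
        f"JOIN {right} AS r ON l.{key} = r.{key}\n"
        f"GROUP BY l.{key}"
    )


# Hypothetical sources: a transactional MySQL database and a Hive-backed data lake.
customers = qualified("mysql", "crm", "customers")
clicks = qualified("hive", "web", "clicks")

# One statement that joins across both sources.
QUERY = build_federated_join(customers, clicks, "customer_id")
print(QUERY)
```

The resulting string would be submitted through whatever Presto client is in use. The point of the sketch is only that the join across a transactional database and a data lake is written as ordinary SQL, with the catalog prefix selecting the source.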

Justin Borgman joins the show to discuss the motivation for Presto, the problems it solves, and the architecture of Presto. He also talks about the company he started, Starburst Data, which sells and supports technologies built around Presto.

If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.