Data Warehouse ETL with Matthew Scullion
A data warehouse provides low latency access to large volumes of data.
A data warehouse is a crucial piece of infrastructure for a large company, because it can be used to answer complex questions involving a large number of data points. But a data warehouse usually cannot hold all of a company’s data at any given time. Users need to move a subset of the data into the data warehouse by reading large files from a data lake on disk and putting that data into the data warehouse.
The process of moving data from one place into another is broken down into three sequential steps, often called “ETL” (extract, transform, load) or “ELT” (extract, load, transform). In ETL, the data is extracted from a source such as a data lake, transformed into a schema that is customized for the data warehouse application, and then loaded into the data warehouse. In ELT, the last two steps are reversed, because modern systems can often leave the necessary schema transformation until after the data has been loaded into the data warehouse.
Matthew Scullion is the CEO of Matillion, a company that specializes in building tools for data transformations. Matthew joins the show to talk about the problem of data transformation, and how that problem has evolved over the nine years since he started Matillion.
If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.
Sponsorship inquiries: firstname.lastname@example.org
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.