Meltano: Data Engineering Lifecycle with Danielle Morrill

Data engineering lets a company take advantage of the large quantities of data it has generated. Many companies have been producing new data rapidly for years without being able to take full advantage of it.

Creating large data sets does not by itself provide value for a company. A company needs to perform data engineering and data science to take full advantage of them.

When data gets generated, it is stored in a database, data lake, or API backend like Google Analytics. In order to manipulate that data, it is often pulled into a data warehouse. A data warehouse provides fast access time to large quantities of data.

Pulling data from a source like a database or data lake into a data warehouse requires a process known as extract and load. Once the data is in the data warehouse, it may also undergo a transform, which enriches the data or puts it in a format that is easier to use. From there, the data can be used to build models, interactive dashboards, and Jupyter Notebooks.
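To make the extract, load, and transform steps concrete, here is a minimal sketch in Python. It uses two in-memory SQLite databases as stand-ins for a source system and a data warehouse. The table names (`orders`, `raw_orders`, `revenue_by_customer`), the sample rows, and the function names are all hypothetical and chosen only for illustration; this is not how Meltano itself is implemented or invoked.

```python
from __future__ import annotations

import sqlite3


def extract(source: sqlite3.Connection) -> list[tuple[int, str, int]]:
    """Extract: read the raw rows out of the (hypothetical) source system."""
    return source.execute(
        "SELECT id, email, amount_cents FROM orders"
    ).fetchall()


def load(warehouse: sqlite3.Connection, rows: list[tuple[int, str, int]]) -> None:
    """Load: copy the rows into the warehouse unchanged."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders ("
        "id INTEGER PRIMARY KEY, email TEXT, amount_cents INTEGER)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the pipeline is re-run.
    warehouse.executemany(
        "INSERT OR REPLACE INTO raw_orders VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


def transform(warehouse: sqlite3.Connection) -> None:
    """Transform: build a cleaned, aggregated table inside the warehouse."""
    warehouse.execute("DROP TABLE IF EXISTS revenue_by_customer")
    warehouse.execute(
        """
        CREATE TABLE revenue_by_customer AS
        SELECT lower(trim(email)) AS customer,
               SUM(amount_cents) / 100.0 AS revenue_usd
        FROM raw_orders
        GROUP BY lower(trim(email))
        """
    )
    warehouse.commit()


def run_pipeline(
    source: sqlite3.Connection, warehouse: sqlite3.Connection
) -> list[tuple[str, float]]:
    """Run extract and load, then transform, and return the modeled rows."""
    load(warehouse, extract(source))
    transform(warehouse)
    return warehouse.execute(
        "SELECT customer, revenue_usd FROM revenue_by_customer ORDER BY customer"
    ).fetchall()


def make_demo_source() -> sqlite3.Connection:
    """Create a tiny source database with made-up sample data."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, email TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "Ada@example.com ", 1250),
            (2, "ada@example.com", 750),
            (3, "bob@example.com", 500),
        ],
    )
    source.commit()
    return source


if __name__ == "__main__":
    for row in run_pipeline(make_demo_source(), sqlite3.connect(":memory:")):
        print(row)
```

In this sketch the transform runs as SQL inside the warehouse after the raw data has been loaded, which matches the ordering described above. A dashboard or notebook would then query the `revenue_by_customer` table rather than the raw rows.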

The data engineering lifecycle has many different components, which is why data engineering can often be intimidating to a company that is trying to make use of its data. Meltano is a project with the goal of providing a system of conventions for managing the data engineering lifecycle. Meltano was started by GitLab, and the Meltano project has some strategic similarities to GitLab.

Danielle Morrill is the general manager of Meltano at GitLab. She joins the show to discuss the world of data engineering and the architecture of Meltano. We touch on the different components of a data engineering pipeline and the most acute pain points for data engineers.

Announcements

  • FindCollabs is a place to find collaborators and build projects. FindCollabs is the company I am building, and we are having an online hackathon with $2500 in prizes. If you are working on a project, or you are looking for other programmers to build a project or start a company with, check out FindCollabs. I’ve been interviewing people from some of these projects on the FindCollabs podcast, so if you want to learn more about the community you can hear that podcast.
  • New Software Daily app for iOS. It includes all 1000 of our old episodes, as well as related links, greatest hits, and topics. You can comment on episodes and have discussions with other members of the community. And you can become a paid subscriber for ad-free episodes at softwareengineeringdaily.com/subscribe. Altalogy is the company that has been developing much of the software for the newest app, and if you are looking for a company to help you with your mobile and web development, I recommend checking them out.
  • Upcoming conferences I’m attending: Datadog Dash July 16th and 17th in NYC, Open Core Summit September 19th and 20th in San Francisco.
  • We are hiring two interns for software engineering and business development! If you are interested in either position, send an email with your resume to jeff@softwareengineeringdaily.com with “Internship” in the subject line.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.

