Meltano: Data Engineering Lifecycle with Danielle Morrill

Data engineering allows a company to take advantage of the large quantities of data it has generated. In many companies, data has been produced rapidly for years, but the company has not been able to make full use of it.

Creating large data sets does not provide immediate value for a company. A company needs to perform data engineering and data science to take full advantage of them.

When data is generated, it is stored in a database, data lake, or API backend like Google Analytics. In order to manipulate that data, it is often pulled into a data warehouse. A data warehouse provides fast access to large quantities of data.

Pulling data from a source like a database or data lake into a data warehouse requires a process known as extract and load. Once the data is in the data warehouse, it may also undergo a transform, which enriches the data or puts it in a format that is easier to make use of. Once data is in a data warehouse, it can be used to build models, interactive dashboards, and Jupyter Notebooks.
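The extract, load, and transform steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Meltano or any particular warehouse implements them: an in-memory SQLite database stands in for the data warehouse, and the sample records, table names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from a
# database, data lake, or API backend.
SOURCE_ROWS = [
    {"user_id": 1, "event": "signup", "amount_cents": 0},
    {"user_id": 1, "event": "purchase", "amount_cents": 1250},
    {"user_id": 2, "event": "purchase", "amount_cents": 800},
]


def extract():
    """Extract: read the raw records from the source."""
    return list(SOURCE_ROWS)


def load(conn, rows):
    """Load: copy the raw records into the warehouse unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events "
        "(user_id INTEGER, event TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (:user_id, :event, :amount_cents)",
        rows,
    )
    conn.commit()


def transform(conn):
    """Transform: derive an easier-to-query table inside the warehouse."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_user")
    conn.execute(
        """
        CREATE TABLE revenue_by_user AS
        SELECT user_id, SUM(amount_cents) / 100.0 AS revenue_dollars
        FROM raw_events
        WHERE event = 'purchase'
        GROUP BY user_id
        """
    )
    conn.commit()


def run_pipeline():
    """Run extract, load, then transform, and return the derived rows."""
    conn = sqlite3.connect(":memory:")  # SQLite as a stand-in warehouse
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT user_id, revenue_dollars FROM revenue_by_user ORDER BY user_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Note that the transform runs after the load, inside the warehouse, which matches the order described above. A dashboard or notebook would then query the derived table rather than the raw events.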

The data engineering lifecycle has many different components, which is why data engineering can often be intimidating to a company that is trying to make use of its data. Meltano is a project with the goal of providing a system of conventions for managing the data engineering lifecycle. Meltano was started by GitLab, and the Meltano project has some strategic similarities to GitLab.

Danielle Morrill is the general manager of Meltano at GitLab. She joins the show to discuss the world of data engineering and the architecture of Meltano. We touch on the different components of a data engineering pipeline, and the most acute pain points for data engineers.

 ANNOUNCEMENTS

  • FindCollabs is a place to find collaborators and build projects. FindCollabs is the company I am building, and we are having an online hackathon with $2500 in prizes. If you are working on a project, or you are looking for other programmers to build a project or start a company with, check out FindCollabs. I’ve been interviewing people from some of these projects on the FindCollabs podcast, so if you want to learn more about the community you can hear that podcast.
  • New Software Daily app for iOS. It includes all 1000 of our old episodes, as well as related links, greatest hits, and topics. You can comment on episodes and have discussions with other members of the community. And you can become a paid subscriber for ad-free episodes at softwareengineeringdaily.com/subscribe. Altalogy is the company that has been developing much of the software for the newest app, and if you are looking for a company to help you with your mobile and web development, I recommend checking them out.
  • Upcoming conferences I’m attending: Datadog Dash July 16th and 17th in NYC, Open Core Summit September 19th and 20th in San Francisco.
  • We are hiring two interns for software engineering and business development! If you are interested in either position, send an email with your resume to jeff@softwareengineeringdaily.com with “Internship” in the subject line.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.


Sponsors

Cruise is a San Francisco-based company building a fully electric self-driving car service. Cruise is a place where you can build on your existing skills while developing new skills and experiences that are pioneering the future of industry. There are opportunities for backend engineers, frontend developers, machine learning programmers, and many more positions. At Cruise you will be surrounded by talented, driven engineers, all while helping make cities safer and cleaner. Apply to work at Cruise by going to getcruise.com/careers.

Datadog unites metrics, traces, and logs in one platform so you can get full visibility into your infrastructure and applications. Check out new features like Trace Search & Analytics for rapid insights into high-cardinality data, and Watchdog, an auto-detection engine that alerts you to performance anomalies across your applications. Datadog makes it easy for teams to monitor every layer of their stack in one place, but don’t take our word for it—start a free trial today & Datadog will send you a T-shirt! softwareengineeringdaily.com/datadog

With MongoDB Atlas, you can take advantage of MongoDB’s flexible document data model as a fully automated cloud service. MongoDB Atlas handles all the costly database operations and admin tasks that you’d rather not spend time on, like security, high availability, data recovery, monitoring, and elastic scaling. Try MongoDB Atlas for free today! Visit mongodb.com/se to learn more.

The Open Core Summit is a conference for commercial open source software. If you are building a business around open source software, check out the Open Core Summit, September 19th and 20th at the Palace of Fine Arts in San Francisco. Go to OpenCoreSummit.com to register.

Software Weekly

Subscribe to Software Weekly, a curated weekly newsletter featuring the best and newest from the software engineering community.