DataHub: Open Source Data Lake with Pardhu Gunnam and Mars Lan

As the volume and scope of data collected by an organization grow, tasks such as data discovery and data management grow in complexity. Simply put, the more data there is, the harder it is for users such as data analysts to find what they’re looking for. A metadata hub helps manage big data by providing metadata search and discovery tools, along with a central place that presents a holistic view of the data ecosystem. DataHub is LinkedIn’s open-sourced metadata search and discovery tool. It is LinkedIn’s second-generation metadata hub, the successor to WhereHows.

Pardhu Gunnam and Mars Lan join us today from Metaphor, a company they co-founded to build out the DataHub ecosystem. Pardhu, Mars, and the other co-founders of Metaphor were part of the team at LinkedIn that built the DataHub project. They join the show today to talk about how DataHub democratizes data access for an organization, why the new DataHub architecture was critical to LinkedIn’s growth, and what we can expect to see from the DataHub project going forward.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Sponsors

Epsagon enables teams to instantly simplify, visualize, and understand what’s happening within their complex microservice architectures. Increase development efficiency and reduce application downtime with Epsagon. Try out Epsagon and connect your first trace today to receive one of their awesome t-shirts. Check it out at epsagon.com/SEDaily

At Accenture, you can build your specializations through various learning paths, trainings, and certifications we offer. Accenture is hiring experienced AWS developers and architects. Accenture offers opportunities for you to channel your AWS experience into breakthrough innovations. At Accenture, you can benefit from limitless opportunities to expand your knowledge while gaining hands-on AWS experience. Search and apply at softwareengineeringdaily.com/accenture.

If you have several PostgreSQL or MySQL databases running behind NAT, check out Teleport, an open source identity-aware access proxy. Teleport provides secure access to anything running behind NAT, such as SSH servers or Kubernetes clusters and – new in this release! – database instances, including AWS RDS. Teleport gives MySQL and Postgres users superpowers. Teleport ensures best security practices like role-based access, preventing data exfiltration, providing visibility and ensuring compliance. Download Teleport at softwareengineeringdaily.com/teleport 

From their recent report on serverless adoption and trends, Datadog found half of their customer base using EC2s have now adopted AWS Lambda. You can easily monitor all your serverless functions in one place and generate serverless metrics straight from Datadog. Check it out yourself by signing up for a free 14-day trial and get a free t-shirt at softwareengineeringdaily.com/datadog

Software Daily

Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.