Stemma: Understanding Big Data with Mark Grover
Amundsen was started at Lyft and is the leading open-source data catalog with the fastest-growing community and the most integrations. Amundsen lets you search data across your entire organization with text search, see automated and curated metadata, share context with coworkers, and learn from others by seeing the most common queries on a table or the most frequently used data.
Stemma, the company powered by Amundsen, offers a fully managed data catalog that bridges the gap between data producers and data consumers. Stemma adds features to Amundsen such as showing meaningful data to individual users, adding metadata to data automatically, and documenting data on the fly. Stemma integrates with all the major data sources, such as Snowflake, Redshift, Google BigQuery, and Apache Airflow.
In this episode we talk to Mark Grover, Founder at Stemma. Mark co-created Amundsen and authored the book Hadoop Application Architectures. He was an engineer at Cloudera before joining Lyft as a Product Manager.
Sponsorship inquiries: sponsor@softwareengineeringdaily.com
Transcript
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com to get 15% off the first three months of audio editing and transcription services with code: SED. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Sponsors
Go to replicated.com/sedaily to learn how Replicated can help you modernize your on-prem software delivery strategy. Replicated gives software vendors a container-based platform for easily deploying cloud native applications inside customers’ environments to provide greater security and control. There is a secure way that your customers can use your application without ever having to send data outside of their control. Go to replicated.com/sedaily to get a free 21-day trial of the Replicated platform.
Datadog is a cloud-scale monitoring platform that unifies metrics, logs, and traces from technologies like Istio, App Mesh, and Envoy. Plus, Datadog’s Service Map automatically plots out the dependencies in your microservices architecture for seamless, context-rich troubleshooting. With rich visualizations, algorithmic alerting, and more than 450 vendor-supported integrations, Datadog allows you to monitor your distributed applications in real time. Start a free 14-day trial today by visiting softwareengineeringdaily.com/datadog, and Datadog will send you a complimentary t-shirt.
If you have several PostgreSQL or MySQL databases running behind NAT, check out Teleport, an open source identity-aware access proxy. Teleport provides secure access to anything running behind NAT, such as SSH servers or Kubernetes clusters and – new in this release! – database instances, including AWS RDS. Teleport gives MySQL and Postgres users superpowers. Teleport ensures best security practices like role-based access, preventing data exfiltration, providing visibility and ensuring compliance. Download Teleport at softwareengineeringdaily.com/teleport
Pachyderm is an easy-to-use MLOps platform that empowers anyone to build scalable end-to-end machine learning workflows, regardless of the language or framework they are built on. Pachyderm provides Git-like data versioning and lineage to automatically track every data change and final output result. Head over to pachyderm.com/sedaily to get over $400 in free credits. But hurry, because this offer only lasts for a limited time.
CockroachDB is a distributed SQL database that makes it simple to build resilient, scalable applications quickly. CockroachDB is Postgres compatible, giving the same familiar SQL interface database developers have used for years. CockroachDB is resilient, adaptable to any environment, and Kubernetes-native. Host it on-prem, run it in a hybrid cloud, and even deploy it across multiple clouds. Sign up for your forever-free database and get a free t-shirt at cockroachlabs.com/sedaily.