Data Mechanics: Data Engineering with Jean-Yves Stephan

Apache Spark is a popular open source analytics engine for large-scale data processing. Applications can be written in Java, Scala, Python, R, and SQL, and they can run in a variety of environments, such as on Kubernetes or in the cloud.

Data Mechanics is a cloud-native Spark platform for data engineers. It runs continuously optimized Apache Spark workloads on a managed Kubernetes cluster within the user’s cloud account. The company claims a 50%-75% reduction in cloud costs, achieved by dynamically scaling applications based on load and automatically tuning application configurations based on historical Spark pipeline runs. Because the Kubernetes clusters are deployed within the user’s own account, user data never leaves that environment, while Data Mechanics handles the cluster management.

In this episode we talk to Jean-Yves Stephan, Co-Founder and CEO at Data Mechanics. Jean-Yves previously worked at Databricks as a Software Engineer and then as a Tech Lead Manager. We discuss big data engineering in Spark and the unique advantages of using Data Mechanics to make Spark development easier and more cost-effective.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com to get 15% off the first three months of audio editing and transcription services with code: SED. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.


Sponsors

Pachyderm is an easy-to-use MLOps platform that empowers anyone to build scalable end-to-end machine learning workflows, regardless of the language or framework they are built on. Pachyderm provides Git-like data versioning and lineage to automatically track every data change and final output result. Head over to pachyderm.com/sedaily to get over $400 in free credits. But hurry because this offer only lasts for a limited time.

Oracle wants to help you land those big customers, so they’re offering preferred pricing on enterprise cloud for startups. Free cloud credits and 70% off their cloud services, and with multi-cloud support and no vendor lock-in, you can build it out any way you want. Oracle for Startups doesn’t want you wheezing on the side of the road. They want you to have enough power to scale and land your dream customer. Visit oracle.com/go/sedaily.

From their recent report on serverless adoption and trends, Datadog found half of their customer base using EC2s have now adopted AWS Lambda. You can easily monitor all your serverless functions in one place and generate serverless metrics straight from Datadog. Check it out yourself by signing up for a free 14-day trial and get a free t-shirt at softwareengineeringdaily.com/datadog

If you have several PostgreSQL or MySQL databases running behind NAT, check out Teleport, an open source identity-aware access proxy. Teleport provides secure access to anything running behind NAT, such as SSH servers or Kubernetes clusters and – new in this release! – database instances, including AWS RDS. Teleport gives MySQL and Postgres users superpowers. Teleport ensures best security practices like role-based access, preventing data exfiltration, providing visibility and ensuring compliance. Download Teleport at softwareengineeringdaily.com/teleport 

Software Daily

Subscribe to Software Daily, a curated newsletter featuring the best and newest from the software engineering community.