Data Engineering Podcast with Tobias Macey

Cloud computing lowered the cost of storing large volumes of data and made the tools for doing so more accessible. In the mid-2000s, Hadoop caused a revolution in large-scale batch processing. Since then, companies have been building ways to store and access their data faster and more efficiently.

At the same time, the sheer volume of data has increased, and machine learning has given rise to methods of extracting signal from seemingly inconsequential data points. This confluence of factors gave rise to the role of the data engineer. A data engineer defines and maintains data pipelines and supports data scientists and machine learning engineers.

Tobias Macey hosts the “Data Engineering Podcast,” where he covers the fast-moving world of data engineering, including databases, cloud providers, and open source tools. Tobias and I covered a range of topics in the data engineering space and spent significant time discussing the world of software engineering podcasting.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Today’s sponsor is Datadog, a monitoring and analytics platform for cloud-scale infrastructure and applications. Datadog integrates seamlessly with more than 200 technologies, so you can track every layer of your complex microservice architecture, all in one place. Distributed tracing and APM provide end-to-end visibility into requests wherever they go, across hosts, containers, and service boundaries. With rich dashboards, algorithmic alerts, and collaboration tools, Datadog provides your team with the tools they need to quickly troubleshoot and optimize modern applications. See for yourself – start a 14-day free trial today and Datadog will send you a free T-shirt! softwareengineeringdaily.com/datadog


Azure Container Service simplifies the deployment, management, and operations of Kubernetes. You can continue to work with the tools you already know, such as Helm, and move applications to any Kubernetes deployment. Integrate with your choice of container registry, including Azure Container Registry. Quickly and efficiently scale to maximize your resource utilization without having to take your applications offline. Isolate your application from infrastructure failures and transparently scale the underlying infrastructure to meet growing demands, all while increasing the security, reliability, and availability of critical business workloads with Azure. Check out the Azure Container Service at aka.ms/sedaily.


The octopus: a sea creature known for its intelligence and flexibility. Octopus Deploy: a friendly deployment automation tool for deploying applications like .NET apps, Java apps, and more. Ask any developer and they’ll tell you it’s never fun pushing code at 5pm on a Friday and then crossing your fingers hoping for the best. That’s where Octopus Deploy comes into the picture. Octopus Deploy takes over where your build/CI server ends. Use Octopus to promote releases on-prem or to the cloud. Octopus integrates with your existing build pipeline, including TFS and VSTS, Bamboo, TeamCity, and Jenkins. It integrates with AWS, Azure, and on-prem environments. Reliably and repeatedly deploy your .NET and Java apps and more. If you can package it, Octopus can deploy it! It’s quick and easy to install. Go to Octopus.com to trial Octopus free for 45 days. That’s Octopus.com.


There’s a new open source project called Dremio that is designed to simplify analytics. It’s also designed to handle some of the hard work, like scaling the performance of analytical jobs. Dremio is the team behind Apache Arrow, a new standard for in-memory columnar data analytics. Arrow has been adopted across dozens of projects, like Pandas, to improve the performance of analytical workloads on CPUs and GPUs. It’s free and open source, designed for everyone, from your laptop to clusters of over 1,000 nodes. At dremio.com/sedaily you can find all the necessary resources to get started with Dremio for free. If you like it, be sure to tweet @dremiohq and let them know you heard about it from Software Engineering Daily. Thanks again to Dremio, and check out dremio.com/sedaily to learn more.
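
For readers curious what “in-memory columnar” interoperability looks like in practice, here is a minimal sketch, not taken from the episode, assuming the pyarrow and pandas packages are installed and using illustrative column names. It shows data moving between a pandas DataFrame and an Arrow table, the kind of exchange the Arrow format is designed to make cheap.

# Minimal sketch: converting between pandas and Apache Arrow.
# Assumes `pyarrow` and `pandas` are installed; the column names are hypothetical.
import pandas as pd
import pyarrow as pa

# An ordinary pandas DataFrame.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "events": [10, 42, 7],
})

# Convert it to an Arrow table, Arrow's columnar in-memory representation.
table = pa.Table.from_pandas(df)
print(table.schema)  # column names and their Arrow types

# Convert back to pandas; for many numeric types this avoids copying column data.
df_round_trip = table.to_pandas()
print(df_round_trip)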