Cloud Dataflow with Eric Anderson

Batch and stream processing systems have been evolving for the past decade. From MapReduce to Apache Storm to Dataflow, the best practices for large-volume data processing have become more sophisticated as the industry and open-source communities have iterated on them.


Dataflow and Apache Beam are projects that present a unified programming model for batch and stream processing. A previous episode with Frances Perry discussed how they work in detail. In today's episode, Eric Anderson discusses Cloud Dataflow, a service from Google that lets users manage their data processing pipelines without having to spin up individual servers to run them on.


Cloud Dataflow, like the "serverless" movement we have covered in several shows, represents a growing shift toward cloud providers offering services that abstract away the operational challenges of managing compute nodes. If you have suggestions for topics in this area, please do send me an email.

Sponsors

SnapCI is a continuous integration tool built by ThoughtWorks. Go to snap.ci/softwareengineeringdaily to check it out.
Alooma is your data pipeline as a service. Alooma is a fully managed tool for pulling from different data sources: MySQL, Postgres, Elasticsearch, Salesforce, and many others. Go to alooma.com/sedaily for more information.