Apache Beam with Frances Perry

Unbounded data streams pose difficult challenges for application architectures. The data never stops coming, and we are forced to assume that we will never know if or when we have seen all of it. Some streaming systems give us tools that handle unbounded data only partially, so we have to complement them with batch processing, a technique known as the Lambda Architecture.
Apache Beam is a unified model for defining and executing data processing workflows. Frances Perry joins the show to explain how Beam lets us model our data processing independently of whether we run those workflows on Spark, Flink, or Google Cloud Dataflow.
Links
- Apache Beam
- Streaming 101
- Streaming 102
- The Dataflow Model
- Google Cloud Dataflow
- Fundamentals of Stream Processing with Beam
- Mobile Gaming Example
- Dataflow: Beam and Spark Comparison