Streaming with Holden Karau
Distributed stream processing allows developers to build applications on top of large sets of data that are being rapidly created. Stream processing is often described as an alternative to batch processing. In batch processing, a single large computation is performed over a large, static data set. In stream processing, a computation is performed repeatedly and continuously over a data set that is being appended to.
A stream is often stored in a distributed queue such as Kafka, Kinesis, Pulsar, or Google PubSub. A stream is often processed with a stream processing tool such as Spark, Flink, Storm, or Google Cloud Dataflow.
Holden Karau is an engineer who works on open source projects at Google. She returns to the show to describe the state of stream processing and discuss modern best practices.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.