Apache Kafka with Guozhang Wang
Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log.
Kafka serves as the central repository for data streams in a distributed system.
Guozhang Wang is an engineer at Confluent, which offers a stream data platform built using Kafka.
Questions include:
- What is a central repository for data streams?
- How does Kafka improve transportation between systems?
- How does Kafka allow for richer analytical processing?
- What are the roles of topics, producers, consumers, and brokers?
- Do Spark, Storm, and Samza all use Kafka the same way?
- How does Kafka combine queueing and pub-sub into a single abstraction: the consumer group?
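The consumer group idea in the last question can be sketched with a toy in-memory model. This is not the Kafka client API; `Topic`, `assign`, and `consume` are hypothetical names, and the partitioner and round-robin assignment are simplifying assumptions. It only shows the two behaviors being combined: within one group, each partition is read by exactly one consumer (queueing), while separate groups each see every message (pub-sub).

```python
from collections import defaultdict


class Topic:
    """Toy topic: a fixed number of append-only partition logs (an assumption, not Kafka's storage)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Simplified deterministic partitioner; real clients hash the serialized key.
        index = sum(key.encode()) % len(self.partitions)
        self.partitions[index].append(value)
        return index


def assign(num_partitions, consumers):
    """Round-robin: each partition goes to exactly one consumer in the group."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


def consume(topic, consumers):
    """Return the messages each consumer of a single group would read."""
    assignment = assign(len(topic.partitions), consumers)
    return {
        consumer: [msg for p in parts for msg in topic.partitions[p]]
        for consumer, parts in assignment.items()
    }


topic = Topic(num_partitions=4)
for i in range(8):
    topic.publish(f"user-{i}", f"event-{i}")

# Two groups read the same topic independently (pub-sub across groups).
analytics = consume(topic, ["a1", "a2"])  # work is split between a1 and a2
audit = consume(topic, ["b1"])            # a single consumer reads everything
```

In this sketch a group with one consumer behaves like a plain subscriber, and a group with several consumers behaves like a work queue over the same log.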
Links:
- A Practical Guide to Kafka, by Jay Kreps
- Kafka Documentation
- Kafka Podcast on Software Engineering Radio
- Kafka Podcast on All Things Hadoop (includes notes and diagrams)
- Kafka Podcast on O’Reilly Data