Apache Kafka with Guozhang Wang

Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log.
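The "distributed commit log" idea can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not Kafka's API. The class `CommitLog` and its methods are hypothetical names. It captures two properties: records are only ever appended and each gets a sequential offset, and readers track their own position instead of the log removing messages once they are consumed.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Hypothetical toy log: records are only appended, never modified or deleted."""
    records: list = field(default_factory=list)

    def append(self, record):
        # Each new record gets the next sequential offset.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        # Readers pull from any offset; the log keeps no per-reader state.
        return self.records[offset:offset + max_records]


log = CommitLog()
for event in ["signup", "click", "purchase"]:
    log.append(event)

# Two independent readers sit at different positions in the same log.
fast_reader_offset = 0
slow_reader_offset = 0

batch = log.read(fast_reader_offset)
fast_reader_offset += len(batch)   # the fast reader has caught up

print(batch, fast_reader_offset, slow_reader_offset)
```

Reading does not consume anything. The slow reader can still start from offset 0 and replay the same records later, which is what lets one log feed many downstream systems.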

Kafka serves as the central repository for data streams in a distributed system.

Guozhang Wang is an engineer at Confluent, which offers a stream data platform built using Kafka.

Questions include:

  • What is a central repository for data streams?
  • How does Kafka improve data transport between systems?
  • How does Kafka allow for richer analytical processing?
  • What are the roles of topics, producers, consumers, and brokers?
  • Do Spark, Storm, and Samza all use Kafka the same way?
  • How does Kafka combine queueing and pub-sub into a single abstraction: the consumer group?
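The consumer-group idea can be sketched with a toy in-memory model. This is a minimal illustration under stated assumptions, not Kafka's client API. `Topic`, `ConsumerGroup`, the CRC32 partitioner, and the round-robin assignment are all hypothetical choices (Kafka's default partitioner hashes keys with murmur2, and it ships several assignment strategies). The model shows both behaviors. Inside one group, each partition belongs to exactly one member, so the work is split as with a queue. Each separate group keeps its own offsets, so every group receives every record, as with pub-sub.

```python
import zlib


class Topic:
    """Hypothetical toy topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Assumed partitioner: CRC32 of the key picks the partition,
        # so records with the same key stay ordered on one partition.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)


class ConsumerGroup:
    """Hypothetical toy group: members split partitions; the group owns its offsets."""

    def __init__(self, topic, members):
        self.topic = topic
        self.members = members
        self.offsets = [0] * len(topic.partitions)

    def assignment(self):
        # Round-robin: each partition is owned by exactly one member of this group.
        return {p: self.members[p % len(self.members)]
                for p in range(len(self.topic.partitions))}

    def poll(self):
        # Deliver unread records to the member that owns each partition,
        # then advance this group's offset for that partition.
        delivered = {m: [] for m in self.members}
        for p, member in self.assignment().items():
            new = self.topic.partitions[p][self.offsets[p]:]
            delivered[member].extend(new)
            self.offsets[p] += len(new)
        return delivered


topic = Topic(num_partitions=4)
for i in range(8):
    topic.publish(key=f"user-{i}", value=f"event-{i}")

billing = ConsumerGroup(topic, members=["b1", "b2"])  # queueing: work is split
audit = ConsumerGroup(topic, members=["a1"])          # pub-sub: sees everything

b = billing.poll()
a = audit.poll()
print(sum(len(v) for v in b.values()), len(a["a1"]))  # 8 8
```

Adding members to a group spreads its partitions across more consumers. Adding a group gives a new subscriber its own complete view of the topic.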