Druid Analytical Database with Fangjin Yang
Modern applications produce large volumes of events: user clicks, IoT sensor readings, or log messages.
The cost of cloud storage and compute continues to drop, so engineers can afford to build applications around these high volumes of events, and a variety of tools have been developed to process them. Apache Kafka is widely used to store and queue these streams of data, while Apache Spark and Apache Flink are stream processing systems used to perform general-purpose computations across event stream data.
Kafka, Spark, and Flink are great general-purpose tools, but there is also room for a narrower set of distributed systems tools to support high-volume event data. Apache Druid is an open source database built for high-performance, read-heavy analytic workloads. Druid has a useful combination of features for event data workloads, including a column-oriented storage system, automatic indexing, and a horizontally scalable architecture.
Druid’s feature set allows new types of analytics applications to be built on top of it, including search applications, dashboards, and ad-hoc analytics. Fangjin Yang is a core contributor to Druid and the CEO of Imply.io, a company that makes a storage, querying, and visualization tool built on top of Druid. He joins the show to talk about the architecture of Druid and his company Imply.
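To make the ad-hoc analytics use case concrete, here is a minimal sketch of querying Druid over its SQL-over-HTTP API. The broker URL and the `clickstream` datasource are illustrative assumptions, not details from the episode; the request is built but not sent, since it would require a running cluster.

```python
# Hedged sketch: building a Druid SQL query against the standard
# /druid/v2/sql HTTP endpoint. Host, port, and the "clickstream"
# datasource are assumptions for illustration only.
import json
from urllib import request

DRUID_SQL_ENDPOINT = "http://localhost:8888/druid/v2/sql"  # assumed router address

def build_sql_request(sql: str) -> request.Request:
    """Build (but do not send) a Druid SQL API request."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return request.Request(
        DRUID_SQL_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The kind of ad-hoc aggregation Druid is designed to answer quickly:
# a top-N count over recent events, filtered on Druid's __time column.
req = build_sql_request(
    "SELECT channel, COUNT(*) AS events "
    "FROM clickstream "
    "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR "
    "GROUP BY channel ORDER BY events DESC LIMIT 10"
)
# request.urlopen(req) would execute it against a live cluster.
```

Druid answers this style of query interactively because the columnar segments and indexes let it scan only the `channel` and `__time` columns over the relevant time range.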
- Imply – About us
- Imply Blog
- Imply Docs | Cloud
- Imply Docs | On-prem
- The Need for Operational Analytics
- Operational Analytics in Practice
- The technology behind operational analytics
- Druid | Technology
- Druid: Powering Interactive Data Applications at Scale – by Fangjin Yang – YouTube
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.