Druid Analytical Database with Fangjin Yang

Modern applications produce large numbers of events. These events can be user clicks, IoT sensor readings, or log messages.

The cost of cloud storage and compute continues to drop, so engineers can afford to build applications around these high volumes of events, and a variety of tools have been developed to process them. Apache Kafka is widely used to store and queue these streams of data, and Apache Spark and Apache Flink are stream processing systems used to perform general-purpose computations across this event stream data.

Kafka, Spark, and Flink are great general-purpose tools, but there is also room for a narrower set of distributed systems tools to support high-volume event data. Apache Druid is an open source database built for high-performance, read-only analytic workloads. Druid has a useful combination of features for event data workloads, including a column-oriented storage system, automatic search indexing, and a horizontally scalable architecture.
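To make the combination of column-oriented storage and search indexing more concrete, here is a toy sketch in Python. It is not Druid's storage format or API; the event fields (`page`, `country`, `clicks`) and the helper names (`to_columns`, `build_index`, `sum_where`) are invented for illustration. The idea it shows is that each field is stored as its own column, each dimension gets an index from value to matching row ids, and a filtered aggregation only touches the rows the index selects.

```python
from collections import defaultdict

# Hypothetical event rows; the field names are made up for this sketch.
EVENTS = [
    {"page": "home", "country": "US", "clicks": 3},
    {"page": "docs", "country": "DE", "clicks": 1},
    {"page": "home", "country": "DE", "clicks": 2},
    {"page": "blog", "country": "US", "clicks": 5},
]


def to_columns(rows):
    """Pivot row-oriented events into one list per field (column-oriented)."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def build_index(column):
    """Map each distinct value in a column to the set of row ids holding it."""
    index = defaultdict(set)
    for row_id, value in enumerate(column):
        index[value].add(row_id)
    return index


def sum_where(columns, indexes, metric, **filters):
    """Sum a metric column over rows matching every dimension=value filter."""
    matching = set(range(len(columns[metric])))
    for dim, value in filters.items():
        # Intersect row-id sets instead of scanning every row.
        matching &= indexes[dim].get(value, set())
    return sum(columns[metric][i] for i in matching)


COLUMNS = to_columns(EVENTS)
INDEXES = {dim: build_index(COLUMNS[dim]) for dim in ("page", "country")}
```

With this toy data, `sum_where(COLUMNS, INDEXES, "clicks", page="home", country="DE")` reads only the `clicks` column and only the rows that both indexes agree on. A real system would add compression, time partitioning, and distribution across machines, none of which is modeled here.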

Druid’s feature set allows new types of analytics applications to be built on top of it, including search applications, dashboards, and ad-hoc analytics. Fangjin Yang is a core contributor to Druid and the CEO of Imply.io, a company that makes a storage, querying, and visualization tool built on top of Druid. He joins the show to talk about the architecture of Druid and his company Imply.

 

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Sponsors

Azure Container Service simplifies the deployment, management and operations of Kubernetes. Check out the Azure Container Service at aka.ms/sedaily.

DoiT International helps startups optimize the costs of their workloads across Google Cloud and AWS, so that they can spend more time building new software and less time reducing cost. If your cloud bill is over $10,000 per month, you can get a free cost-optimization assessment by going to doit-intl.com/sedaily.

Transifex is a SaaS-based localization and translation platform that easily integrates with your agile development process. Your software, websites, games, apps, video subtitles, and more can all be translated with Transifex. Use Transifex with in-house translation teams, language service providers, or even crowdsource your translations. If you’re a developer who is ready to reach a global audience, check out Transifex by visiting transifex.com/sedaily and sign up for a free 15-day trial.