Materialize: Streaming SQL on Timely Data with Arjun Narayan and Frank McSherry

Distributed stream processing frameworks are used to rapidly ingest and aggregate large volumes of incoming data. These frameworks often require the application developer to write imperative logic describing how that data should be processed. 

For example, high-volume clickstream data buffered in Kafka must be processed by a stream processing system before it can be loaded into a data warehouse, Spark, or some other queryable environment. In practice, many developers simply want that data to become queryable in as few steps as possible.

Materialize is a streaming SQL engine that maintains materialized views over streaming data. The views are incrementally updated as new data arrives, and they stay correct even when that data arrives out of order.
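To make "incrementally updated" and "out of order" concrete, here is a toy sketch of maintaining a view like `SELECT page, COUNT(*) FROM clicks GROUP BY page`. It is loosely modeled on the `(data, time, diff)` updates and frontiers used by timely and differential dataflow; it is not Materialize's implementation or API, and the names `CountView`, `ingest`, and `advance_to` are hypothetical. Updates may arrive in any order; the view only applies them once the frontier passes their timestamp, and a negative `diff` retracts a previous record.

```python
from collections import defaultdict


class CountView:
    """Hypothetical, minimal incrementally maintained COUNT(*) GROUP BY view."""

    def __init__(self):
        self.pending = []                # buffered (key, time, diff) not yet applied
        self.counts = defaultdict(int)   # current view contents: key -> count
        self.frontier = 0                # every update with time < frontier is applied

    def ingest(self, key, time, diff):
        # Out-of-order arrival is fine as long as the time is not behind the frontier.
        if time < self.frontier:
            raise ValueError("update is behind the frontier and cannot be accepted")
        self.pending.append((key, time, diff))

    def advance_to(self, new_frontier):
        """Apply all buffered updates with time < new_frontier; return the view."""
        if new_frontier < self.frontier:
            raise ValueError("frontier must not move backwards")
        ready = [u for u in self.pending if u[1] < new_frontier]
        self.pending = [u for u in self.pending if u[1] >= new_frontier]
        for key, _time, diff in ready:
            # Incremental update: touch only the affected keys, never recompute all.
            self.counts[key] += diff
            if self.counts[key] == 0:
                del self.counts[key]     # drop rows whose count fell to zero
        self.frontier = new_frontier
        return dict(self.counts)


view = CountView()
view.ingest("/home", 2, 1)        # arrives first but has a later timestamp
view.ingest("/pricing", 1, 1)
view.ingest("/home", 1, 1)
print(view.advance_to(2))         # only time-1 updates are applied
view.ingest("/pricing", 3, -1)    # retraction of the earlier /pricing click
print(view.advance_to(4))
```

The design choice the sketch illustrates is that each new batch only adjusts the counts for the keys it touches, so the cost of keeping the view fresh tracks the volume of changes rather than the size of the full dataset.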

Arjun Narayan and Frank McSherry are the co-founders of Materialize, a company whose technology is based on the Naiad paper, which was written at Microsoft Research. Arjun and Frank join the show to talk about modern streaming systems and their strategy for taking an academic paper and productizing it.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.

Sponsors

MongoDB is the most popular document-based database built for modern application developers and the cloud era. Try MongoDB today with Atlas, the global cloud database service that runs on AWS, Azure, and Google Cloud. Configure, deploy, and connect to your database in just a few minutes. Check it out at mongodb.com/atlas.

Today’s sponsor is Datadog, a monitoring and analytics platform for cloud-scale infrastructure and applications. Datadog provides seamless integrations with more than 200 technologies, including AWS, Postgres, MySQL, and Docker, so you can start collecting and visualizing performance metrics quickly. See for yourself – start a 14-day free trial today and Datadog will send you a free T-shirt! – softwareengineeringdaily.com/datadog

DigitalOcean makes infrastructure simple. And for an application that needs to scale, DigitalOcean has CPU-Optimized Droplets, Memory-Optimized Droplets, Managed Databases, Managed Kubernetes, and much more. Visit do.co/sedaily and receive $100 in credit over 60 days.

Logi’s embedded analytics platform makes it possible to create, update and brand your analytics so they seamlessly integrate within your application. Visit Logianalytics.com/sedaily to see what’s possible with Logi, today.

Software Weekly

Subscribe to Software Weekly, a curated weekly newsletter featuring the best and newest from the software engineering community.