Faust: Streaming at Robinhood with Ask Solem

Robinhood is a platform for buying and selling stocks, cryptocurrencies, and other assets. Since its founding in 2013, Robinhood has grown to more than 5 million user accounts, surpassing the popular online broker E-Trade. As the user base and transaction volume have surged, the demands on the software infrastructure have increased significantly.

When a user buys a stock on Robinhood, that transaction gets written to Kafka and Postgres. Multiple services get notified of the new entry on the Kafka topic, and those services process that new event using Kafka Streams. Kafka Streams is a Java library for reading and processing streams of data in Kafka, with support for exactly-once semantics. Developers at Robinhood use a variety of languages, including Python, to build services on top of these Kafka streams.

Commonly used systems for building stream processing tasks on top of a Kafka topic include Apache Flink and Apache Spark. Spark and Flink let you work with large data sets while maintaining high speed and fault tolerance. Both tools run on the Java Virtual Machine (JVM). If you want to write a Python program that interfaces with Apache Spark, you pay an expensive serialization and deserialization cost every time data moves between Python and Spark.
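To make the serialization cost concrete, here is a minimal sketch that uses only the Python standard library. It does not call Spark. It uses `pickle` as a stand-in for whatever wire format is used, and the `trades` records are hypothetical. The point is only that every record crossing a process boundary has to be turned into bytes and then rebuilt on the other side.

```python
import pickle
import time

# Hypothetical trade records; the field names are assumptions for illustration.
trades = [{"symbol": "AAPL", "shares": i, "price": 100.0 + i} for i in range(1000)]


def roundtrip(records):
    """Serialize records to bytes and rebuild them, as happens at a process boundary."""
    payload = pickle.dumps(records)   # Python objects -> bytes
    rebuilt = pickle.loads(payload)   # bytes -> new Python objects
    return rebuilt, len(payload)


start = time.perf_counter()
copy, size = roundtrip(trades)
elapsed = time.perf_counter() - start

print(f"{len(trades)} records, {size} bytes, {elapsed * 1000:.2f} ms per round trip")
```

The rebuilt list is equal to the original but is a separate set of objects, so the work is repeated for every batch that crosses the boundary. A stream processor that runs entirely inside the Python process avoids this step.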

Ask Solem is an engineer with Robinhood and the author of Faust, a stream processing library that ports the ideas of Kafka Streams to Python. He is also the author of the popular Celery asynchronous task queue. Faust provides stream processing and event processing in a manner similar to Kafka Streams, Apache Spark, and Apache Flink. Ask joins the show to provide his perspective on large-scale, distributed stream processing and why he created Faust.
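The general shape of stream processing in Python can be sketched with the standard library alone. The example below is not Faust's API. It uses an `asyncio.Queue` as a stand-in for a Kafka topic, and the `Order` record and `count_shares` function are hypothetical names chosen for illustration. It shows a consumer that handles events one at a time and keeps running state, here the total shares ordered per symbol.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical event record; the fields are assumptions for illustration."""
    account_id: str
    symbol: str
    shares: int


async def count_shares(stream: asyncio.Queue, totals: dict) -> None:
    """Consume orders one at a time and keep a running total per symbol."""
    while True:
        order = await stream.get()
        if order is None:  # sentinel marks the end of this demo stream
            break
        totals[order.symbol] += order.shares


async def main() -> dict:
    stream = asyncio.Queue()      # stand-in for a Kafka topic
    totals = defaultdict(int)     # stand-in for the processor's state
    consumer = asyncio.create_task(count_shares(stream, totals))

    # Produce a few events, then signal the end of the stream.
    for order in [
        Order("a1", "AAPL", 10),
        Order("a2", "TSLA", 5),
        Order("a1", "AAPL", 3),
    ]:
        await stream.put(order)
    await stream.put(None)

    await consumer
    return dict(totals)


result = asyncio.run(main())
print(result)
```

A real Kafka topic is unbounded, so a production consumer would not stop on a sentinel. It would also need durable state and partition-aware scaling, which this in-memory sketch leaves out.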

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.


Sponsors

Citus is worry-free Postgres that is built to scale out. Made for SaaS and enterprises, Citus is an extension to Postgres that transforms Postgres into a distributed database. Whether you need to scale out a multi-tenant app—or are building real-time analytics dashboards that require sub-second responses—Citus makes it simple to shard Postgres. Go to citusdata.com/sedaily to learn more about how Citus can scale your Postgres database.

DoiT International helps startups optimize the costs of their workloads across Google Cloud and AWS, so that they can spend more time building new software–and less time reducing cost. DoiT International helps clients optimize their costs–and if your cloud bill is over $10,000 per month, you can get a free cost-optimization assessment by going to doit-intl.com/sedaily.

Accenture is hiring software engineers and architects skilled in modern cloud native tech. If you’re looking for a job, check out open opportunities at softwareengineeringdaily.com/accenture. Working with over 90% of the Fortune 100 companies, Accenture is creating innovative, cutting-edge applications for the cloud, and they are the number one integrator for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and more.

Vettery is an online hiring marketplace that connects highly qualified job-seekers with inspiring companies. Once you’re accepted to Vettery, companies reach out directly to you. Sign up on vettery.com/sedaily and get a $500 bonus if you accept a job through Vettery.