Prefect Dataflow Scheduler with Jeremiah Lowin

A data workflow scheduler is a tool that connects multiple systems to build pipelines for processing data. A data pipeline might include a Hadoop task for ETL, a Spark task for stream processing, and a TensorFlow task to train a machine learning model.

The workflow scheduler manages the tasks in that data pipeline and the logical flow between them. Airflow is a popular data workflow scheduler that was originally created at Airbnb. Since then, the project has been adopted by numerous companies that need workflow orchestration for their data pipelines. Jeremiah Lowin was a core committer to Airflow for several years, and during that time he identified a number of Airflow features that he wanted to change.

Prefect is a dataflow scheduler that was born out of Jeremiah’s experience working with Airflow. Prefect’s features include data sharing between tasks, task parameterization, and an API that differs from Airflow’s. Jeremiah joins the show to discuss Prefect and how his experience with Airflow led to his current work in dataflow scheduling.
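To make the ideas of task ordering, data sharing between tasks, and task parameterization concrete, here is a minimal sketch in plain Python. It is not Prefect's or Airflow's API. The run_flow helper, the task names, and the row_count and scale parameters are all hypothetical, chosen only for illustration, and the ordering relies on the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_flow(tasks, dependencies, parameters):
    """Run every task after its upstream tasks and hand it their results.

    Hypothetical helper for illustration; not part of Prefect or Airflow.
    """
    results = {}
    # static_order() yields each task only after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        # Data sharing: collect the outputs of the tasks this one depends on.
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        # Parameterization: every task receives the same run-level parameters.
        results[name] = tasks[name](parameters, upstream)
    return results, order


def extract(params, upstream):
    # Stand-in for reading rows from a source system.
    return list(range(1, params["row_count"] + 1))


def transform(params, upstream):
    # Uses the output of "extract" plus a run-level parameter.
    return [value * params["scale"] for value in upstream["extract"]]


def load(params, upstream):
    # Stand-in for writing to a destination; here it just aggregates.
    return sum(upstream["transform"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the tasks that must finish before it starts.
DEPENDENCIES = {
    "extract": (),
    "transform": ("extract",),
    "load": ("transform",),
}

results, order = run_flow(TASKS, DEPENDENCIES, {"row_count": 3, "scale": 10})
print(order)
print(results["load"])
```

Changing the parameters dictionary reruns the same pipeline with different inputs, which is the essence of parameterization. A real scheduler adds much more on top of this, such as retries, scheduling, and distributed execution.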

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.