Machine Learning Joins with Arun Kumar


FindCollabs $5000 Hackathon Ends Saturday April 15th, 2019

New version of Software Daily, our app and ad-free subscription service

Software Daily is looking for help with Android engineering, QA, machine learning, and more

Data sets can be modeled in a row-wise, relational format. When two data sets share a common field, they can be combined in an operation called a join. A join merges the records of two data sets into a single data set, which is often larger than the two original data sets combined. In fact, the joined data set is often so much bigger that it creates problems for machine learning engineers.
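A small example makes the size blow-up concrete. The sketch below uses hypothetical toy tables (an `orders` table and a `customers` table sharing a `customer_id` field) and counts cells before and after an inner join. Each customer's attributes are copied onto every one of that customer's orders, so the joined table holds more cells than both inputs together.

```python
# Hypothetical toy data: a table of orders and a table of customer
# attributes that share the customer_id field.
orders = [
    # (order_id, customer_id, amount)
    (1, "c1", 20.0),
    (2, "c1", 35.0),
    (3, "c2", 12.5),
    (4, "c1", 8.0),
    (5, "c2", 40.0),
]
customers = {
    # customer_id -> (age, region_code, account_tier)
    "c1": (34, 3, 2),
    "c2": (51, 1, 1),
}


def join(orders, customers):
    """Inner join on customer_id: append each order's customer attributes."""
    return [row + customers[row[1]] for row in orders if row[1] in customers]


def cell_count(rows):
    """Total number of stored values across all rows."""
    return sum(len(r) for r in rows)


joined = join(orders, customers)

# Cells in the two inputs (each customer row = key + 3 attributes).
before = cell_count(orders) + sum(1 + len(attrs) for attrs in customers.values())
# Cells in the joined output: customer attributes repeated per order.
after = cell_count(joined)

print(before, after)  # 23 30
```

The repetition grows with the number of orders per customer, so with many fact rows per key and wide attribute tables the joined result can be far larger than its inputs.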

Arun Kumar is an assistant professor at UC San Diego. He joins the show to discuss the modern lifecycle of machine learning models, and the gaps in the tooling.

Arun’s research into improving processing of joined data sets has been adopted by companies such as Google. Some of that research has been adapted into open source machine learning tools that improve the performance of machine learning jobs with minimal code required.
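The episode summary does not spell out the techniques involved. As an illustration only, the sketch below shows one published idea in this area, often called factorized learning: compute a linear model's scores over a key-foreign-key join without materializing the joined table. The per-row score splits into a fact-table part and a dimension-table part, and the dimension-table part is computed once per dimension row and then looked up. All table names, features, and weights here are hypothetical, and this is not a description of any specific tool's implementation.

```python
import math

# Hypothetical fact table S: feature rows plus a foreign key into table R.
S_features = [[1.0, 2.0], [0.5, 1.5], [3.0, 0.0], [2.0, 2.0]]
S_fk = [0, 1, 0, 1]
# Hypothetical dimension table R: one feature row per key.
R_features = [[4.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
# Hypothetical linear model weights, split by source table.
w_S = [0.1, 0.2]
w_R = [0.5, -1.0, 2.0]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scores_materialized(S_features, S_fk, R_features, w_S, w_R):
    """Build the full joined table, then score each joined row."""
    w = w_S + w_R
    joined = [s + R_features[k] for s, k in zip(S_features, S_fk)]
    return [dot(w, row) for row in joined]


def scores_factorized(S_features, S_fk, R_features, w_S, w_R):
    """Score without building the join: R's contribution is computed
    once per dimension row and reused for every fact row pointing at it."""
    r_partial = [dot(w_R, r) for r in R_features]
    return [dot(w_S, s) + r_partial[k] for s, k in zip(S_features, S_fk)]


mat = scores_materialized(S_features, S_fk, R_features, w_S, w_R)
fac = scores_factorized(S_features, S_fk, R_features, w_S, w_R)
print(all(math.isclose(a, b) for a, b in zip(mat, fac)))  # True
```

Both functions produce the same scores, but the factorized version never stores the repeated dimension attributes, which is where the savings come from when many fact rows share the same key.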


Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
