Machine Learning Joins with Arun Kumar
Data sets can be modeled in a row-wise, relational format. When two data sets share a common field, those data sets can be combined in a procedure called a join. A join combines the data of two data sets into one data set that is often bigger than the initial two data sets independently occupied. In fact, this new data set is often so much bigger that it creates problems for the machine learning engineers.
Arun Kumar is an assistant professor at UC San Diego. He joins the show to discuss the modern lifecycle of machine learning models, and the gaps in the tooling.
Arun’s research into improving processing of joined data sets has been adopted by companies such as Google. Some of that research has been adapted into open source machine learning tools that improve the performance of machine learning jobs with minimal code required.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.