Machine Learning Joins with Arun Kumar
Data sets can be modeled in a row-wise, relational format. When two data sets share a common field, those data sets can be combined in a procedure called a join. A join combines the data of two data sets into one data set that is often bigger than the initial two data sets independently occupied. In fact, this new data set is often so much bigger that it creates problems for the machine learning engineers.
Arun Kumar is an assistant professor at UC San Diego. He joins the show to discuss the modern lifecycle of machine learning models, and the gaps in the tooling.
Arun’s research into improving processing of joined data sets has been adopted by companies such as Google. Some of that research has been adapted into open source machine learning tools that improve the performance of machine learning jobs with minimal code required.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
With MongoDB Atlas, you can take advantage of MongoDB’s flexible document data model as a fully automated cloud service. MongoDB Atlas handles all the costly database operations and admin tasks that you’d rather not spend time on, like security, high availability, data recovery, monitoring, and elastic scaling.Try MongoDB Atlas for free today! Visit mongdb.com/se to learn more.
Bitbar tests your app on real devices–no emulators or virtual environments. Bitbar has real Android and iOS devices, and the Bitbar testing tools integrate with Jenkins, TravisCI, and other continuous integration tools. Check out bitbar.com/sedaily and get a free month of unlimited mobile app testing.
Digital Ocean is the easiest cloud platform to run and scale your application. Try it out today and get a free $100 credit–go to do.co/sedaily. Digital Ocean is a complete cloud platform to help developers and teams save time when running and scaling their applications.