Determined AI: Machine Learning Ops with Neil Conway

Developing machine learning models is not easy. From the perspective of the machine learning researcher, there is the iterative process of tuning hyperparameters and selecting relevant features. From the perspective of the operations engineer, there is a handoff from development to production, and the management of GPU clusters to parallelize model training.

In the last five years, machine learning has become easier to use thanks to point solutions such as TensorFlow, cloud provider tools, Spark, and Jupyter Notebooks. But every company works differently, and there are few hard-and-fast rules for the workflows around machine learning operations.

Determined AI is a platform that provides a means for collaborating around data prep, model development and training, and model deployment. Neil Conway is a co-founder of Determined, and he joins the show to discuss the challenges around machine learning operations, and what he has built with Determined.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Sponsors

JFrog Container Registry is a comprehensive registry that supports Docker containers and Helm Chart repositories for your Kubernetes deployments. It supports not only local storage for your artifacts, but also proxying remote registries and repositories, and virtual repositories to simplify configuration. Use JFrog Container Registry to search the custom metadata of your repositories. To find out more about JFrog Container Registry, visit softwareengineeringdaily.com/jfrog

strongDM lets you manage and audit access to servers, databases, and Kubernetes clusters, no matter where your employees are. With strongDM, you can easily extend your identity provider to manage infrastructure access. You can automate onboarding, offboarding, and moving people within roles. strongDM. Manage and audit remote access to infrastructure. Start your free 14-day trial today at strongdm.com/SEDaily

It’s hard to get engineering resources to build back-office apps, and even harder to get engineers excited about maintaining them. The idea is that all internal tools kinda look the same – they’re made of tables, dropdowns, buttons, text inputs, etc. Retool gives you a drag and drop interface so engineers can build these internal UIs in hours, not days, and spend more time building features customers will see. Visit retool.com/sedaily to learn more.

With Triplebyte, you do one online interview, and then you get to go straight to final interviews at hundreds of companies (from tech giants like Dropbox to exciting startups). It’s like the Common App for software engineers. No resume needed. Apply now at triplebyte.com/sedaily. If you take a job through Triplebyte, you’ll get a $1000 signing bonus.

Software Weekly

Subscribe to Software Weekly, a curated weekly newsletter featuring the best and newest from the software engineering community.