The easiest way to train a computer to recognize a picture of a cat is to show the computer a million labeled images of cats. The easiest way to train a computer to recognize a stop sign is to show the computer a million labeled stop signs.
Supervised machine learning systems require labeled data. Today, most of that labeling needs to be done by humans. When a large tech company decides to “build a machine learning model,” that often requires a massive amount of effort to get labeled data.
Hundreds of thousands of knowledge workers around the world earn their income from labeling tasks. An example task might be “label all of the pedestrians in this intersection.” You receive a picture of a crowded intersection, and your task is to circle all the pedestrians. You have now created a piece of labeled data.
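To make the idea of "a piece of labeled data" concrete, here is a minimal sketch in Python of what the result of the pedestrian task might look like once it is stored. The class names, field names, file name, and coordinates are all hypothetical and chosen only for illustration; they are not taken from any particular labeling product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CircleLabel:
    """One human-drawn circle around an object in the image (hypothetical schema)."""
    label: str     # class name, e.g. "pedestrian"
    cx: float      # circle center x, in pixels
    cy: float      # circle center y, in pixels
    radius: float  # circle radius, in pixels


@dataclass
class LabeledImage:
    """An image together with the labels a human worker attached to it."""
    image_path: str
    labels: List[CircleLabel] = field(default_factory=list)

    def count(self, label: str) -> int:
        """Return how many annotations carry the given class name."""
        return sum(1 for item in self.labels if item.label == label)


# A made-up result for the "circle all the pedestrians" task.
example = LabeledImage(
    image_path="intersection_0001.jpg",  # hypothetical file name
    labels=[
        CircleLabel("pedestrian", cx=120.0, cy=340.0, radius=25.0),
        CircleLabel("pedestrian", cx=410.5, cy=322.0, radius=30.0),
    ],
)

print(example.count("pedestrian"))
```

A supervised learning system would consume many records like this one, pairing each image with the labels a human produced for it.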
Scale API is a company that turns API requests into human tasks. Their most recent release is an API for labeling data generated by sensors. As self-driving cars emerge onto our streets, the sensors on these cars generate LIDAR, radar, and camera data. The cars interpret that data in real time using their machine learning models, then send it to the cloud, where it can be processed offline to improve the machine learning models of every car on the road.
The first step in that processing pipeline is the labeling, which is the focus of today’s conversation. Alexandr Wang is the CEO of Scale, and he joins the show to discuss self-driving cars, labeling, and the company he co-founded.
A few notes before we get started. We just launched the Software Daily job board. To check it out, go to softwaredaily.com/jobs. You can post jobs, you can apply for jobs, and it’s all free. If you are looking to hire, or looking for a job, I recommend checking it out. And if you are looking for an internship, you can use the job board to apply for an internship at Software Engineering Daily.
Also, Meetups for Software Engineering Daily are being planned! Go to softwareengineeringdaily.com/meetup if you want to register for an upcoming Meetup. In March, I’ll be visiting Datadog in New York and Hubspot in Boston, and in April I’ll be at Telesign in LA.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.