Model Training with Yufeng Guo

Machine learning models can be built by plotting points in space and optimizing a function based off of those points.

For example, I can plot every person in the United States in a 3 dimensional space: age, geographic location, and yearly salary. Then I can draw a function that minimizes the distance between my function and each of those data points. Once I define that function, you can give me your age and a geographic location, and I can predict your salary.

Plotting these points in space is called embedding. By embedding a rich data set, and then experimenting with different functions, we can build a model that makes predictions based on those data sets. Yufeng Guo is a developer advocate at Google working on CloudML. In this show, we described two separate examples for preparing data, embedding the data points, and iterating on the function in order to train the model.

In a future episode, Yufeng will discuss CloudML and more advanced concepts of machine learning.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Simplify continuous delivery with GoCD, the on-premise, open source, continuous delivery tool by ThoughtWorks. With GoCD, you can easily model complex deployment workflows using pipelines and visualize them end-to-end with the Value Stream Map. You get complete visibility into and control of your company’s deployments. At gocd.org/sedaily, find out how to bring continuous delivery to your teams. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.org/sedaily to learn more about GoCD. Commercial support and enterprise add-ons, including disaster recovery, are available.


Digital Ocean Spaces gives you simple object storage with a beautiful user interface. You need an easy way to host objects like images and videos. Your users need to upload objects like pdfs and music files. Digital Ocean Spaces is modern object storage with a modern UI that you will love to use–it’s like the UI for Dropbox, but with the pricing of a raw object storage; I almost want to use it like a consumer product. To try Digital Ocean Spaces, go to do.co/sedaily and get 2 months of Spaces plus a $10 credit to use on any other Digital Ocean products–and you get this credit even if you have been with Digital Ocean for awhile. It’s a nice added bonus just for trying out Spaces. If you become a customer, the pricing is simple:  $5 per month price and includes 250GB of storage and 1TB of outbound bandwidth. There are no costs per request and additional storage is priced at the lowest rate available: $0.01 per GB transferred and $0.02 per GB stored. There won’t be any surprises on your bill. Digital Ocean simplifies the cloud–they look for every opportunity to remove friction from a developer’s experience. I love it, and I think you will too–check it out at do.co/sedaily.


The octopus: a sea creature known for its intelligence and flexibility. Octopus Deploy: a friendly deployment automation tool for deploying applications like .NET apps, Java apps and more. Ask any developer and they’ll tell you it’s never fun pushing code at 5pm on a Friday then crossing your fingers hoping for the best. That’s where Octopus Deploy comes into the picture. Octopus Deploy is a friendly deployment automation tool, taking over where your build/CI server ends. Use Octopus to promote releases on-prem or to the cloud. Octopus integrates with your existing build pipeline–TFS and VSTS, Bamboo, TeamCity, and Jenkins. It integrates with AWS, Azure, and on-prem environments. Reliably and repeatedly deploy your .NET and Java apps and more. If you can package it, Octopus can deploy it! It’s quick and easy to install. Go to Octopus.com to trial Octopus free for 45 days. That’s Octopus.com


Who do you use for log management? I want to tell you about Scalyr, the first purpose built log management tool on the market. Most tools on the market utilize text indexing search, which is great… for indexing a book. But if you want to search logs, at scale, fast… it breaks down. Scalyr built their own database from scratch: the system is fast. Most searches take less than 1 second. In fact, 99% of their queries execute in <1 second.  Companies like OKCupid, Giphy and CareerBuilder use Scalyr. It was built by one of the founders of Writely (aka Google Docs). Scalyr has consumer grade UI, that scales infinitely. You can monitor key metrics, trigger alerts, and integrate with PagerDuty. It’s easy to use and did we mention: lightning fast. Give it a try today. It’s free for 90 days at softwareengineeringdaily.com/scalyr.

 

  • Pingback: Dew Drop - October 18, 2017 (#2584) - Morning Dew()

  • Rengarajan Bashyam

    Machine learning has become synonymous with “Make a dumb but deep network, throw a LOT of data at it and pray it will learn”. Its similar to my learning golf: I throw a lot of money at it, hoping that i will get it one day. Humans don’t need to see 25,000 cats and dogs to start identifying which is a cat and which is a dog. Can networks learn with frugal data or minimal samples? Can we learn a new diagnosis of cancer with only a handful of x-Rays and not 30,000 x-Rays? Then only it can be called intelligence of any use, artificial or otherwise. The AI today is nothing but simple algorithms on steroids. Whats your opinion on this? – Renga

    • Jeff Meyerson

      I think machine learning is just teaching computers to improve their systems as they receive more data.

      We should simplify it as much as possible so that programmers are not intimidated by it.