Deep Learning Topologies with Yinyin Liu

Algorithms for building neural networks have existed for decades. For a long time, neural networks were not widely used. Recent changes to the cost of compute and the size of our data have made neural networks extremely useful. Our smart phones generate terabytes of useful data. Lower storage costs make it economical to keep that data. Cloud computing democratized the ability to do large scale machine learning across deep learning hardware.

Over the last few years, these trends have been driving widespread use of deep learning, in which neural nets with a large series of layers are used to create powerful results in various fields of classification and prediction. Neural networks are a tool for making sense of unstructured data–text, images, sound waves, and videos.

“Unstructured” data is data with high volume or high dimensionality. For example, an image has a huge collection of pixels, and each pixel has a color value. One way to think about image classification is that you are finding correlations between those pixels. A certain cluster of pixels might represent an edge. After doing edge detection on pixels, you have a collection of edges. Then you can find correlations between those edges, and build up higher levels of abstraction.

Yinyin Liu is a principal engineer and head of data science at the Intel AI products group. She studies techniques for building neural networks. Each different configuration of a neural network for a given problem is called a “topology.” Engineers are always looking at new topologies for solving a deep learning application–such as natural language processing.

In this episode, Yinyin describes what a deep learning topology is and describes topologies for natural language processing. We also talk about the opportunities and the bottlenecks in deep learning–including why the tools are so immature, and what it will take to make the tooling better.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Segment allows us to gather customer data from anywhere and send that data to any analytics tool. Segment is the customer data infrastructure that has saved us from writing duplicate code across all of the different platforms that we want to analyze. And if you’re using cloud apps such as – Mailchimp, Marketo, Intercom, AppNexus, Zendesk–you can integrate with all of these different tools and centralize your customer data in one place–with Segment. To get a free 90-day trial, signup for Segment at segment.com and enter SEDaily in the “How did you hear about us box?” during signup.


Azure Container Service simplifies the deployment, management and operations of Kubernetes. You can continue to work with the tools you already know, such as Helm, and move applications to any Kubernetes deployment. Integrate with your choice of container registry, including Azure Container Registry. Also, quickly and efficiently scale to maximize your resource utilization without having to take your applications offline. Isolate your application from infrastructure failures and transparently scale the underlying infrastructure to meet growing demands—all while increasing the security, reliability, and availability of critical business workloads with Azure. Check out the Azure Container Service at aka.ms/sedaily.


LiveRamp is one of the fastest growing companies in data connectivity in the Bay Area, and they are looking for senior level talent to join their team. LiveRamp helps the world’s largest brands activate their data to improve customer interactions on any channel or device. The infrastructure is at a tremendous scale: a 500-billion node identity graph generated from over a thousand data sources, running an 85PB hadoop cluster; and application servers that process over 20 billion HTTP requests per day. The LiveRamp team thrives on mind-bending technical challenges. LiveRamp members value entrepreneurship, humility, and constant personal growth. If this sounds like a fit for you, check out softwareengineeringdaily.com/liveramp.



GoCD is a continuous delivery tool created by ThoughtWorks. GoCD agents use Kubernetes to scale as needed. Check out gocd.org/sedaily and learn about how you can get started. GoCD was built with the learnings of the ThoughtWorks engineering team, who have talked about building the product in previous episodes of Software Engineering Daily. It’s great to see the continued progress on GoCD with the new Kubernetes integrations–and you can check it out for yourself at gocd.org/sedaily.