Podcast: Play in new window | Download
Querying a search index for objects similar to a given object is a common problem. A user who has just read a great news article might want to read articles similar to it. A user who has just taken a picture of a dog might want to search for dog photos similar to it. In both of these cases, the query object is turned into a vector and compared to the vectors representing the objects in the search index.
Facebook contains a lot of news articles and a lot of dog pictures. How do you index and query all that information efficiently? Much of that data is unlabeled. How can you use deep learning to classify entities and add more richness to the vectors?
Jeff Johnson is a research engineer at Facebook. He joins the show to discuss how similarity search works at scale, including how to represent that data and the tradeoffs of this kind of search engine across speed, memory usage, and accuracy.
Notes: Jeff’s blog post about similarity search