EPISODE 1629

[INTRODUCTION]

[0:00:00] ANNOUNCER: Algolia is a platform that provides search as a service. The company was founded in 2012, was part of Y Combinator's Winter 2014 class, and has become highly popular for integrating modern search functionality into web-facing services.

Sean Mullaney is the CTO of Algolia and has worked at Google X, Stripe, and Zalando. He joins the show today to talk about Algolia, neural search, vector compression, search optimization, and more.

This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[INTERVIEW]

[0:00:45] SF: Sean, welcome to the show.

[0:00:46] SM: Hey, thank you so much. I'm excited to be on. I'm a big fan of the podcast.

[0:00:49] SF: I'm excited to have you here. It's always great, of course, to talk to another Sean. Also, a fellow Google alum. We have Sean squared plus Google squared at this point today.

[0:00:59] SM: Fantastic.

[0:01:00] SF: You've spent a lot of your career working in AI at Google. And you had time at Stripe optimizing the end of the buyer's journey, through the payment flow. Now you're the CTO of Algolia. Can you talk a little bit about your e-commerce and AI journey?

[0:01:16] SM: Yeah. Absolutely. I've been in e-commerce now for almost 20 years and have really been able to see, I guess, the end-to-end journey. When I was at Google, I got to work a lot on the original search and advertising products, and really saw Google go through the transformation of becoming one of the first AI-first companies.

I remember when we were there, the whole company went through AI training during the first wave of AI back in 2014, 2015. I got to work in the research labs, Google X, which was really exciting, seeing a lot of the advancements in AI around drones, which is the project I worked on, as well as self-driving cars, and glasses, and things like this.

But then I left Google and worked at one of the biggest e-commerce companies in Europe, a company called Zalando that has about $15 billion worth of sales every year. What was really exciting is that Zalando was built as a data-first, engineering-first company, and it was really pushing the limits on how AI was being used in search, recommendations, attribute enrichment, and all sorts of ways. It was really exciting being on the inside of an e-commerce company and seeing how it operated, how the business worked, and how AI was helping drive a lot of value.

I had the opportunity to go and work at Stripe as well. Stripe for me has always had one of the greatest developer experiences, and I helped build up the European payments engineering team. And so, when I had the opportunity to join Algolia as the CTO, it was really like combining a lot of those experiences together: trying to figure out how to use AI in search, but also just creating a great developer experience.

I like to say that we are like the Stripe for search. Search is just a foundational part of every digital application, the same as payments. And making it really easy for developers to get an incredible search experience up and running fast is, I think, just a really important part of anyone building a business or a product these days.

[0:03:04] SF: Yeah. Absolutely. Historically, looking back at your time in search, how has AI been used to improve different parts of the search experience?
[0:03:15] SM: You've got to look back at, I guess, the late 90s, when the first search engines appeared. A lot of the technology back then was very simple. They took web pages and just took all the words in those web pages. And then when you typed in a query, it would go through all the web pages and find documents and websites that had the same words you were typing in. This kind of keyword search, or keyword matching, has been the core technology behind search for the last 20 years.

And to be honest, in most digital experiences you have these days, when you type words into the search bar, that's what they're doing. They're matching the keywords that you type in to the keywords that are in the documents, or the products on the e-com site.

We've started to add AI around the edges of search. There's been some AI to insert synonyms into the query when you type it. Or there have been ways in which you can rerank the results using some AI looking at popularity. But in essence, the core information retrieval technology underlying it all was always just based on keywords.

[0:04:18] SF: Yeah. You mentioned basically doing some form of query expansion. Because we didn't have a great way for the underlying search technology to actually understand that one word could mean something else, you needed to do that query expansion at the beginning in order to get a better search result.

[0:04:33] SM: Yeah. And a lot of these strategies weren't even really AI. I mean, a lot of them were just statistical methods or lookups. You'd have a dictionary or a set of synonyms, and when you weren't able to find enough results, you would just put some more synonyms in there. Expand the query out. And that's great for expanding recall - getting more products. But it kind of hurt the precision of the results that were being returned.

Yeah, we've been trying to use AI and statistical methods around the edges. But we never really had a way to use AI to really understand natural language until the last year or so, when the power of LLMs reached a point where they could really engage users and meet expectations around natural language.

I really see the last year or two in the search space as a huge transformation in terms of what's possible with search in natural language.

[0:05:22] SF: Yeah. Absolutely. I mean, I think over the last 20 years, we've, as people who use and interact with technology, interact with Google search and so forth, programmed ourselves in how to use those tools to search. If I want to know the restaurants in San Francisco, I type in "San Francisco restaurants". But I would never go to you and say, "San Francisco restaurants." You'd be like, "Who is this guy? He's kind of rude." But now with an LLM, we can actually ask a question, like, "Please, can you tell me the restaurants in San Francisco?" The way that I might ask you. And you can actually get a human-readable response. For the first time, we can kind of speak normally to the technology. And rather than getting back 100,000 links, we're getting a human-readable answer.

I mean, how do you think this is going to impact consumer expectations about product experience? If we can suddenly talk to a computer like a human and receive a human-like response, that's a big transformation in the way that we've historically interacted with machines.

[0:06:17] SM: Yeah.
You're absolutely right. And we've had some fits and starts here. Because do you remember when Alexa and a lot of these voice devices were -

[0:06:25] SF: Oh, yeah. I worked on Google Assistant. I know about it.

[0:06:28] SM: Exactly. Yeah. It was like the future was always going to be in natural language and in voice. But the problem was that we don't want to bark keywords at our Alexa speaker, right? We actually want to be able to talk normally. And unfortunately, natural language understanding and search hadn't developed at that stage.

And so, a lot of people just realized, "You know what? I can't really talk to this thing and get answers back. I'm just going to go back to my keywords." Now we bark at our Google Assistant rather than really talking to it. There are a lot of incredible new applications that I think are finally going to become much more popular.

And I'll tell you as well, we absolutely have been trained. We've been trained to treat websites like they're databases with a user experience on top. E-commerce in particular has been incredible at expanding the reach and choice for customers. But a lot of the websites still look like Amazon back in the late 90s. They still look like a warehouse that has a website attached to it. And both the experience of using these sites and their ability to really understand what you're looking for haven't changed a lot.

And one thing that we have seen in the last year, since ChatGPT came on the market: we've seen the length of queries that people are using on our sites double. Their expectations have changed since ChatGPT. They're actually expecting to be able to ask longer-form queries and to express themselves more. Hopefully, people are going to start having a much higher bar of expectations for how they can interact with these websites.

[0:07:56] SF: Yeah. I've felt even that transformation shift myself. Sometimes I'll start writing a query on Google, and if it gets past five or six words, I'm like, "Oh, never mind. I'm just going to go to ChatGPT and ask my question." Because I know that it's going to be too long and too specific to really use traditional information retrieval to get something satisfying as a response.

[0:08:18] SM: Yeah. And keyword starts to break down the more keywords you're using as well. Because obviously, the more keywords you have, the more documents are going to get matched. And figuring out which of those documents are really the best ones - not every document will have every keyword. Actually, the quality and precision of your results start to degrade the more keywords you use, which is why we're trained not to use that many, right?

[0:08:39] SF: Yeah. Exactly. One of the problems we need to solve when it comes to using LLMs for search is the vector similarity problem, at potentially massive scale. We're essentially trying to embed human language in the vectors, and it's very hard to compare vectors in high-dimensional space. What techniques are available for helping solve that problem at a performance that's acceptable for production systems?

[0:09:05] SM: Yeah. This has been really the Holy Grail of vector search and natural language. We've known that vector search - or semantic search using LLMs and representing words as vectors - is a much better way to gain precision. And word2vec came out from Google in what, 2012? 2013 or something.
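To make the vector idea concrete, here's a minimal sketch of semantic matching with embeddings: queries and terms become vectors, and closeness is measured by cosine similarity rather than keyword overlap. The tiny vectors and their values below are invented for illustration; a real model such as word2vec or an LLM encoder produces hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" with made-up values; a real model
# would produce these from text.
embeddings = {
    "sneakers":      np.array([0.9, 0.1, 0.8, 0.0]),
    "running shoes": np.array([0.85, 0.15, 0.75, 0.05]),
    "jeans":         np.array([0.1, 0.9, 0.0, 0.8]),
}

query = embeddings["sneakers"]
for term, vec in embeddings.items():
    print(f"{term!r}: {cosine_similarity(query, vec):.3f}")
# "running shoes" scores near 1.0 despite sharing no keywords with
# "sneakers", which is exactly what keyword matching cannot capture.
```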
There's been a lot of research around this area. But a lot of that research has been stuck in the lab, honestly, because actually scaling it out has hit all sorts of barriers. Vectors are extremely large to store in memory, and to have in-memory search that is fast enough to meet expectations, you end up having to have very, very large machines, which gets expensive.

Being able to compare high-dimensional vectors is also computationally extremely expensive, and therefore slow as well. So, while folks have known how to vectorize, the limitations show up when you scale this out into commercial production systems, where, on Black Friday, we hit 100,000 queries per second. And we serve almost 2 trillion queries a year. When you're operating at that scale and the expectation is that 20 milliseconds is the speed you need to hit, it's very hard to do a lot of these very big computations. It's very hard to keep a lot of machines holding everything in memory. We're always looking at the quality, speed, and cost tradeoffs of these technologies. And until now, we hadn't been able to get those tradeoffs to work.

One of the things that we have invented at Algolia is a compression technology. We do something called hashing on the vectors, where we're able to reduce the dimensionality and compress them into a much smaller space. And the second thing is that the way we compress them lets us put them into traditional databases and use the high-speed comparisons that normal databases use.

There's a very, very specific technique that you have to use in order to get that kind of database-like performance but still capture almost 100% of the information and knowledge that's coming out of these vectors. This is what we call neural search: our high-performance, high-speed vector search engine.

[0:11:10] SF: Mm-hmm. And for that compression technique using hashing, is that a lossless or lossy compression? Are you losing some information by performing that hashing operation?

[0:11:22] SM: Yeah. The thing about vector space is that it's absolutely enormous, which is why these vectors are very large. But it's also almost entirely empty, so it's not an efficient way of storing these concepts. The trick is to be able to do the vector compression without losing information or context, and to have the tricks to cluster the vectors and do sub-searching and things like this.

But yeah, it's almost entirely lossless. And most importantly, the way the hashes are created means that hashes that are very similar bitwise are actually very similar in vector space. Getting the right output format, so that you can use a lot of the high performance of CPUs to make that comparison, is pretty important.

[0:12:09] SF: Yeah. You can rely on more traditional similarity measurements rather than vector-specific similarity measurements, like the cosine of the angle, or the length of the vectors, or the distance in space. You're relying on something that's more of a bitwise comparison at the hash level.

[0:12:26] SM: Yeah, exactly. Getting the compression right so that you keep the information and are able to look up very large amounts of data, even using just traditional inverted indexes, which is what most of the keyword search engines are built on. Yeah, that's really the trick to it.
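Sean doesn't reveal Algolia's actual hashing scheme, but the general family of techniques being described here is locality-sensitive hashing. A minimal sketch using random hyperplanes: each vector is compressed to a short bit signature, nearby vectors tend to get similar signatures, and comparing signatures is a cheap bitwise XOR plus popcount. Everything below is an illustrative stand-in, not Algolia's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def hash_vector(v: np.ndarray, planes: np.ndarray) -> int:
    """Compress a float vector into an integer bit signature:
    one bit per random hyperplane, set if v lies on its positive side."""
    bits = (planes @ v) > 0
    return int(sum(1 << i for i, b in enumerate(bits) if b))

def hamming(a: int, b: int) -> int:
    """Bitwise distance between signatures: XOR then popcount
    (int.bit_count requires Python 3.10+)."""
    return (a ^ b).bit_count()

dim, n_bits = 128, 64  # 128 floats (512 bytes) shrink to 64 bits (8 bytes)
planes = rng.standard_normal((n_bits, dim))

v = rng.standard_normal(dim)
near = v + 0.1 * rng.standard_normal(dim)  # a slightly perturbed neighbor
far = rng.standard_normal(dim)             # an unrelated vector

h_v, h_near, h_far = (hash_vector(x, planes) for x in (v, near, far))
print(hamming(h_v, h_near))  # small: similar vectors -> similar bits
print(hamming(h_v, h_far))   # large: ~n_bits/2 for unrelated vectors
```

Because a signature is just a small integer, it can sit in an ordinary index and be compared with instructions a CPU executes in a cycle or two, which is the spirit of the database-like performance described above.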
And it's taken years' worth of research to get there. But we knew that we couldn't launch a vector search product unless it was extremely fast, could scale to very large indexes, and could perform on traditional CPUs at reasonable cost. That's the exciting part about practical computer science: not just the breakthrough, but actually figuring out how to scale it up within all of the constraints.

[0:13:04] SF: Mm-hmm. Yeah. And if you're relying on traditional CPUs and traditional similarity measurements and so forth, plus reducing the size of the data, you don't necessarily need a vector database to use this. You can use traditional databases. Then your total cost of infrastructure is probably significantly reduced as well, and probably the operational complexity around it and the number of people you need to manage it.

[0:13:28] SM: Yeah. Absolutely. And the faster you can make the retrieval stage, the more latency budget it gives you to do reranking afterwards too. When you go into search, you have three main stages. One, you try to understand the query itself and structure the retrieval phase. Secondly, you go and do the keyword and vector search. But the third stage is about reranking the results that come back, so that the order of the results provides the best relevance to the customer.

Because as you know, when you're on a website, the first and second results are clicked on so much more often than the 10th or the 20th result. When you can do the retrieval phase very fast, it gives you more latency budget to go and do more sophisticated reranking.

[0:14:10] SF: Mm-hmm. Against a smaller set of data.

[0:14:12] SM: Yes. Against a smaller set of data. Yeah.

[0:14:14] SF: Mm-hmm. And you mentioned neural search. This is the underlying technology that's been developed there that's also using this hashing approach. Are you using a combination of both a representation of the vector in the form of the hashing and keyword-based matching?

[0:14:30] SM: Yeah. Absolutely. Keywords still work very well. And a lot of customers are still in the trained mode of using keywords. When you've got some pretty basic keyword queries - we call this the head of your query space - you still want to be using keywords. It's still a very important signal.

The way we look at it is: keyword provides a signal of relevance. Vector search provides a signal of relevance. And when you combine those two with other signals, you can have much better precision when you re-rank the results. We're getting more and more information about whether this individual record is relevant to the query.

[0:15:04] SF: Can you walk me through maybe an example of how you're combining these different approaches? Bringing the worlds of keyword search and vector search together?

[0:15:11] SM: Yeah. It's exciting to geek out on this. All right. What we're trying to do is create, for a given set of records that we think are relevant for a user, a whole bunch of different signals. The first thing we do is we go and get the query and turn it into a vector. We then do retrieval against the keywords and the vector across very large indices. These could be millions and millions of records. We pull out the top thousand based on keyword and vector rankings.
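As a rough illustration of that hybrid retrieval step, the sketch below blends a keyword relevance score and a vector relevance score per record and keeps the top candidates for the later reranking stage. The scoring functions, the 50/50 weighting, and the cutoff of 1,000 are all stand-ins rather than Algolia's actual formulas.

```python
import heapq

def hybrid_retrieve(query_terms, query_vec, records,
                    keyword_score, vector_score,
                    alpha=0.5, top_k=1000):
    """Blend two relevance signals per record and keep the top_k
    candidates for the downstream reranking stage. keyword_score and
    vector_score are caller-supplied functions, e.g. a BM25-style
    match and a cosine similarity over compressed vectors."""
    def blended(record):
        kw = keyword_score(query_terms, record)
        vec = vector_score(query_vec, record)
        return alpha * kw + (1 - alpha) * vec

    return heapq.nlargest(top_k, records, key=blended)
```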
And then what we can do is pull in additional information about each of these records for the final reranking stage. The final reranking stage can take into account popularity - how many people clicked on this product after they searched for it before. It can take into consideration conversion rates on an e-com site. It can take into account business metrics.

And then we have a layer of personalization where, because we know the user and we have a profile on them, we can actually score each of the records for personalization. And then there's a final AI model which takes all of this data and decides a final ranking. This is a learning-to-rank model. We have AI understanding the queries, AI at the retrieval phase, and then AI at the final ranking phase as well, using all of these different signals.

[0:16:24] SF: And then you're applying, I'm assuming, different types of AI at each level. Maybe more traditional machine learning predictive models for some parts of this, as well as taking advantage of LLMs for the vector similarity search piece.

[0:16:39] SM: Yeah. Exactly. Each stage has a different model, and different models solve different parts of the problem. For example, at query understanding, we have a categorization model. Let's say you're at a grocery store and you're searching for chocolate milk. Because we have this AI model, it will understand that that's actually part of the dairy category, and it will then filter the results of retrieval based on the category it's predicting.

And then if you type in milk chocolate, it'll say, "Oh, actually, that's confectionery," and it will filter for confectionery results. We have these very specific models at each of the different stages performing more domain-specific jobs. And then they all combine together into the whole search pipeline.

[0:17:26] SF: Is my history of search taken into account as well? A lot of times when we're searching, we're doing it within one session, where I might search something and think, "Oh, I don't really see what I want." And then I refine it and do another search, and another search, and so forth.

[0:17:39] SM: Yeah. You can take into account previous searches with these models. What we find, though, is that the personalization layer is one of the things that provides a lot of that history, where search gets better the more you use a site.

Every single thing that you click on, every query that you put in, everything you add to basket, everything you buy builds up a profile. And so, for every single user, we have categories, brands, price points. And we use an AI model to create affinities so that we can figure out, in this session, what's your affinity for price points? What brands would you most typically gravitate towards? And then we can use those signals to re-rank results as you're going through the site.

[0:18:24] SF: Yeah. You mentioned personalization there. One of the things you talked about earlier as well is how, traditionally, e-commerce sites are kind of like a UI on top of a warehouse. And I think one direction that we're going to go with all this AI technology is a lot more personalization, where the UI kind of adapts to what our needs are. There's no longer going to be a choice of, "Oh, I want the, I don't know, the simplistic UI or the more complicated UI."
It'll essentially learn to adapt itself based on what I need as an individual.

[0:18:59] SM: Oh, yeah. Absolutely. I think one of the things that frustrates me the most about the experience is when you go to browse an e-commerce site and they drop 10,000 products in a grid and it's like 100 pages long, right? You sit there, you get a big grid of products, and you've got to click through all the different pages. But that experience is very unlike most other experiences we have.

We have this product we're building around AI browse. As you're browsing a website, instead of seeing a big grid of products, we're giving a Netflix-like experience. When you go to Netflix, they don't just give you the whole grid of all of the shows. They have what I call the chessboard, right? They have rows, and every row is highlighting a different aspect of the catalog. It may be, these are mystery crime thrillers, right? Or, these are the top shows in your area.

And so, what we think is that a lot of the future of browsing online is being able to dynamically create interesting and inspirational categories of products to showcase the whole breadth of what's on offer in the store, so that as a user, you can very quickly understand, "Oh, wow. Okay. This is all the different types of things that the store has to offer. And here's why I might like some of them versus other ones." It's not just search. It's also the way you browse and explore a website that I think AI has a lot of opportunity to help with.

[0:20:18] SF: Where do you see some of the other practical use cases of generative AI, particularly in the e-commerce space? We talked a little bit about search. You mentioned this adaptive UI for browsing. Are there other places where we could improve the experience for end users?

[0:20:32] SM: Yeah, absolutely. So far, all of the things we've talked about with AI-powered search, none of that is using the generative aspect of LLMs. It's using the encoding part, but not the decoding and talking back to you. Actually, we think that chat-based search is going to be a big part of the e-commerce experience going forward.

And one of the reasons is that a lot of e-com stores are very domain-specific. If you go to a sports retailer, or a fashion store, or an electronics store, it's quite likely that you're not an expert in these products. And that means that the terms you're searching for are probably not going to match the keywords, because you don't know what the technical lingo is on the electronics site.

We actually think that a lot of customers will like assisted shopping. Just like when you walk into a retail store and the electronics genius talks to you about all the technology like you're 5 years old and can help you choose, we think that generative AI can help create those conversations. You can actually talk to an assistant. It can be showing you products in the search pages as you're talking. It can help filter them. And it can explain to you what different technologies are. Let's say you ask, "What the hell is an OLED TV?" Right? It can share that expertise and then give you advice. Like, "Hey, buy this product if you think this is important. Or this product if you think this is important. Or this one has great reviews."

A lot of the e-commerce experience is about giving the customer the confidence to make a purchase.
And I think generative AI is going to create a really human-like experience when you're shopping on these sites, where you actually feel like someone's helping you and giving you the confidence to check out and buy.

[0:22:18] SF: Yeah. It's like instead of GitHub Copilot, you're getting the shopping copilot. I also think that there's something really helpful you can get from interacting with an AI, especially when you're talking about a product area that you might not be familiar with. Because you might feel a little bit shy if you had to ask a human expert, because you feel dumb, basically. There's a certain amount of psychological safety in asking your questions to a nonjudgmental AI, where it can just tell you and you can ask follow-up questions. You don't have to fear, "Oh, this person's going to think that I don't know what I'm talking about," when you don't know what you're talking about. But everyone kind of has that fear from time to time.

[0:22:56] SM: Exactly. And not only that, but if you shop at the site multiple times, when you come back the next time, it'll remember the conversation it had with you before. It'll be like having a personal shopper that remembers you when you walk in every time.

[0:23:09] SF: Yeah. And I know from my time doing a little bit of work on Google Shopping that a lot of times people will abandon a purchasing experience online, especially when it comes to clothing or apparel or something like that, because they don't know, "Oh, do these Nikes run big or not?" And they need to ask that question, basically.

Having a tool right there in your shopping experience where you can actually ask that question will probably lead to more conversions for the business. But also, it's a more satisfying shopping experience, because now I don't have to necessarily go out to a store to try these things on.

[0:23:42] SM: Yeah. I mean, that's one of the reasons why e-com is still less than one in five - it's about 20% market share. Traditional retail is still very strong. And I think a lot of it is because folks want to have that guided, assisted experience. They want to get the confidence they need that the product's going to be the right one for them. And so, I do think generative AI can have a big breakthrough in terms of pushing e-com, and the amount of business that happens online, to the next level.

[0:24:10] SF: In terms of search in e-commerce, is the range of queries that people enter into e-commerce sites different than general search? And how does that impact the problem space? Does it help by constraining the problem space, because you're talking particularly about e-commerce sites? Or does it introduce new types of problems that you actually need to solve for?

[0:24:33] SM: Yeah. I mean, I think there are different parts of the search space on an e-commerce site, for example. Some people use the search box as a shortcut to browse. They'll enter in "jeans" or something just to find the jeans category. And those users are relatively straightforward to redirect.

Then sometimes people want to get very specific filtering through their keywords. You have to be able to detect and recognize filters, and categories, and things. Someone will use a brand name. They'll use a color. They'll use a category. And they'll string these together.
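As a toy illustration of that filter detection, the sketch below matches query tokens against known brands, colors, and categories, and leaves the rest as free-text terms. The lookup tables are invented, and a production system would use trained models rather than exact string matching, but the shape of the output is the point.

```python
# Illustrative, hand-picked vocabularies; a real system learns these.
KNOWN_BRANDS = {"levi's", "nike", "adidas"}
KNOWN_COLORS = {"red", "blue", "black"}
KNOWN_CATEGORIES = {"jeans", "boots", "jackets"}

def extract_filters(query: str) -> dict:
    """Split a keyword query into structured filters plus residual text."""
    filters = {"brand": [], "color": [], "category": [], "text": []}
    for token in query.lower().split():
        if token in KNOWN_BRANDS:
            filters["brand"].append(token)
        elif token in KNOWN_COLORS:
            filters["color"].append(token)
        elif token in KNOWN_CATEGORIES:
            filters["category"].append(token)
        else:
            filters["text"].append(token)
    return filters

print(extract_filters("red levi's jeans"))
# {'brand': ["levi's"], 'color': ['red'], 'category': ['jeans'], 'text': []}
```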
In essence, they're just trying to create a filtered browse experience. That's the second part of the use case.

Thirdly, a lot of people will come in and ask about problems they want to solve, right? That's why a lot of people shop online. They've got a problem. They want to solve it.

We work with a pharmacy, for example, and the number of people that will just search for "baby crying", right? Which is not a product. You can't buy that product. But it is a problem that they want solved. And they don't know the word colic, or they can't spell it. But that's the kind of medicine they're looking for. And that's a great example of how LLMs and natural language understanding can help: in vector space, "baby crying" and "colic" are very, very close together. Our vector engine is able to find those products, whereas keyword search would fall down.

And then there are all sorts of other problems that customers come with. They want a brand. They're searching for UGG boots, but the site doesn't sell UGG boots. And so, our LLMs can figure out that when they say the word UGG, they actually mean furry boots, right? And we can show products from other vendors. There's a whole set of different problems and jobs to be done that customers are looking to e-com sites for, and you have to just knock through them one by one.

[0:26:20] SF: Yeah. I would think there's also a special type of context. Because if I'm searching a site that only sells boots, then there are probably words that are going to be very specific to the world of boots, and descriptors that make sense when you're putting those vectors in space that maybe don't make sense as much if you're doing general search. Because those things would be farther apart in space, because they just mean different things in the general context.

[0:26:44] SM: Yeah. And we fine-tune our LLMs. We have e-com data sets, and we fine-tune them on top of the general language models. They actually do understand a lot of the domain-specific words that maybe are more infrequent in the general corpus of training data that these LLMs have been trained on.

[0:27:02] SF: There are clearly certain types of searches where vector similarity is going to be far superior to keyword-based. But what about on the flip side? What are some of the queries where keywords are actually the better approach?

[0:27:13] SM: Yeah. I mean, there are definitely things where keyword performs better. As I said, really basic stuff, right? Imagine you went to ChatGPT and just typed in the word "jeans". It'll come back and be like, "What? I don't really have that much information to go on."

When you're looking at single keywords or two keywords, and particularly when those are keywords associated with brands, or categories, or other filters, it's actually best just to redirect users straight to a browsing experience or a filtered browsing experience.

And so, keyword is still a very important part of the experience, especially as customers have been trained to talk in keywords. We believe in this kind of hybrid search, where you're able to use both signals, run both engines at the same time, and maybe use vector to help re-rank a little bit the keyword results that come through. But what we're seeing is definitely customer behavior changing.
And so, a site whose traffic keyword search can handle today is actually seeing that shift very quickly as customers start to change their behavior in what they're typing in and what they're asking. I think it's actually going to change pretty dramatically over the next couple of years as people's expectations are raised - that they can be themselves and express themselves directly.

And also, I think voice search will start to take off as well, when people realize that they can actually talk to these systems in natural language and be understood.

[0:28:36] SF: Yeah. I think that makes a lot of sense. As we talked about earlier, people, myself included, are already changing our behavior around search. Some of the original keyword methods might become less and less applicable over time, because people are going to adapt and have a different expectation when it comes to interacting with any kind of search box.

[0:28:54] SM: Yeah. Absolutely.

[0:28:57] SF: What are some of the hard problems that are yet to be solved in the space?

[0:29:02] SM: I think one of the really interesting things is when you think about your experiences with Bard or ChatGPT. In a lot of use cases I've tried, the hallucination problem is real, particularly when you're trying to find facts, figures, statistics - things where there is a true answer.

I've recently been trying to do some research, asking Bard about a stock: what's the market cap, or what's the revenue? Or you ask it, what's the market size of this industry and who are the players? Bard will return you three answers and say, which one's the best answer? And they're all different, right? It literally hallucinates the numbers for you.

And so, this is really part of the problem with relying on an LLM entirely. What we think is that a lot of the future is what's called retrieval-augmented generation. Instead of the model just trying to statistically predict what the answer should be without having the answer - imagine you went to an e-commerce website and asked for a pair of jeans, and Bard just hallucinates or invents a pair of jeans that doesn't exist in the real world. Or offers you a pair of jeans you can't buy on the website.

We think in combining the power of these LLMs with an actual information retrieval system, where you can pull out real web pages, or real products and real items that exist. And then, when you pull out those items, you can use them as an input to generate a conversation.

For example, when you come to an Algolia-powered website, you can have a search, and then you could ask it, "Hey, out of these products, can you walk me through which ones would be best for my use case?" And so, we think combining the power of vector search with generation is the way you get really accurate and ring-fenced, controlled answers, where it can talk to you about real products that exist and real data that you're providing.

[0:30:52] SF: Yeah. You mentioned the numbers issue. I know it can be really hard also to pull an exact quote from somebody out of an LLM, because it wants to change it, basically. It's really hard to get the exact quote that someone said, which seems really funny when we're talking about computers. Because historically, it's like, well, they have perfect recall. And I guess they do. But they still want to change it in the context of an LLM.
With the RAG model approach that you're talking about, how does that work for someone who's implementing Algolia, not the end user? How do they take advantage of combining the RAG model approach with the underlying LLM?

[0:31:29] SM: Yeah. This is what is powering - we have a pilot prototype of this kind of conversational commerce. When you come to a site that is enabled with this, instead of just having a free-form conversation with an LLM where it could hallucinate products, features, or prices, these kinds of things, what we do is we take the conversation and turn it into a vector. And then we retrieve a thousand different products. And we tell the LLM, "Only talk about these products. And only reference the pricing and brands of these products in the conversation you're about to have."

And so, the actual chat is very much restricted to real things. And the LLM is restricted so it can't make up prices, or features, or things that aren't provided in the context. I think a lot of applications are going to start to move to this kind of format, where instead of a totally free-form generative experience, it's restricted, and there's a retrieval phase that starts the conversation.

[0:32:26] SF: Yeah. Absolutely. I think we're starting to see more and more of that approach, especially when it comes to augmenting an LLM with a company's documents or something like that. You want a certain level of accuracy, and you need to be able to provide that context window. What were some of the challenges with building that out? You mentioned restricting to these thousand products, and there are generally limitations in terms of how many tokens you can feed into the context window. How did you navigate some of those technical hurdles?

[0:32:56] SM: Yeah. And again, it comes back to what I said about practical computer science in software engineering: it's always a trade-off between cost, speed, and quality. Obviously, you can feed loads of content into the context window and you'll have a higher-accuracy conversation. But it's slower, and it's going to cost a lot more money. And so, it's really about trying to get the balance and the tradeoffs right.

We know, for example, in e-commerce, that customers who are assisted in their purchase convert at a much higher rate and spend a lot more money. And so, that does offer the opportunity to say, "Yeah, if the conversation costs 50 cents, maybe that's worth it, given the conversion uplift or the additional revenue we're likely to generate." In an e-commerce environment, you can actually provide a lot more context and take a little bit more time with those conversations.

But in other business models, it may be that those generative experiences need to be a penny each, or even a tenth of a penny. In that case, the amount of context you're providing is going to be a restricting factor.

[0:34:00] SF: Yeah. Okay. That makes sense.

[0:34:02] SM: I think a lot of people who are building LLM products right now have the proof of concept that they get to show their management or their investors. And they're starting to realize, "Oh, God. As I actually start to scale this out, and as I start to look at the cost of my infrastructure, and the latency, and things like this, some of it actually isn't sustainable."
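Mechanically, the retrieval-restricted setup described above looks something like the sketch below: embed the conversation, retrieve real catalog items, and put only those items in the model's context, with an instruction not to go beyond them. The function names, the index interface, and the prompt wording are all hypothetical rather than Algolia's API, and the top_k retrieved (and therefore the context size) is exactly the cost/speed/quality knob being discussed.

```python
def build_grounded_prompt(conversation: str, embed, search_index, top_k=1000):
    """Retrieval-augmented generation: restrict the LLM to products
    that actually came back from the index. embed and search_index
    are caller-supplied; their interfaces here are assumptions."""
    query_vec = embed(conversation)                     # conversation -> vector
    products = search_index.retrieve(query_vec, top_k)  # real catalog items

    catalog = "\n".join(
        f"- {p['name']} | {p['brand']} | ${p['price']}" for p in products
    )
    return (
        "You are a shopping assistant. Only discuss the products listed "
        "below, and only quote prices and brands exactly as listed. If "
        "the answer is not in this list, say so.\n\n"
        f"Products:\n{catalog}\n\nConversation so far:\n{conversation}"
    )
```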
You have to really think about those latency, cost, and quality tradeoffs as you're building.

[0:34:27] SF: Yeah. I think there are a lot of companies right now that are very interested in the space and building out these proofs of concept. But it takes a lot of work to go beyond demoware to actual production level. Both from the scale challenges that you're going to hit and the cost perspective, but also dealing with potential privacy and security issues. There's a lot to navigate, and it's all a brand-new space for a lot of businesses.

[0:34:52] SM: A great example is there's been a real surge in interest in vector-native databases. We've seen the Pinecones [inaudible 0:34:58] of the world be very popular with developers. And what we're finding is that folks are using them for proofs of concept and getting great results, because you've got the full vectors in memory, right? And you've got a small set that you're working on.

But they're finding that the bills start to pile up once you start to scale this out, and it becomes extremely expensive. Or you can't scale beyond a certain index size, right? It may work when you're doing your proof of concept. But you add a million records into the thing, and the costs become prohibitive, or the speed does.

Definitely, we're getting to maybe this trough of disillusionment where we're like, "Okay, this has been great in terms of the proof of concept and the demos. But how do we scale this up to the expectations of users in a real-world environment, at the scale and speed they expect?"

[0:35:45] SF: Yeah. And that's where, as you've said a couple of times, you have to get to the practical computer science, right? These methods work great in the lab. But can you do this at the scale of the requirements of a larger organization or not?

[0:36:01] SM: Yeah. And as a former Googler, you'll also appreciate that that's where a lot of the fun work at Google happened. It was not necessarily the most externally exciting. But building incredible infrastructure that operated at speed and at low cost is such an important part of building any infrastructure company. And I think Google definitely had some of the focus on that in terms of practical computing.

[0:36:23] SF: Yeah. And I think that's where they were able to really, truly differentiate themselves. Because, I mean, one, 15 years ago they were hitting scale problems way before probably any other company was. Even then, most companies would be very lucky to hit the kind of scale issues that they were having. And they had to essentially develop that stuff and use a lot of really smart, practical computer science to be able to do that. But that also became a differentiator, because it's just an amazing experience as someone using search, or Google Maps. I remember when it first came out, in graduate school. Everything before that was Flash, or these Java applets - really, really clunky ways of essentially loading apps or loading maps. And they were using just JavaScript and normal HTML. But they were able to do it at such scale. It was an incredible, amazing experience.

[0:37:13] SM: Yeah. What's incredible is how important speed is to us. We don't really understand it as users. But we can see it in the data: every single millisecond over 100 milliseconds matters in response time.
And there have been studies showing that, with a search experience, there's about a 1-percentage-point decrease in conversion rate for every 100 milliseconds above that.

And the reason is that we're hardwired as humans to want to get information fast. There's some fascinating research on this. It's a very important topic at Algolia, this idea that we were all foragers once, when we were evolving. We would very quickly try to understand, as we were searching around for food, which patches had the highest energy gain for a certain amount of effort. And so, our brains are very hardwired to understand this. And when we're searching online, we go to a website, and it's amazing how fast we make a judgment call about the information gain we're getting from this website for a given amount of effort and speed. People will bounce and leave websites very quickly if they don't feel like they're getting that information fast enough, or if there's not enough information on the website that they see.

And yeah, it's really fascinating. It's an evolutionary thing. And we're seeing it online: you increase the speed, and you increase the immediate amount of information gain you can give customers through either search or browse experiences, and they will stay. And then they will come back over and over again. Because their brain's like, "Hey, that was a place where I got a lot of information with a small amount of effort."

[0:38:47] SF: Yeah. I recently interviewed the CTO of Sofascore, a big real-time sports company based out of Croatia. One of the big things he talked about was the difference between being able to respond to somebody in 20 milliseconds versus over 100 milliseconds. Especially when you're basically a second screen as people are watching real-time sports - if there's clear latency, it's just a terrible experience for people, and they're going to bounce.

And then, in the e-commerce sense, that's going to lead to fewer conversions if you're trying to get people to shop online. If I have to wait for the page to load, then I'm probably going to go somewhere else.

[0:39:23] SM: Yeah. And as we all know, these LLMs are not fast at generation at the moment. Figuring out how to scale them up with low latency is a pretty hard problem right now.

[0:39:33] SF: Yeah. And I think that's going to be an area of differentiation. I think that's where some of the open-source projects are struggling a little bit with going from that proof-of-concept demo to production: can you do inference fast enough to satisfy your particular use case?

[0:39:51] SM: Yeah.

[0:39:53] SF: This whole space is very new. And I think the world of AI/ML engineering is also less mature than conventional software engineering. What are your thoughts on the differences between software development and AI software development? And where are the places that we need to improve the tooling to do AI software development better?

[0:40:14] SM: I mean, this is a really important topic that doesn't get talked about enough. In traditional software engineering, you're typically able to take a problem, decompose it into a technical specification, scope out the various pieces, and plan the work in a way where you can tell your boss or your customer, "Hey, this thing is going to ship on this date with this amount of effort."
And you know what? You might be plus or minus 30% either way, right, if you've got a team that's good at this process. That's deterministic software development. But AI software development is not deterministic at all.

When it comes to AI, it's a highly iterative and experimental process. We use a hypothesis-driven process where we have a whole board of hypotheses that we think can help improve models or solve problems. We have to go get data to help bolster each hypothesis. Then we implement it. We do offline testing - you have to test offline, do manual QA checks. And then the only way you know it's going to work is when you deploy it in production through A/B testing.

You've got to throw a lot of traffic and a lot of volume at it to see things like conversion rates and dollar amounts converge. And that takes time. And then you've got to go back and repeat the process over and over again. So, it's much more like the scientific method, where you do not necessarily know in advance what the solutions to the problems are going to be. And the determining factor of how fast you improve is how fast you can experiment, how many experiments you can run, and how good your offline testing is.

And so, it's much harder to give a date. "When's the product going to be ready?" You're like, "Well, it depends. We've got to run a whole bunch of tests. We have hypotheses. But I can't give you an exact date, because it's not deterministic."

[0:42:07] SF: Yeah. There aren't really good ways to test models today. You can have this hypothesis about how we think it's going to improve things, but you basically need humans to QA it. And as you mentioned, you need to put it in production and run A/B testing. It's a little bit like how we used to test software 20 years ago, before we had integration tests, and unit tests, and all these frameworks at our disposal where we can automate a lot of it, and canary rollouts and so forth. We're basically running science experiments. How do you think that impacts things from a cultural perspective? Are the types of people that need to be working in AI software development a little bit different from those working in traditional software development?

[0:42:47] SM: It's definitely a culture change. The offline-to-online results gap is large, right? A lot of times, you'll run something offline and you'll say, "Hey, this thing really improves stuff. I've got a lot of confidence. Let's put it online." And then you find out the real world is very different, because you were training and testing on data that reflected the previous model. And then you make a change, and you get new results.

What ends up happening - and there's been a lot of great data reported by companies like Netflix, Amazon, and Microsoft - is that about half of all experiments fail. Netflix said, I think, nine out of 10 experiments fail. Suddenly, you're in this position where, if you were a software engineer and nine out of the 10 things that you built failed in production, you would be pretty upset, right? Engineers don't like to fail. They like to be able to analyze a problem, come up with a solution, and see it work.

And so, being much more tolerant of failure and risk-taking, and trying to move and experiment at high velocity - it's definitely a slightly different culture.
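The "wait for conversion rates to converge" step rests on standard statistics: a two-proportion z-test tells you how much traffic you need before a measured lift stands out from noise. A minimal sketch, with made-up numbers:

```python
from math import sqrt, erf

def ab_test_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test on conversion counts from an A/B test.
    Returns (z, two_sided_p); larger |z| and smaller p mean more
    confidence that the variants truly differ."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# A 2.0% vs 2.2% conversion rate on 20,000 users per arm:
print(ab_test_z(conv_a=400, n_a=20_000, conv_b=440, n_b=20_000))
# z ~ 1.4, p ~ 0.16: not yet significant, so the experiment keeps running
```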
And obviously, all of the tools and foundational things you need to build models, collect data, train them, and deploy them are very different. But I've found great software engineers adapt pretty quickly. And they're pretty excited about expanding their skill sets, learning new things, and growing into the roles that AI has provided.

[0:44:06] SF: Yeah. I agree. I mean, I think there are a lot of people experimenting right now. And I also think that, because of the impact that generative AI is going to have on the way we work, you're doing yourself a bit of a disservice, regardless of the job you're in, if you're not spending a little time figuring out how you can actually leverage some of this stuff to do your job better or more efficiently.

As we start to wrap up, what's next for Algolia? Is there anything that you can share about some of the investments that you're making?

[0:44:34] SM: Yeah. Well, I mean, I think we are absolutely committed to building the world's best search engine and making it easily available to everyone. And we're just going through a transformational phase right now with these large language models. The next year or two is us really scaling out and making these language models available to everyone through self-service platforms, and really seeing how far we can take the technology.

We've seen really incredible performance improvements. It's probably been one of the biggest technology jumps I've ever seen in search, in terms of the conversion rate uplifts and the numbers that we're seeing when it's put into production. And I still think we're very, very early. I think the next couple of years is really about us rebuilding search from the ground up to use AI and large language models. And it's going to take years for us to figure it out. We're still in the early innings. But we're pretty excited about some of the techniques that we've developed, particularly around the scale, speed, and cost of deploying LLMs into production. But there's still a lot of work left to do.

[0:45:36] SF: Yeah. Well, that's good. You have a reason to get up in the morning, right? Well, Sean, thank you so much for being here. I thought this was a fantastic conversation. I really enjoyed it. And I think everything is really early. It's an exciting time to be in technology. I was alive, but I wasn't in the tech industry at the time of the dot-com boom. But I imagine it feels a little bit like that - all the excitement of, "Hey, this internet is going to change the world."

[0:45:59] SM: Yeah. And I think we're on a very similar curve to the internet, right? Do you remember the huge amount of enthusiasm in the late 90s? I think they predicted that retail stores were all going to close down and we'd have no cities or high streets, right? People extrapolating the growth and potential out way too far, way too fast. But the internet has been a profoundly important platform for us. And I think AI will be very similar. As Bill Gates said, you overestimate what you can do in a year and underestimate what you can do in 10 years. And I think we're probably in that situation right now.

[0:46:33] SF: Yeah. Absolutely. I totally agree. Well, Sean, thanks so much. And cheers.

[0:46:36] SM: Thank you so much. Great to see you.

[END]