EPISODE 1896 [INTRODUCTION] [0:00:00] ANNOUNCER: Engineering teams around the world are building AI-focused applications, or integrating AI features into existing products. The AI development ecosystem is maturing, which is accelerating how quickly these applications can be prototyped. However, taking AI applications to production remains a notoriously complex process. Modern AI stacks demand LLMs, embeddings, vector search, observability, new caching layers, and constant adaptation as the landscape shifts week-to-week. Increasingly, the data layer has become both the foundation and the bottleneck to AI app productionization. MongoDB has been expanding beyond its core document database into a full AI-ready database platform with integrated capabilities for operational data, search, real-time analytics, and AI-powered data retrieval. The company also recently acquired Voyage AI to provide accurate and cost-effective embedding models and re-rankers to its users. Fred Roma is a veteran engineer and is currently the SVP of Product and Engineering at MongoDB. He joins the show with Kevin Ball to talk about the state of AI application development, the role of vector search and re-ranking, schema evolution in the LLM era, the Voyage AI acquisition, how data platforms must evolve to keep up with AI's breakneck pace, and more. Kevin Ball, or KBall, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc. [INTERVIEW] [0:02:05] KB: Fred, welcome to the show. [0:02:07] FR: Hey, Kevin. Great to be here. Thanks for having me. [0:02:09] KB: Yes. I'm really excited to dig in with you on this.
Let's maybe start with a quick background of you and how you got to where you are today at Mongo, and then maybe we can use that as a way in. [0:02:22] FR: Yeah, absolutely. I started as a software developer. It was a long time ago, in France, in Paris. I evolved. I became a manager and worked also on the product side, and I worked on different continents, in small startups and big companies. The biggest one was AWS. The smallest, probably some French startup you haven't heard of. Yeah, I would say, most of my career, I've been in the cloud. Even before we called that cloud, it was more application service providers, plugging in some servers and giving access to these servers to customers. Mostly around security, things like payment, identity, encryption, and more lately, at MongoDB, on data management and AI. I'm having some fun here. [0:03:05] KB: Yeah. Let's dive straight in there, talking about data management and AI. I think everyone's trying to figure out what is it to build an effective AI application right now with these newest tools. What's your take on the core pieces you need? [0:03:20] FR: Yeah. I mean, it's never been easier to vibe code an application. That's for sure. That's exciting. I mean, I don't know about you, but I just love it. It's going so fast and that's so thrilling. What we see, though, is that when you want to build something that is really for production, be it a large consumer application, or a large enterprise application, it's still very hard. I think we could go through a couple of reasons. There are other blockers later on when you want to launch it in production, but the first one when you build is just how complex the stack is now. You need an LLM, you need a vector search, you need different kinds of AI models, you need an AI framework, you need new caching mechanisms.
When you leave the vibe coding piece and you really want to build that in a professional manner, that can be a bit scary for a developer. [0:04:12] KB: Yeah, absolutely. Well, let's talk about the angle that, I think, I understand you all are addressing, which is the data side of it. Because to your point, right, I have my database, I might have to do some embedding, I need a vector search situation. I need all those different pieces. How do you think about the data stack for AI? What needs to be in there? [0:04:35] FR: First, when we think about the data stack, I think there are three things we may want to - You'll tell me which one you want to dive in first. But the first one, we think it should be simple, simplified as much as possible. The second one is you need to make sure it's really accurate and cost-effective, because the information retrieval can be pretty expensive, if you don't take care of it. I think the third part is you want to make sure it can evolve quickly. Things are going so fast. You never know which is the best LLM model. You just unplug for two days and there's a new one. You never know which tool you need to use. You never know how fast your data will grow. Those would be the three things. I mean, we want the stack to be simple, we want the accuracy of the information retrieval to be really, really good, and we want to make sure that if you change your mind, or the ecosystem is changing, or things like that, you can touch your application and make it evolve easily. Those would be the three key points. [0:05:25] KB: Let's actually talk about that last piece, which is the evolvability piece, because this is one of the things I see a ton as we're doing AI agents internally, the things that I'm working on, schemas are not nearly as durable as they once were. [0:05:39] FR: Yeah. No, absolutely.
I mean, they're not durable either because you changed your mind, or you pivoted your application as a developer, and that's totally fine. They're not durable as well because the ecosystem is changing so fast. When you want to connect to a new partner, or a new integration point, or even use a new LLM framework, maybe you will have a new field to account for. Maybe you want to adopt this new observability for LLMs to do really good evaluation. Yeah, that is changing super-fast, absolutely. [0:06:12] KB: I think there's even an aspect that I'm starting to see, which is LLM-derived schemas, right? Instead of having one mega schema that the developers come to, let the LLM choose it. [0:06:22] FR: No, absolutely. Yeah, it's definitely a trend right now. Plus, when you say, let the LLM choose it, the reality is that you may have to play with several LLMs. You may have to be able to handle several schemas, either, again, because the LLM that was best this month is different than the one that was best last month, or because, for some tasks, you still need some specialization. I think that will be more and more of a trend. You will have LLMs that are really good at some specific industry, or use cases, and others that are really good at other things. No, absolutely. It's been MongoDB's value proposition forever. You go with the document model, you don't have to stress about any change you will want to do. I mean, we love it, just to be clear. We love that the full AI world is speaking JSON. We love that the full AI world is coming with all of these changes, because then we can say, yeah. I mean, being able to evolve quickly was already something important before, but that's even more the case in the AI world. [0:07:21] KB: Let's talk a little bit then about the second step that you talked about in terms of, okay, we want to take this to production, we want to be able to scale, we want to be able to deal with all of these different things.
Because I think Mongo has long been, at least in the front-end world where I used to live a lot, the database of choice for rapid prototyping. Then at some point, sometimes people would say, "Oh. Well, now we've got to switch over. We've arrived where we're going." I think at this point, you all scale all the way up, yeah? [0:07:47] FR: Oh, yeah. Now we have, I think, it's 75% of the 14,500. I've been at MongoDB for a year and a half. It started far before me. But I can tell you, we always speak about the big four, security, durability, availability, performance, being able to serve large enterprises. We see that more and more. We see more and more of these very large workloads. You can totally manage transactions. You can totally handle these very strong transactional, financial things. I mean, MongoDB is open source. I've been a customer of MongoDB far before even considering joining the company. I also had, initially, this "oh, yes, I remember, we can go so fast" reaction, but it's also a very safe database that can scale. [0:08:30] KB: Let's then talk about how to effectively use it in an AI application. Because I think this is a space where, okay, the model layer is pretty well understood, if changing very, very rapidly, right? LLMs, you can throw text at them, they love JSON, all these different pieces. Then you have this kind of, okay, there's these cool applications being developed, but all those middle pieces, and as you highlight, the complex stack that's going into that is very much in flux. [0:08:57] FR: Absolutely. Yeah, absolutely. Even past what we discussed before, this document model in JSON that is, again, very well adapted, optimized for AI, you still need a search and vector search. Because I mean, you don't have any serious AI applications that will really give you a lot of value if you just plug in an LLM. The value of these AI applications is, okay, how do I connect what the LLM knows with what my company knows?
If you want to do that, you need search, usually, and vector search, by the way. Maybe we can come back to that as well. But if you are looking for, I don't know, you are building an optimized e-commerce website and you are looking for red shoes, you probably want to see these burgundy sneakers as well. You need search, you need vector search, you need AI models, embedding models, and re-ranking models. Because this is how you can really have very good information retrieval and really make sure that for your RAG application, your semantic search, your agentic system, the right information is provided and grounded results are delivered to your user. Depending on what you are building, you may need some stream processing, if you have some events. You may need many of these things. As a customer, you can choose to stitch together different solutions, a database, a vector search, a search, a re-ranker and embedding model, etc. You could do that, but I don't think you should, and I wouldn't tell my best friends to do that. You could stitch these things together and connect multiple times to your identity providers and create this pipeline for the data to transfer. Yeah. What we are really betting on, and what we see is bringing value to customers, is to just make it super simple. You have a database - we don't try to own the whole world with all the stack you need for AI, but on the data layer, you have a database that is becoming a data platform. You can do, yes, storing of your information, querying of your information, but also information retrieval and data in motion and all of these AI model optimizations to make sure that you will get good results. [0:10:53] KB: Let's go a little bit deeper in what that takes on search. Because I love that you brought up search. I feel like, to me, one of the things that I'm seeing is everybody's thinking of these as chat products. They're not chat products. They're search products at their core.
They're about serving up the correct information, and LLMs help you interpret that and put it into context for someone. It's not as simple as throw it at an open source embedding model, or OpenAI's embedding model, run naive queries and just go. There's a lot of pieces that go into effective search. [0:11:23] FR: Absolutely, absolutely. We can maybe take an example, a concrete example, a simple one. Let's say, you are a bank and you are building an application for your customer support. Your bank's customers, they will maybe be able to ask questions about their account and their credit card, etc. Obviously, you need an LLM, because the full interaction, the conversation, is an LLM interaction. You need that for sure. But if you only have that and you don't have access to the internal documents of your bank, it's probably a bit weak as an experience. You need search. I would say, you need both. You need probably vector search, which really is a semantic search. Because if I'm asking you, okay, how much money do I have in my account? Maybe the document won't say it like that, "how much money," but "what is the solde of my account?" Very different words, even different languages. You will need search, vector search. You will probably need as well some reaction to events, because if you see that something has been paid in the last two minutes and we have been discussing for five, you want to take that into account. You need all of these pieces to come together: search, vector search, and stream processing. [0:12:34] KB: Let's talk about the pieces that go into vector search. Because, I think, once again, for folks who are coming into this, to the beginning, they think, okay, what do I need to know, right? I mean, I started - the first interactions I had with search were back in the days of Solr and Lucene, and these things. No vector search. It was all keyword based, but you had some synonyms and things like that.
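To make Fred's paraphrase example concrete: the reason vector search catches "how much money do I have" and "what is my account balance" as the same question is that an embedding model maps both to nearby vectors. Here is a toy sketch, where the three-dimensional vectors are hand-made stand-ins for real embedding output:

```python
import math

# Toy illustration of semantic similarity. The 3-d vectors below are hand-made
# stand-ins for real embedding model output, chosen so that paraphrases land
# close together and unrelated questions land far apart.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

phrases = {
    "how much money do I have": [0.90, 0.10, 0.20],
    "what is my account balance": [0.85, 0.15, 0.25],
    "how do I reset my password": [0.10, 0.90, 0.30],
}

query = phrases["how much money do I have"]
# The paraphrase scores close to 1.0; the unrelated question scores much lower,
# even though the paraphrase shares almost no keywords with the query.
paraphrase_score = cosine(query, phrases["what is my account balance"])
unrelated_score = cosine(query, phrases["how do I reset my password"])
```

A real system would get these vectors from an embedding model rather than by hand, but the ranking logic is the same.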
To me as an end user, I was like, okay, throw it all at Solr, do a couple configurations and I'm good. It's golden. It serves things up. I think there's more nuance than that. If you just throw all of your documents at OpenAI's embedding model and assume it's going to work, it's not going to just work. What are the different pieces that go into that? [0:13:16] FR: Yes, there's a couple of different pieces. I will start with the basics. Different models will have different accuracy, different quality of results. It's always a tradeoff between how fast you want the results and how good you want these results to be. The best embedding models, and that's exactly why we acquired Voyage a bit more than six months ago now - I mean, they were doing and they're still doing the best embedding models, as they are more accurate than this OpenAI one that you mentioned. You actually want the best result with a reasonable latency, because you have a user, probably, maybe behind a chatbot, or maybe behind an agentic system application, waiting for results. Accuracy is a big one. Multimodal is a big one as well. Most embedding models will be able to do a good job comparing text, or maybe pictures, but the real world is messy. You have images and text and videos combined, and you will have PDFs, and you probably don't want as a developer to break those things down and call different models. That's also a big one: what are the formats that you can support? I would mention two more things. Yeah, accuracy, multimodal. The ability to understand context is also very important. For instance, if I'm asking you, okay, let's say, you have this support application, a chatbot, maybe now you are a networking company and I say, okay, how do I configure this router? If you find somewhere in your corpus of data an exact sentence, or blurb, or text that explains how to configure that router, most embedding models will be super happy. Oh, I found the information.
The best embedding model will be able to say, "Well, wait a minute. I found a line, a sentence that looks exactly like what the user is asking for, but is it part of the recent documentation? Or is it part of a ticket that is five-years-old and the setting is totally outdated?" The context is very important. Last, and that's last in my list, but probably sometimes first in the customer's mind, is the cost of it. People sometimes don't realize that these embeddings can be even bigger than the data that they represent. If you have a great model that is able to do all of that, and you don't find many of them with good accuracy and multimodal support and good context, but the embeddings have to be so big to achieve these results, it will just be awfully expensive for your application. You also want a model that is able to do that with really short embeddings, as those will be cheaper to store and cheaper to query. [0:15:41] KB: Yeah, so that's interesting. I want to explore a few different of those aspects, and maybe we can explore them from the context of Voyage, because that is the recent acquisition. [0:15:49] FR: Absolutely. Yeah, yeah. [0:15:51] KB: That was, as I understand it, the secret sauce there. First, starting with this multimodal piece, right? Because if I think about an application I'm building, I am probably doing a fair amount of preprocessing, right? I'm like, oh, this is an image. I got to translate this image to text. Now I've got to send this text over to my embedding model. Now, I've got to take that and do my vector search. It's like, this whole pipeline of things. [0:16:13] FR: Absolutely. [0:16:14] KB: It sounds like, what you're saying is they've got a multimodal model that you can just throw whatever at it, and it's going to translate it. [0:16:21] FR: Yeah. No, you nailed it. That's exactly what customers were doing. When we speak to customers and they are describing that, they say, "Oh, I have my -" Let's go back to the PDF example.
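Fred's cost point a moment ago is easy to quantify. A quick back-of-the-envelope sketch, where the corpus size and dimensionalities are illustrative assumptions (float32 vectors at 4 bytes per dimension), not figures from the episode:

```python
# Back-of-the-envelope storage cost for embeddings of different sizes.
# Assumes one float32 vector (4 bytes per dimension) per chunk.

def embedding_storage_bytes(num_chunks: int, dimensions: int,
                            bytes_per_value: int = 4) -> int:
    """Total bytes needed to store one embedding per chunk."""
    return num_chunks * dimensions * bytes_per_value

# For a hypothetical corpus of 10 million chunks:
for dims in (256, 1024, 3072):
    gigabytes = embedding_storage_bytes(10_000_000, dims) / 1e9
    print(f"{dims:>4} dims: {gigabytes:,.1f} GB")
```

Shorter embeddings shrink both the index you store and the vectors you scan at query time, which is why dimensionality is a first-order cost knob.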
It's like, "Oh, I have all of this pipeline, and I will extract my pictures and my text from my PDF. Then I will run them through different embedding models. I will try to reconcile the results afterwards." Yeah, with a Voyage multimodal model, you just throw your PDF at the embedding model, and you will have an embedding. By the way, the result will be even better than when you are doing this whole pipeline, because when you are breaking your document down, you will lose some context, you will lose some interaction. Where was this picture exactly? Was it above this text, or below this text, kind of thing? The result will be better, but the big benefit is also, as a developer, you can go super-fast. Just push your document into your embedding model and you're done. [0:17:10] KB: Yeah. No, the simplicity definitely appeals to me. Let's explore the context piece a little bit more, because what I hear you describing is your embedding right now is taking into account not just the text, or not just in the case of the PDF, like text and images, but it sounded like, things like metadata, updated timestamps, all these different things. What does this API call look like? What do I pass to it? [0:17:32] FR: Yeah. Or it's even more than that. What you describe, by the way, is the fact that you also want to take into account the metadata and some other information in addition to purely semantic search. It's super important, top-of-mind for customers, and that's why they are using, by the way, vector search and search combined. That's why it's so important for search and vector search to be where your operational data is. If you're using a separate vector search, you will have to, oh, synchronize all this data, or metadata, to it as well. No. When your database is there, you can do all this stuff that you're mentioning. The context is a bit different. It's part of the embedding model. It's just how the model is trained.
Then, at inference time, instead of just isolating a chunk of text from your document - maybe I should step back. When you are running an embedding model on a text, you don't give it two pages of document. You are chunking the document into small sentences, or blurbs, or things like that. We call those chunks. Then you will have an embedding for each of these chunks, but they don't really know what is in the chunk before and what's in the chunk after. What the Voyage model is doing, the Voyage context model, the specific one that we will release, is that it will pass the full document, and it will preserve some context in addition to the specific chunk. Yes, you will know, for instance, here, this sentence really explains how you can configure this router, the security configuration, maybe. But that's also how you know that you are part of an old ticket, because you also see, maybe three or four chunks above, that it looks like a support ticket, and that it is six-years-old, and that probably, you shouldn't give it too much importance. [0:19:07] KB: Got it. Conceptually, if I were to just try to map this out, if I were building this with a much more naive model, it would look something like, okay, I have a summary of the whole document with maybe some additional things, and then I have each chunk, and then those two things are getting put together for each chunk in the set. [0:19:23] FR: Exactly. [0:19:24] KB: Interesting. [0:19:25] FR: Yeah, that's a great way to look at it. Yeah. [0:19:28] KB: Fascinating. Well, and you alluded to another piece of this, which is combining search, and that gets us into this topic of re-ranking and all of that. Can you maybe lay out what that looks like, just broadly for context for folks who haven't built these applications before, and then what the Voyage take is on it? [0:19:44] FR: Yes. It really depends on what you are trying to achieve in your application.
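KBall's "naive model" mapping (document-level context plus per-chunk text) can be sketched in a few lines. This is not Voyage's actual mechanism, which happens inside the model at embedding time; it is a hand-rolled approximation, and `chunk_text` and `contextualize` are hypothetical helpers:

```python
# Naive approximation of context-aware chunking: carry document-level context
# (title, recency) alongside every chunk, so retrieval can tell a fresh manual
# apart from an ancient support ticket. All names here are illustrative.

def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split a document into fixed-size word windows ("chunks")."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def contextualize(chunks: list[str], doc_context: str) -> list[str]:
    """Prefix each chunk with document-level context before embedding."""
    return [f"[{doc_context}] {chunk}" for chunk in chunks]

doc = "To configure the router, open the admin panel. " * 30
chunks = chunk_text(doc)
augmented = contextualize(chunks, "Router admin guide, updated 2025")
# Each augmented chunk would then be sent to an embedding model; a context-aware
# model does this fusion internally, over the whole document, and does it better.
```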
But what we see most of the time, when you want to have the best results - and I'll give you one example - when you want to have the best result, the most accurate result, like your user is asking for something through a chatbot, through an agent, etc., and you want to give the best result, combining keyword search, like looking for a document with the exact keywords that are part of the query, plus also the semantic search, meaning documents that may have very different keywords, but are speaking about the same topic, combining the two is how you get the best results. The example would be, let's say, I'll go back to my red shoes. I'm looking for Nike red shoes. Maybe red shoes is totally okay to match with burgundy sneakers, because that's almost the same. Now, if as a user, you made the effort to mention Nike, it may be very important to you, and you really want to make sure that you are looking at the keyword, Nike. Let's match the keyword Nike exactly, but let's look only at the semantic meaning of the rest, so red shoes and maybe burgundy sneakers are perfectly fine. This is just a very simple example, and it doesn't fit all use cases, but for most of them, the best accuracy would be to combine both of them. That's why having search and vector search and the database in the same place is a big deal in terms of - [0:21:06] KB: Yeah. You remove a lot of round trips to do that. [0:21:09] FR: Absolutely. Yeah, you're right. Sorry. That's a very important point that you're touching. I'm not saying that you couldn't do it with multiple pieces. You could. But then you have to run your keyword search. You have to run your semantic search, and you have to build your own algorithm to see how you are ranking those results. Yes. Those are exactly the hurdles that you are removing.
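One common way to merge a keyword result list and a semantic result list, when you do have to build the merge yourself, is reciprocal rank fusion (RRF). A minimal standalone sketch, with made-up product IDs echoing the red-shoes example:

```python
# Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
# so items ranked highly by BOTH keyword and vector search float to the top.

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["nike_red_runner", "nike_burgundy", "adidas_red"]         # exact "Nike"
vector_hits = ["burgundy_sneaker", "nike_red_runner", "crimson_trainer"]  # semantic
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# "nike_red_runner" appears in both lists, so it ranks first after fusion.
```

Databases with fused-search operators run this kind of merge server-side, which is where the removed round trips come from.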
[0:21:28] KB: Implementation-wise, if I then was using your API, am I able to specify, run this against these two searches, re-rank in this way? What are the knobs that I have available as a developer? [0:21:41] FR: Yeah. We didn't reinvent the wheel, by the way, with Atlas Search or Vector Search. You use a MongoDB aggregation pipeline, the one you are using with your database to just query data. But we created new operators, score fusion, rank fusion. I'm not going into the detail, because these are just slightly different ways to merge the results. But you have full control. First, you have one operator where you can combine keyword search and vector search, but you have full control about how you want to do that. I mean, many customers are just happy with the basic way to combine. But some say, okay, I want to have different weights. I want to over-index a little bit, maybe on this keyword and a bit less on these ones. It's up to you. You just use the MongoDB aggregation pipeline. [0:22:26] KB: Maybe it's worth actually stepping back and talking about that aggregation pipeline a little bit, because that is a capability that doesn't exist in all databases. [0:22:34] FR: Yeah, absolutely. Yeah, yeah. No, no, that's right. I mean, if you have used MongoDB, that's something that customers usually really like. It really gives you the ability to run several operations on your database, one after the other, and the result of an operation can be used as an input for the next operation. It can be really, really powerful. That's the case for search. You can use combined operators as I was describing, but you can also decide to do some search and then you will do some re-ranking somewhere else and you will do some other stuff. It's really a pipeline that you can implement to play with your data. [0:23:10] KB: Yeah.
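For flavor, here is roughly what a fused hybrid-search stage can look like, written as a Python dict as it would be passed to an aggregation call. The stage shape follows MongoDB's `$rankFusion` operator as best understood here, but the index names, field paths, query vector, and weights are all made-up placeholders, so treat this as a sketch to check against current MongoDB documentation rather than copy-paste syntax:

```python
# Hypothetical hybrid-search aggregation stage combining a keyword ($search)
# pipeline and a semantic ($vectorSearch) pipeline with per-pipeline weights.
# Index names, field paths, and the query vector are illustrative placeholders.
hybrid_stage = {
    "$rankFusion": {
        "input": {
            "pipelines": {
                "keyword": [
                    {"$search": {"index": "default",
                                 "text": {"query": "Nike red shoes",
                                          "path": "description"}}},
                    {"$limit": 20},
                ],
                "semantic": [
                    {"$vectorSearch": {"index": "vector_index",
                                       "path": "embedding",
                                       "queryVector": [0.12, 0.98, 0.33],  # placeholder
                                       "numCandidates": 100,
                                       "limit": 20}},
                ],
            }
        },
        # Weight the semantic results more heavily than exact keyword matches.
        "combination": {"weights": {"keyword": 0.3, "semantic": 0.7}},
    }
}
```

In a real application this stage would be run with a driver, e.g. `collection.aggregate([hybrid_stage, {"$limit": 10}])` in PyMongo.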
Just to echo back, right, if you were using another database, you might have an external pipeline tool, where you're defining a series of stages with dependencies and data transfer and moving that around. With the aggregation pipeline, you could do that all inside of the database. [0:23:25] FR: You can do that all inside the database, because this is natively integrated - search and vector search are natively integrated with the database. You don't have to move the data around. You don't have to use a different aggregation pipeline, indeed, and you don't have to use a different CLI. You can really have all of that as a single experience. It's not like an extension where you have to be careful about how it will be supported, or how you can plug things together. The same tool is giving you access to all of this. [0:23:49] KB: Just so that we're clear, those things can be defined on the fly. If someone, for example, was giving the LLM the keys to the kingdom, it could write its own aggregation pipeline and run it? [0:24:00] FR: The developer. Oh. So, yeah. We did release an MCP server. That's really trendy these days. It actually is really effective as well. The customer adoption is pretty nice, to use this MCP server as an example. I'm mentioning the MCP server, because you mentioned LLMs, and usually, that's how developers more and more are interacting with our database. You can absolutely, for sure, create clusters and do operations, but also configure your aggregation pipelines. Yeah. [0:24:27] KB: Let's talk a little bit about security. You mentioned security as an area that you had dealt with. I think this question of what are LLMs allowed to see? What are they not allowed to see? All of this is definitely top of mind for those of us building applications here. What are the primitives that are, I'm guessing, baked into the database to allow you to build secure AI applications on top of Mongo?
[0:24:51] FR: I would say, I want to step back on the overarching principle, because you touched on LLMs and what an LLM can see. That's exactly because you probably don't want to train an LLM on your private data. The pattern, the architecture that is winning out there is: no, I will not - I mean, there are exceptions, but I will not fine-tune, or post-train my LLM on my data. I will use an LLM that is really good at, again, everything they are great at. For my use case, when a user wants something, then I will connect what this LLM knows with what my company, or my application, knows. That's not about giving anything to the LLM. It's about your application being able, with very good information retrieval, to say, okay, my user is asking me again, what is the credit limit on my credit card? A lot of things can be handled by the LLM in terms of how to answer a user asking a question. If I really want this information, I also have to go and find the private information of my bank about what the real limit is, and I will provide this information to my user, combined. But the LLM is never trained on this private information. Maybe just to clarify the overall pattern before we enter the security details. That's really important. [0:26:04] KB: I think that is very important, in terms of what sets of data are - I think the term I sometimes use is moderated through the LLM. The LLM may be the UX delivery layer, but do I load this data, pass it to my model, and have it presented? Or do I sidestep around the LLM, because this has to be right and I can't count on it not hallucinating something about it? [0:26:28] FR: Yeah, yeah. Absolutely. There are different patterns there. Most of the time, what customers will do is, again, if a user or agent wants to do something and that does require information retrieval, you will first look for the information that will be relevant. I'll stick with the same example.
What is the internal document that explains what the limits are on credit cards, and then we'll insert that in the prompt of the LLM. Yeah, that will go through the LLM, from the prompt and the answer of the LLM, but it's not stored anywhere on the LLM side. It's not even part of the training of the LLM. [0:27:00] KB: Sure. Yeah. [0:27:01] FR: Then it's up to you as a customer to decide where you want this LLM to be hosted and where you want this query and these tokens to be served. You can decide, depending on your security sensitivities. Many customers say, okay, I'm totally fine with having my LLM on AWS, or OpenAI, or Azure, etc. Or some customers say, "No, I want to do that, but I want some specific security agreement with these providers to make sure that my data is never shared." Some customers say, "You know what? I want to host my own LLM." You can do that if you want. What is important is that there is nowhere a blending, if you want, of how the LLM is trained and your private information. [0:27:42] KB: Yeah. Coming back, though, to building applications with these pieces, I think I'm curious to understand how you're seeing people defining these lines, or barriers. Is it changing at all in terms of how you're managing security at the data layer? [0:27:59] FR: I mean, security has always been a very - I mean, customers trust us with their data, as a data platform. It's always been top of mind, anyhow. I would say that with these AI-specific applications, we see more and more, I would say, at least even more discussions about this LLM integration, the one you spoke about before. One of the big values of MongoDB is you can run it anywhere. When we say run anywhere, sometimes people tell us, "Oh, you are cloud agnostic, and you can run it on AWS and GCP and Azure, etc." Yes, that's true. We can also run it on premises. We have an enterprise edition and you can do that in your own data center. We see customers that are doing that.
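The retrieve-then-prompt pattern Fred laid out (fetch private documents at query time, insert them into the prompt, never into training) can be sketched in a few lines. `retrieve` here is a stand-in for a real search or vector-search call, and the in-memory knowledge base is a made-up example:

```python
# Minimal retrieval-augmented prompt assembly. Private data is looked up at
# query time and spliced into the prompt; the LLM is never trained on it, and
# nothing is persisted on the model side.

def retrieve(query: str) -> list[str]:
    """Stand-in for a search/vector-search call against private documents."""
    knowledge_base = {
        "credit limit": "Internal policy: the standard card limit depends on account tier.",
        "opening hours": "Branches are open 9am-5pm on weekdays.",
    }
    return [text for key, text in knowledge_base.items() if key in query.lower()]

def build_prompt(user_question: str) -> str:
    """Assemble the prompt sent to the LLM for a single question."""
    context = "\n".join(retrieve(user_question)) or "No relevant documents found."
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {user_question}")

prompt = build_prompt("What is my credit limit?")
# Only the matching internal document is included; unrelated documents are not.
```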
What is very interesting, I think, in this AI world, you see customers that are saying, "You know what? For this use case, I'm totally fine if it's in the cloud. But for this use case, I really want to make sure that my data is - [0:28:51] KB: Oh, that's interesting. [0:28:52] FR: - never in any cloud provider and never touched by any LLM provider either. So, I will run it on-prem." You start to have this. Again, I think it's so early and things may evolve, etc. The fact that they have a choice and they can decide what to rely on, a cloud provider, or their own data center, I think is pretty interesting. I do believe it's more and more a discussion topic. [0:29:16] KB: That's fascinating. When you're seeing that, are you seeing them doing this within the context of the same application? So, you're having to have security boundaries and federation, and however that's working? Or like, how does that work? [0:29:29] FR: I'm not sure I can extract one pattern, one answer to that. You have customers that will say, okay, all of my data will be on-prem, and then some applications will be in the cloud and some won't. I see some customers saying, "You know what? I want my data to be in my data center. I want MongoDB Enterprise Advanced, and that's how I manage it. But I'm still okay to make a call to an LLM outside of my boundaries," because they believe that they can control it, and they are right in many cases. If they can control the prompt, it's okay to send some information, as long as it's something that your application controls, but at least nobody is seeing the raw data. I really see different patterns. I wouldn't be able to tell which one wins. I think what is very important, though, is that overall, there is really this intent to remain as flexible as possible. I'm back to the previous point. Maybe this LLM is good for me right now, but in six months, it would be another one.
Maybe this cloud hosting is good for me right now, but then I will want to go on-prem for some regulation, or specific concern, later on. I think there is really a willingness to remain as flexible as possible and to have options on the table, I would say. [0:30:36] KB: That gets to a somewhat different topic. I think one of the things that these tools are doing is they're changing the speed at which people are operating. [0:30:46] FR: Absolutely. [0:30:47] KB: They're changing how fast we're moving. They're changing how adaptable we need to be. How are you, both internally and with your customers, rethinking the way that we organize the teams doing this work? [0:30:59] FR: Oh, yeah, yeah. Okay. I mean, there is a product angle. I thought we were initially going through the product angle and - [0:31:04] KB: We can go there. [0:31:05] FR: Super quickly, but I want to get to the team organization. I think that's super important as well. From a product angle, I'll go back to: if you want to go fast, but you have to move your data around - I mean, any of us who have been software developers, or architects, at one point know that when you have to optimize for latency and performance, and iterate quickly with network layers and data to transfer, it's just a nightmare. That's one of the key arguments, and I think that's a big value of MongoDB. Overall, having the database, vector search, and search in the same place, so you don't have to do this ETL or this difficult network configuration between them - I think that's a big one for your question as well, about the speed of development and iterating. Now, if your question is going more to the team organization, I think that's a very good question as well. I mean, two things. I think what's happening right now is that it can be also very thrilling when you are a product manager, or business person, and you say, "Oh, I can build it myself. I can build fast." Personally, there's something I love about that.
What I love about that is that instead of trying to debate about maybe some text and some points, you can really show it. I love that. I think the risk is to believe that it's easy. Yes, it's easy to show something. That's the whole engineering and product manager dynamic. Sometimes they have to align on that. I think it's great if you use that well. If you use that as a way to say, oh, let's just put it in production - well, for some stuff, yes. But how will you scale? Because that's what all of these tools are doing, right? They are making writing code fast. Well, reviewing code is not faster. Making your security assessment of this code is not faster. Defining the right architecture is not faster yet. I think it can be awesome if you know what it is. It can be a bit dangerous if you believe that, oh, then that's it, I can just push it. I don't know if I maybe pivoted a little bit from your question, but that just made me think of this point. [0:33:00] KB: No, I think it is key. It gets to a couple of different questions, some of them related to the product piece and some not. One of these things is that going from zero to a prototype I can show is now very fast. [0:33:15] FR: Absolutely. [0:33:17] KB: If you put some restrictions in place, some things, if I was hearing you correctly, you can then actually take out to production. Some things are able to ship relatively easily, but others are not. I guess the place I would start is to ask you, how do you draw those lines? Are there ways in which having, for example, a database that can handle all these different pieces together makes it easier to bridge that gap? [0:33:44] FR: Yeah. I will obviously be biased, because I'm working at MongoDB right now, a database company. We are not vibe coding a database. There's too much at stake. [0:33:52] KB: You're not? [0:33:54] FR: No, we are not. We are not vibe coding. Now, that doesn't mean that we are not leveraging AI for many things. For prototypes, 100%.
For this product and engineering alignment - by the way, I didn't answer your question about team organization, but maybe later. We can put a pin in that. Also, how you handle tickets, how you onboard. I think people, even very senior engineers, when you ask them, say, "Oh, I have to discover this new part of my codebase that I haven't touched in a while. I can go so much faster on that to understand it." I think there are many, many benefits as well for real products, but the code that is written for production, I think we are still doing very manually for the core database, for sure. Then I would say, even for more internal things, or stuff that is maybe a bit less sensitive, well, you can, for sure, go faster in your - I don't know if vibe coding is the right term, but AI-assisted coding. I still believe the security audit, the observability, there is still a lot there. It's not fully there yet for this, in my opinion. But these tools can help you prototype and align on tickets and align on requirements. I think that's pretty impressive. [0:34:58] KB: Do you have a line in your product of like, within this it must be handwritten, outside of this AI is just okay? [0:35:07] FR: Yeah. All of the core database and core product of MongoDB, we don't vibe code it. We don't AI it. The code is written manually each time by a developer, and we are doing the code review as we did, and we are doing the security review as we did, so that is for sure. Yeah. We can use, again, some AI tools for finding some security things. I mean, we can get some help, as many companies do. But I would say, most of what we experience with AI is a lot of the management tooling around that, also internal tools, also the before-coding and after-coding pieces to go faster. We use AI more and more, but we don't vibe code the core MongoDB database, for sure.
[0:35:48] KB: That's obviously probably pretty different than a very new, not so security-focused startup. Thinking about that, has it changed anything about how you're internally organizing your teams, or does it still take just as many people, because the core has to be so solid and locked down? [0:36:06] FR: Yeah, yeah. No, that's a great one. First, there's something, maybe a cultural thing, that I came to really enjoy about MongoDB. I saw that in my previous company, at AWS, as well: even people that are not engineers are pretty deep technically. We have product managers trying to build their MCP experimentations. Culturally speaking, that's the case. You can see that, because you have people moving from engineering management to product management roles easily. I just want to say that this is in our context; a different context can have its own pros and cons, and maybe that's different. I think in our case, it can really fast-track the alignment on what you want to build. Because of the example you took before: instead of just describing in a long document, "that's the way I want it," you can just do it. Then you can align. Then you can discuss how to do that. Yes, it does fast-track the alignment between product and engineering, no doubt. We went pretty fast in terms of organization. We made the decision a few months ago now. We don't even have an engineering and a product organization anymore. We have a product and technology organization. Part of it is not just because of AI, but that was part of it. I think there were two big objectives. One, how to make sure we can fast-track the decision-making, the alignment, the sharing of information between the product decisions and the engineering decisions, because they are the same, eventually. That's number one. Number two, how to make sure that everyone is customer obsessed.
Even if you're an engineer, you should be customer obsessed. I do believe AI helps with that, by the way. You can really see your product with your customers in mind. We are using that to really show some reports from customer discussions at [inaudible 0:37:47], these kinds of things. We really made this leap, since you are touching on the organization. I do believe we probably would have done that anyway. But with these AI tools and ways to collaborate, I think it's helping product and engineering be in the same smaller teams, where before it was separate product and engineering organizations. [0:38:04] KB: Yeah. No, I think there is definitely something in it. I've been in a lot of conversations talking about this convergence between engineering and product, whether that looks like more technically-minded product people, or more product-minded technology people. As code becomes, maybe not in a database core, but in many contexts, more of a commodity, it means that product mindset is more and more important. [0:38:27] FR: I think that's spot on. I would have a bit of a nuanced take on this one, which is, I love that product people can be a bit more engineering, or engineering a bit more product, or whatever wording you use. I love that, because, again, I think that's a great way to be customer obsessed and to be focused on the outcome and what you are trying to achieve, more than just trying to articulate what you're thinking. However, the expertise doesn't go away. If you're a product person and you are meeting many, many customers a week, you will have an expertise in reading between the lines and reading the room in the meeting. If you're an engineering person, it's not just about this prototype that your product manager colleague can build. It's really about, how will that scale? How will that evolve? What do I expect my growth to be, my new changes to be, and my next security audit to require?
I think, yes, it's great to bring people a bit closer to the other side somehow, but I do believe we should respect expertise. There is still a lot of expertise in what it is to build a system at scale for real production usage, and what it is to really understand customer intimacy and what the need of a market is. We should respect that. We shouldn't believe that, because we have tools that are helping a little bit, this expertise doesn't matter. [0:39:42] KB: Yes. If I were to summarize, one of my big lessons about LLMs is they're incredible tools, but you cannot turn your brain off. Your brain as an expert is still super necessary. [0:39:56] FR: Oh, I love that. I don't remember who said it - I would love to have been smart enough to say it - but someone said something like, "Oh, my LLM is super smart on topics I don't know. But when it's a topic I really know, it's not that smart." I love that. Because your expertise is still important. When you know a topic deeply, you do realize that the LLM is wrong sometimes, and that's the exact reason - I will take that back to the MongoDB case - why grounding that with real data, real knowledge, is important. But even overall, I mean, expertise matters. [0:40:32] KB: Even within using it, I find I'm better able to guide these tools in areas I know well than in areas I don't. [0:40:40] FR: Yeah, I love that. [0:40:43] KB: Coming back a little bit to this piece around the data layer under LLM applications, what do you see as the big unsolved problems? What are the things that your team is working on looking forward for the next - I don't know how long we're allowed to project in the AI era, two weeks? Six months? Something like that, right? [0:41:02] FR: Yeah, maybe. No, I think everyone out there is developing an AI application, right?
I mean, it's pretty rare to see a company that isn't. I think we are at this turning point right now, where we are really going to production. There was a famous MIT paper three or four months ago saying 95% of these AI applications don't make it to production, or when they make it to production, they disappoint. They don't bring the ROI they were expected to bring. I do believe it's changing. I do believe more and more of these AI applications are getting production-ready. To your question about what we see, I believe that customers really understand more and more how the quality of the generated response depends on the quality of the information retrieval - making sure that your LLM will not just be LLM-smart, it will be company-smart. It will know what the company knows. For some use cases, just being 2% or 3% more accurate means that you are drastically reducing hallucinations. People are really realizing that even a little bit of impact on hallucinations is a big deal for the user experience. I do believe as well that people realize that, well, actually, this stuff is expensive. These AI models can be pretty expensive. Even right now, while they are subsidized by a lot of VC money and all of that, they're still expensive. If you are able to only call the LLM when you need to call the LLM, and you're able to optimize the length of your prompt, because you did a good job of finding the relevant information in your corpus of data beforehand, and if you are using embedding and re-ranking models that are pretty small and cost-efficient, that can totally change the ROI of an AI application. Accuracy and cost of these AI applications, I think that will be a big topic for the years to come, in my opinion.
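The prompt-length lever Roma describes - only sending the LLM the retrieved chunks that matter - can be sketched as a simple budgeter. This is an illustrative sketch, not a MongoDB or Voyage API: the function name and the rough tokens-per-word ratio are assumptions.

```python
def build_context(chunks, budget_tokens, tokens_per_word=1.3):
    """Greedily keep the highest-relevance chunks that fit the token budget.

    chunks: list of (text, relevance_score) pairs from retrieval.
    tokens_per_word: crude token estimate; a real system would use a tokenizer.
    """
    picked, used = [], 0.0
    for text, _score in sorted(chunks, key=lambda c: c[1], reverse=True):
        cost = len(text.split()) * tokens_per_word
        if used + cost > budget_tokens:
            continue  # skip chunks that would blow the budget
        picked.append(text)
        used += cost
    return "\n\n".join(picked)
```

A shorter, more relevant context both cuts per-call cost and tends to reduce hallucination, which is the trade-off being described above.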
[0:42:52] KB: In terms of accuracy, then, we've talked about what it takes in terms of searching, in terms of re-ranking and what you surface. Are there any other best practices you've seen, or that you recommend to folks, in terms of what you're actually putting in there? Are you doing pre-processing? How are you navigating those different levers? [0:43:16] FR: Yeah, absolutely. I mean, that could be a very long discussion. I would say, one, the quality of your data. If you don't clean your data regularly, handle metadata, know what's important - the quality of your data is key. Then, how you are preparing your data for AI, and we spoke about chunking. How do you cut your long text into smaller texts that make sense? There is a science and an art to that. How do you prepare your data? It has to be clean. It has to be prepared for AI. Then it's really, what is the right information retrieval strategy for you? Do you want results that are super-fast? Do you want results that are super accurate, because you are a legal company, or you are providing advice for financial companies that can impact a tax return or a legal document? Then you will want to use the best embedding models, with all the context and all the length, etc., and that's okay if they are more expensive. Or maybe, if you are an e-commerce company, you will want to go super-fast and make sure that you have 10 or 12 good results, and it doesn't have to be the one perfect result. I think it's about your strategy, about what is the right trade-off for you between quality and cost and speed and all of that. [0:44:27] KB: Yeah. The interactivity is an interesting one. Even within an application, you might have different workflows.
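The chunking step mentioned above - cutting long text into smaller pieces that still make sense - is often done with overlapping word windows so meaning isn't lost at a boundary. A minimal sketch, with illustrative default sizes:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks of chunk_size words, overlapping by
    `overlap` words so a sentence cut at one boundary survives in the next chunk."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Production systems usually chunk on semantic boundaries (headings, paragraphs, sentences) rather than raw word counts - that's the "science and art" part.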
I was talking to someone who was doing an agent, and they were saying, "Yeah, when I know that the user is right there, I bias towards interactivity and speed and getting it up in front of them. If it's an async workflow, now I care more about accuracy. Now I care more about, I can take my time." [0:44:49] FR: Oh, absolutely. We see that. As an example - I can't name a public reference, but we have an e-commerce customer of ours - how you make your trade-off for the e-commerce piece, where the user will go to find the sneakers, versus how you will manage your stock behind the scenes. These are different requirements and latencies and costs and things like that. Absolutely, depending on the workload, you will have a different sensitivity. [0:45:15] KB: I think a lot of our listeners are now familiar with, in some of the AI coding tools, you can dial up your budget: oh, I want you to think longer. I want you to reason more. I want the mini versus high versus whatever. What are the equivalent knobs that you have in the embedding models and the re-ranker and all these different parts that you have with Voyage? [0:45:35] FR: Oh, that's a great, great one. I didn't think about this parallel before, by the way, so I love it - the thinking budget one with LLMs. You can definitely go faster and cheaper with, I would say, a basic text embedding model - and even the basic ones can be really good. I mean, one of the values of the Voyage models is they come in different sizes. You can decide how long your embeddings will be, and even the type, if you want to go with a float, or with a binary. You can decide how much. There is definitely a first decision there and, of course, an accuracy trade-off. Then you can discuss the multimodal and context models. These are heavier models. They can take a bit more time, a bit more compute, but they will give you better results. Is your use case worth this investment?
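To see why embedding length and type (float versus binary) are cost knobs, a back-of-the-envelope calculation helps. The dtype labels below are illustrative stand-ins for the options Roma alludes to, and the figures ignore real index overhead:

```python
def index_size_bytes(n_vectors, dims, dtype):
    """Raw storage for an embedding index: vectors x dimensions x bits per dimension."""
    bits_per_dim = {"float32": 32, "int8": 8, "binary": 1}[dtype]
    return n_vectors * dims * bits_per_dim // 8

# One million documents, embedded two ways:
full = index_size_bytes(1_000_000, 1024, "float32")   # ~4.1 GB raw
compact = index_size_bytes(1_000_000, 256, "binary")  # 32 MB raw, 128x smaller
```

Shrinking the index this way trades recall for memory and speed, which is exactly the "is your use case worth this investment" question.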
Last, we didn't speak much about re-ranking, but re-ranking is an additional layer that more and more customers are using. The embedding model will basically give you, oh, these are the 10, 20, 100 best documents in your corpus of data for this query. That's pretty fast, and you optimize for that. If you want to know the best model - sorry, the best document - for this query, then it has to be compute-intensive. You have to go beyond the embedding. You have to go back to the document itself, and that's what re-rankers are doing. That's another layer of your thinking budget, to use the LLM analogy, which I like. You can also decide whether you need a re-ranker to have a very optimized ranking of your results. [0:47:09] KB: Now, you mentioned that for a lot of these in MongoDB, you would express them as an aggregation pipeline. I'm thinking about use cases that I've had in building these things. Oftentimes, I'll do things in layers, where I'll show them something quick and fast, but then I might redo things behind the scenes: okay, I'm going to re-rank, I'm going to resurface this, I'm going to bump this. I'll do things like that. If I were to do that in your system, can I get those intermediate results streamed out to me in some way, or how does that end up working? [0:47:38] FR: Yeah, you can. Well, what you described is, in a single query, you could first go with a quick search and then a bit of a longer one. I don't have many use cases in mind doing that. What I have, though, is a developer that will start, maybe in the first iteration, with an embedding model, and that's about it. Then, when they really want to go to production, and they will have real users and real data, etc., they will upgrade their model to a more powerful one to improve the accuracy of the results, or they will add a re-ranker. Thinking about it, actually, what you describe is totally possible. You could totally do a first search, and then do re-ranking in parallel.
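The aggregation pipeline KB refers to can be sketched as a `$vectorSearch` stage (MongoDB Atlas Vector Search) followed by a projection. The index and field names here are assumptions for illustration; `$vectorSearch`, `numCandidates`, and `vectorSearchScore` are the real Atlas stage vocabulary.

```python
def vector_search_pipeline(query_vector, limit=10):
    """Build an aggregation pipeline: vector search, then project name + score."""
    return [
        {
            "$vectorSearch": {
                "index": "product_vector_index",  # assumed Atlas index name
                "path": "embedding",              # assumed field holding the vector
                "queryVector": query_vector,
                "numCandidates": limit * 10,      # wider candidate pool improves recall
                "limit": limit,
            }
        },
        {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

# Against a live Atlas collection you would run something like:
# results = collection.aggregate(vector_search_pipeline(query_embedding))
```

Because it is just another pipeline stage, it composes with the rest of the aggregation framework - filters before, re-ranking or grouping after - which is the flexibility being discussed here.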
For instance, as an example, I haven't seen it, but it's such a fresh space - it's early days still. You could show all these 10 pairs of shoes that look like your red shoe, the Nike thing, and you show them immediately. After a few seconds, you can re-rank them to make sure that the one that is very accurate comes to the top. You could totally build something like that. The technology doesn't prevent you from building something dynamic. I don't know if the financial aspect, or the balance, is there. [0:48:43] KB: Whether it'll make sense. Yeah, it'll depend for sure on the application involved. [0:48:47] FR: Absolutely. [0:48:48] KB: Yeah, I think these types of latency trade-offs are all over the place in these types of applications. Then, there is this question of, okay, how much value is there in showing something to the user, versus getting the right answer in front of them, and maybe there's value in each? [0:49:04] FR: Yeah, absolutely, absolutely. At least, what's important, I think, is to have options. Whether you use that in the same flow, as you were describing, or you use that because you have different phases of your project - at one point, you just want to improve accuracy, etc. Or you do that because, for a specific application or workload, accuracy is so important, and for another kind of workload, maybe you have a free tier. You're okay that those customers have good results. But when they're paying and you have a premium tier, you want your customers to have excellent results. I think having the freedom of doing that easily, because you can change your schema, you can change your AI model, you can optimize - I think that is what customers are looking for, right? [0:49:44] KB: Yeah. Having that flexibility within the same API, same interaction - I don't have to go and get a different thing. No, that's definitely - [0:49:51] FR: Same document model. I'm coming back to that.
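The show-fast-then-refine pattern above can be sketched locally: a cheap cosine-similarity pass picks candidates to display immediately, then a more expensive scorer (a stand-in here for a real re-ranker that reads the full documents) reorders them. All names and the toy scorer are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_then_rerank(query_vec, docs, expensive_score, k_candidates=10, k_final=3):
    """Stage 1: cheap embedding similarity picks candidates to show immediately.
    Stage 2: a slower scorer re-reads the candidates and reorders them."""
    candidates = sorted(
        docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True
    )[:k_candidates]
    return sorted(candidates, key=expensive_score, reverse=True)[:k_final]
```

In the sneaker example, stage 1 is what renders in milliseconds and stage 2 is the re-rank that promotes the most accurate match a few seconds later.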
We didn't invent anything new to store the embeddings, for instance. They are part of your document model. They are there. That works. We didn't reinvent the wheel, because what people love about MongoDB usually is the horizontal scaling and the fact that you can have these shards and replication. We did the same for search and vectors. You can have your own search nodes, and you can decide that they will be a bit more memory-intensive and they will not impact your database. It's just the same principles of the document model, of the distributed architecture, applied to these embeddings, as you mentioned. [0:50:25] KB: Awesome. Well, we're coming close to the end of our time. Is there anything we haven't talked about that you think would be important to discuss before we wrap? [0:50:32] FR: No. I think we touched on it, but I just want to double down on it: things are changing so fast. The quantity of data is changing so fast. It's not just humans now generating data and consuming data, it's agents generating data. The ecosystem is changing so fast. There are new players, and some of them are amazing, and some of them look amazing but won't be here in six months. We go super-fast. The LLM race - I love it, by the way. I love when, oh, Google is coming with this great one, and then OpenAI. But it's changing so fast. Your sensitivity to, oh, that should be in this cloud provider and that should be on-prem, is changing so fast. I think it's super important to go with a data platform that can handle this flexibility, so that if something changes, you can change. You don't have to say, oh, I have to rebuild this data pipeline, I need to change my schema, I have to integrate this new identity system with this new player. I think it's really important to at least over-index on flexibility. [0:51:33] KB: I love it. Let's wrap there. [0:51:34] FR: Awesome. Thank you, Kevin. [END]