EPISODE 1723

[INTRODUCTION]

[00:00:00] ANNOUNCER: Chroma is an open-source AI application database. Anton Troynikov is a founder at Chroma. He has a background in computer vision and previously worked at Meta. In this episode, Anton speaks with Sean Falconer about Chroma and the goal of building the memory and storage subsystem for the new computing primitive that AI models represent. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[INTERVIEW]

[00:00:39] SF: Anton, welcome to the show.

[00:00:41] AT: Thanks for having me, Sean. Looking forward to it.

[00:00:42] SF: Yes. Thanks so much for being here. You are the founder of Chroma.

[00:00:46] AT: Yeah. One of two founders.

[00:00:47] SF: Yeah. I'd like to start there. What possessed you, essentially you and your co-founder, to start a company?

[00:00:53] AT: Look, I think that's a really good question. Jeff and I have known each other for many years. And I think probably like 70% of the conversations we've ever had have been around founding a company. What companies are even worth founding? Why do a startup? We're very aligned on a lot of those things. And so, when we got the opportunity to work together, we really took it.

And the other pillar that convinced us to start a company together was that we both had a lot of very similar experiences in engineering around applied AI in a variety of domains. For example, I spent seven years in machine perception, which is a domain of applied AI where you get computers to understand the world. And we learned a lot of lessons there about building actual engineering systems around AI and around machine learning. And we basically founded the company on the principle that there weren't any good engineering principles, and we could build good software to actually support teams building and deploying these things. And, obviously, Chroma has evolved quite a bit since we founded the company as we've tried different things. But that was the core mission. The core mission's always been to turn working with AI into something that can be used in an engineering process instead of the sort of finger-painting alchemy that it tends to be.

[00:02:00] SF: Was that originally how you started - with this kind of broad sense of, "Hey, we want to build good engineering for applied AI"? And then that led you to building - essentially, we'll probably get into some of the details of Chroma - but a vector database. Or did you sort of realize from the beginning that you might end up down that path?

[00:02:19] AT: Yeah. It's a really interesting story. In a way, it's kind of a classic software startup story, where the original premise of what we were building was a system that could algorithmically tell you which data you needed to label in order to best improve your machine learning model's performance on the next training run. Right? And it's funny, because there are now a couple of companies in that space that are doing pretty well on that premise as the need for that kind of training has become greater. But the way that we did it algorithmically was essentially to look at the data in the latent space of the model itself. It's a vector representation. And we needed a system that could store, manage, and perform computations in that latent space over vectors.
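For readers following along: the kind of system Anton is describing - something that stores embeddings and lets you query over them - looks roughly like this today with Chroma's Python client. This is a minimal sketch; the collection name and documents are made up for illustration.

```python
import chromadb

# In-memory client; use chromadb.PersistentClient(path="...") to keep data on disk.
client = chromadb.Client()
collection = client.create_collection(name="example_docs")  # illustrative name

# Documents are embedded automatically by Chroma's default embedding function.
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "The model's latent space represents each data point as a vector.",
        "Nearest-neighbor search finds similar items in that vector space.",
    ],
)

# Queries are embedded the same way, and the closest documents come back.
results = collection.query(query_texts=["how is data represented?"], n_results=1)
print(results["documents"][0])
```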
And we sort of went out into the market, and we didn't find anything we liked that was, first of all and most importantly, easy to deploy onto our users' machines. Because you can't really ask your users to install some third-party service to even try your product. It's never going to happen. We needed something that was super easy to deploy. We needed something that would allow us to iterate very, very quickly with what we were building. And we needed something that we could tune to really the specific performance characteristics we needed to get this product out the door. And so, we didn't find anything we needed in the market, and we kind of built our own backend for that product.

And sometime in late November '22, we had the realization that this backend that we had built - so, what started happening then was basically people started building on top of LLMs. People started building AI applications for the very first time around then. And we realized this backend that we had built basically supported not only the product that we had built, but a lot of other things that people were trying to build, in a form factor that would be very useful to them. And so, that's what caused us to really start focusing on this vector piece over the full AI/ML stack.

[00:03:56] SF: And how do you think about your differentiators in comparison to some of the other companies that are working in the vector database space?

[00:04:03] AT: I think the main thing for us is we're very monomaniacally focused on the AI application use case. That's really all we care about. And it's what we build for. And we build for it really from the ground up. And it's kind of nuanced, not least because this is a very young ecosystem, and everyone's kind of figuring out what the right thing is right now. And so, we're evolving with that.

But there's a few things that are pretty obvious. The developer experience and our API design was really the very first thing that we tested when we released the product. And it really fits with that AI application workflow. And then we also scale in a way that is suitable for AI application development rather than sort of the classical recommender system, web-scale semantic search use cases that have been out there for a while. And we focus on not just vector search, but retrieval as a component of the AI stack holistically. It's not just about data scale. It's not just about speeds and feeds. It's not just about latencies. It's about retrieval accuracy. It's about how you perform retrieval. And it's about how your application performance is mediated by retrieval performance. We think about all of those things. We go a long way beyond vector search.

And, of course, it's useful to talk about Chroma as a vector database. But we really think of ourselves as a retrieval company. And even more broadly than that, we think of ourselves as building the memory and storage subsystem for this new computing primitive that we've been given in LLMs and these AI models. I think all of our differentiation, from how you install and run Chroma, to how we scale, to sort of the accuracy you can expect from it, is downstream of our focus on the AI application use case.

[00:05:32] SF: I guess by focusing on the AI application as sort of the core use case - you mentioned part of it was that it forced you to kind of really focus on the developer experience.
But are there other things that end up shaping the way that you have to think about product, and also think about scaling, when that is the primary focus versus maybe something else?

[00:05:49] AT: Yeah. Absolutely. Just from a purely technical point of view, typically, in an AI application use case, you don't have a single massive search index. Right? And that search index typically isn't very static. What you have are many search indexes, which are sort of analogous to tables in a traditional database. And these are all being updated online with modifications, adds, deletes. And you're also scaling in the number of collections as well, because you want to be able to spin up a new table or delete it as you get new users or as your application changes. And so, key to how we've architected our system, and especially our forthcoming distributed system, is that form factor and those workloads.

[00:06:27] SF: And you've been working in AI in some capacity for quite a long time. Even going back to your bachelor's thesis work on neural networks.

[00:06:34] AT: Yeah. Such a long time ago. Part of me almost wishes that I had just stuck with it. Because right now, I'd be - I don't know, I'd be a venerable AI researcher at this point if I had just continued along that track.

[00:06:46] SF: Yeah. I had a similar thing with my master's degree. Actually, I think undergrad was my first introduction to neural networks. And no one was doing anything with neural networks back then. This was like 2002. It was not the sexy thing to get into.

[00:07:00] AT: Yeah. I mean, my bachelor's thesis adviser was very confused about why I was doing this. Anyway, yeah, I've been at AI for a while.

[00:07:06] SF: Yeah. I guess, have things changed from your perspective? And what are your thoughts on that change?

[00:07:13] AT: Yeah. I think there is one fundamental change, which is super, super important. And if there's one message I want to get across in this discussion, it's this one. The one that's coming up. Which is, AI used to be this thing that was the purview of labs or big companies, where you needed millions of GPUs and tons and tons of data. And you needed to understand the papers where the models or architectures were published. And you needed to have experience running a giant compute cluster to train your own model and then deploy it into the application that you were building.

The reality is, today, AI is something that you can just access over an API like any other service, right? And that means that the space of people who can work on an AI application is much, much larger than it's ever been before. And that's moved AI from this heavyweight industrial research thing that you need to have a PhD just to come to grips with into something that can become a natural part of software engineering. And for me, that's the biggest change. And it's also the most exciting change. Again, that's really the number one message that I want to get across here. You as a software engineer can work with AI today.

[00:08:20] SF: I agree with that. I mean, it's really sort of the democratization of AI in some sense. The same way that now you can call an API to make a telephone call, and you don't necessarily need to know the ins and outs of how those phone systems work to do that. You're now able to use AI to do summaries, and sentiment analysis, and all kinds of amazing things that were not possible previously.

[00:08:45] AT: It's almost contradictory.
But it's almost a negative thing how much hype there is around AI right now. Because people start to get fixated on things that maybe don't matter as much as they seem to. People, for example, get fixated on benchmark results of the language models. People get fixated on capabilities, like, "Oh, is it going to be able to do math in the next release?" I think the reality is, right now, the speed limit on how useful these technologies are is actually our ability to experiment with them.

I think that the models, as they exist today - you've discussed sentiment analysis and summarization - you have to think of them not as this task-specialized thing that can do five or six things. It is a general-purpose information processing system that can process unstructured information in a common-sense way. And, yes, it sometimes gets things wrong. But that's what we're actually dealing with. And so, the space of possibility of what we can deal with is as big as the space of possibilities offered by that general-purpose computing primitive. And we ought to be spending a lot more time and investment in exploring what the potential of that is.

And I don't know. In preparation for this, we sort of discussed some of the ways that I think about the space. And I think it's really analogous to the early web in that way, in the sense that you have this general-purpose technology. And it can do, I don't know, a lot of stuff. We don't know everything it can do. And we start off with this tiny little sliver of what it can do, which, back in the day for the early web, was like, "Well, it's a new publishing medium. That's how everyone treated it. And I'm going to put my catalogs, and my menus, and newsletters and whatever online." Right? I'm going to publish them because this is a publishing medium, right?

[00:10:14] SF: Yeah. I mean, it's basically the digitization of what existed on paper becoming a webpage.

[00:10:18] AT: Exactly. We still say web page, right? And it's not surprising given where the web came from. Hypertext and so on. And it took a while for us to really grasp the full power of the web as its own medium, right? And I actually think we still haven't fully explored it. It's just there's so many interesting new technologies to explore that we don't always invest in the same thing.

AI is very much in that early "I'm going to take a menu and put it online" stage of development. And one reason for that is because most people's first experience of AI is very, very similar. People try one of the chat models and they talk to it and they're like, "Okay. Wow. This is pretty cool." And they immediately analogize in their head, like, "Oh, that means AI is about chat models. I'm going to make a chatbot for whatever my thing is, for my data." Right? And this is actually Chroma's most popular use case today: backing these kinds of chat-with-your-data applications. Which, don't get me wrong, they're very powerful. Being able to interact with a corpus of information nonlinearly, conversationally, in a way where it can synthesize answers to your questions without you having to go through all of that information is very broad, very powerful, but to me feels pretty primitive. It's the GeoCities website of AI. That's where we're at today.

[00:11:26] SF: Yeah. I mean, I think that each sort of major tech cycle kind of goes through this phase. Like you mentioned, the early web. But we also had early social.
Suddenly, everyone's like ramming a comment and a like button into every surface to become like a social platform. Or you had the SoLoMo era, or whatever it was called, where everyone was - it was like trying to do Yelp for different types of mediums.

[00:11:49] AT: Uber for whatever, right? Uber for whatever became so popular that it's almost like a cliche for a basic startup pitch these days.

[00:11:56] SF: Yeah. And then what ends up happening is a handful of products that make sense for that sort of approach end up being successful. And then everything else kind of dies off. But you have to go through that experimentation phase in order to get there.

[00:12:10] AT: One thing that I think is really important is to really just have fun with this stuff. The models are weird. And they're way more fun to play with than regular software because of how weird they are. If I write a bug, I've only got myself to be frustrated with. My mental model of how the system works is different from how the computer says the software actually runs. And I've got no one to blame but myself. Maybe the designer of the software for making it hard to use. But besides that, just me. In LLM land, we're very much exploring what these things are and what they can do. And so, even when it fails, as long as it fails in an interesting way, that's very fun. It's kind of delightful to see it do that. I think that even before we start talking about the economically useful ways that this technology can be employed, and I think that there are very many, I think it's important to just have fun with it. And the early web was kind of similar. You could just make whatever you wanted and put it in front of millions of people. You can kind of do that with AI now too.

[00:13:01] SF: Yeah. And I think similar to the early web, AI is in this interesting place. And maybe this is something that happens with all technology. But if you showed something like GitHub Copilot or ChatGPT to someone, say, five years ago, they would have their minds blown.

[00:13:15] AT: They would say it's impossible.

[00:13:16] SF: Yeah. Exactly. But that being said, we are also really frustrated when it doesn't work, as consumers. Even going back to - we mentioned our early days studying neural networks - my early days on the internet, it was a dial-up modem. And compared to the gigabit internet connection I have now, it pales in comparison. But I still probably get madder at my internet when it's marginally slow than at anything else in my entire life. There's always this sort of dichotomy. Is it that we're amazed, but we're also perpetually dissatisfied? Is that just sort of the trend that you have to go through with every technology?

[00:13:51] AT: I think that AI is particularly susceptible to this. And I think one reason for that is because we essentially communicate with it in natural language. And we feel that, because we communicate with it in natural language, it ought to understand what we're saying. And we get frustrated with it when it doesn't. But that's like a magical amount of expectation to have of these things. Which is why I really try to boil this down in my own head to make sure that I don't get disoriented and say, "This is just processing information. It's an information processing element. I'm not really talking to it." But I think humans, because we're social animals, when we're communicating with someone or something, we're wired to project a theory of mind onto that thing.
And because we talk to the models, we develop a theory of mind for the models. And then when the model's response doesn't meet the expectation of that theory of mind, we get frustrated. I think it's actually more common to get frustrated earlier with AI because it's so convincing most of the time.

[00:14:44] SF: Right. Yeah. I mean, it's just like, even pre-LLMs, interacting with a customer service chatbot experience that doesn't tell you upfront that it's a bot of some sort. And then you get into one of those really frustrating cycles where it's not able to answer your question and can't actually refer you to a real person. Nothing makes you want to throw a computer through a wall more than that singular experience.

[00:15:06] AT: Yeah. And I think this is the other thing. This is why I say the commercial applications here are very exciting. But I think, really, a lot of the powerful stuff's going to come from play. And a lot of that play is going to come from individual developers who now have access to this thing and being like, "Hey, I'm going to try this weird thing and see if it works." And then if it does work, I can tell everybody else how to do it. I really can't stress enough that, despite the volume of research, despite the hype, despite how much it's talked about on Twitter, what common knowledge is right now in AI only lasts about two or three months at any given time, right? And so, what that really means is you don't need to have been around for two years, three years, five years working on this stuff. You can just pick it up and be like, "Hey, I just figured this out for the first time."

[00:15:52] SF: And I think some of the reasons the web was so successful early on was we got a lot of people building on the web at that time. Kind of going back to this experimentation cycle. And then there was also a tremendous amount of opportunity. People didn't know how to make money on the web. But they thought that they could make money on the web. I think with AI, one of the differences is I think there's less risk. If AI works, there's a clearer sort of path I think to financial gain.

[00:16:21] AT: I agree with that. We as a company, and me personally, aren't as focused on figuring out how to monetize this yet. Figuring out who the value is going to accrue to. I think it's illustrative to look at the history of computing and how money was made. And, typically, there's two ways to make money. You're either an application or you're a data platform. And we're on the data platform side of that. But what we're really focused on today is onboarding people into this ecosystem, because of how intimidating it is and because of the lack of really great material out there to take somebody who pretty much knows their way around software but is intimidated by AI. Because the future - arriving at those valuable use cases and turning this into an economically viable ecosystem - requires that we find those economically valuable use cases and actually run and deploy them and figure out the best practices to make them actually work for people. The only way to do that is if more people are experimenting with this.

[00:17:15] SF: Yeah. I think one consistent theme we've been talking about is that we need more people essentially building and experimenting. For somebody who is a regular application engineer and doesn't necessarily have your kind of background in AI, where do they begin that journey of building AI applications without necessarily being an expert?
[00:17:34] AT: That's such a great question. And I've observed so many people sort of trying to onboard with this stuff. And I think we've developed intuitions around common pitfalls. Not just with our technology, not just with retrieval, but with how people work with AI in general. And I wouldn't say that there's a great single place to learn from today. We're working on something. It's called the AI Engineering Explainer, which we hope to launch in the next couple of weeks, which is designed to be sort of a step-by-step breakdown, from the perspective of somebody who is an application developer who is familiar with software, about how this stuff actually works under the hood, in terminology and analogies that are understandable by someone who's very used to software, and get them onboarded, get them working with it.

But the reality is, today, you might find various application-specific tutorials. You might find videos like some of the ones I've done where I'm talking to builders about how they did their thing. But to start from scratch, there isn't anything really great. And I think that that's actually a big problem. I think AI in general has dropped the ball in terms of onboarding people in a couple of ways. And we're starting to see the correction of that. But not investing enough in developer education has been like a cardinal sin. It's been left to the people building to kind of spread their knowledge by diffusion rather than any sort of centralized, solid set of resources. Although, when I think about the early web, there wasn't anything like that either at the time. We at least today have the models themselves to help us write the code that we need to write. That's kind of nice.

[00:19:03] SF: Yeah. And we also have, I think, more available tooling to widely disseminate knowledge and achieve - I mean, back in my early days of learning how to build for the web, you got a book, basically. I mean, the great thing I think about learning HTML and JavaScript back then was that you could view source on a web page and actually see what was going on, which is, I think, maybe one of the reasons that was such a gateway for a lot of people to learn how to build stuff.

Yeah. I mean, in terms of education, what do you think is the right level of abstraction? If I'm an application developer, maybe I'm a web application developer, I don't need to necessarily go to the GPU instruction level. But I probably want to go a couple levels deep in order to understand what's going on. What is that right level of abstraction?

[00:19:50] AT: I think, first of all, obviously, you need to understand the APIs. And the thing is, some of the stuff in the APIs for an LLM is pretty familiar to any sort of application developer. For example, token quotas or things like that, they just resemble usage. It's not really that different. But some stuff you do have to dip a little bit down into understanding, or at least having correct expectations of what the model is really doing. Right? And so, I don't mean that you need to understand the transformer architecture in depth. You don't need to know what an attention head is. You never need to use the word weights for any reason. But you need to develop an intuition for the fact that these are text prediction machines. And you can control how they predict text. And you need to understand the levers that you have to steer them in doing that. And that's kind of the right level of abstraction.
You need to understand the model at least at that text prediction level. And this is something - it's somewhere above the level of a syscall, I think, if I was to analogize it to traditional software. Probably, you should know what the OS is going to do if you ask for memory. That is kind of analogously the level you should be operating at, which I think is - again, I really want to stress, it's not that deep. Once you have some experience just playing with these things, either in chat mode or in sort of the API playgrounds that a lot of these places offer, you'll get it. You'll get it at that level. And after that, you start probably needing some guideposts around how to build this into regular software. I would say that's the other really important part: "Okay, this isn't some standalone thing that's by itself. It should be integrated into other types of software."

[00:21:22] SF: That level of abstraction will probably continue to go up as well. Just like we've seen in traditional software, there was a time when you probably needed to know a lot more sort of low-level details about memory management, maybe CPU instructions. But then if you're building frontend web applications, you can do amazing things. But you probably don't need that level of detail in order to do that particular job. Because it's sort of more layers down than you necessarily need. But in AI right now, it's still so new that you probably need to get a little bit more into the details because all the tooling abstraction is not there yet.

[00:21:56] AT: I think it's more than that. It's just this is a new thing. And it doesn't function really like anything else we've had in computing before. I mean, when you think about it, this idea of being able to talk to your computer and it understands what you're saying in a common-sense way has been the dream of CS for a very long time. And now it's kind of here. It mostly works. It's kind of amazing. But it does mean that we have to think about it as a new thing. You can only abstract away so far before you abstract away the thing that makes it unique and interesting. We will definitely develop tooling. We will definitely develop new abstractions. We'll develop more unified ways of working and thinking about these things. And some of that's already started to converge. You actually see a lot more of the AI labs themselves pretty much running on the same API, which also makes playing with different ones super easy. It's really easy to switch models. You're not really locked in.

[00:22:41] SF: Yeah. I mean, there's all kinds of tooling now, like Bedrock or even Cortex on Snowflake. You're changing a string essentially to point to a different model.

[00:22:48] AT: Yeah. And I would even go beyond that. You really don't need - one thing that I would really caution against is getting too into tooling and abstractions too early. Because there's so many, and there's so many possibilities, it's really easy to lose yourself in that without really coming to grips with what you're actually interacting with. And so, for people just onboarding, I would suggest doing it as raw as possible. Literally, just hardcode strings in your language of choice and send them over and see what it does. Right? And then build your interaction layer on top of that and build a tiny application and get an intuition.
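To make "just hardcode strings and send them over" concrete, here is a minimal sketch using the OpenAI Python client. The model name is an assumption; the same shape applies to most hosted chat APIs. It also illustrates a point Anton returns to later: the API is stateless, so every call carries the full conversation you want the model to condition on.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hardcoded strings, no framework. The API is stateless, so each call must
# include the whole conversation so far, including the model's own replies.
messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "In one sentence, what does a vector database do?"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; use whatever model you have access to
    messages=messages,
    temperature=0,        # one of the "levers": 0 keeps output close to repeatable
    max_tokens=100,
)

print(response.choices[0].message.content)
```

The "interaction layer" Anton mentions is then just ordinary code that builds and appends to that messages list before each call.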
Because I don't think it's possible to get an intuition for what you need a tool for until you've experienced the problem that it's trying to solve. That's generally true in software. And in AI, it's so early that we don't necessarily even know what problems we have to solve. And so, a lot of tooling is just like, "Well, this could be a problem. Let's see if this is something we want or not as well." I would just caution away from that.

[00:23:41] SF: Do you think, also, going through sort of the extra effort to build up that intuition is necessary for navigating and understanding sort of the non-deterministic nature of these AI models, versus conventional application development where you're dealing with very deterministic sort of algorithms?

[00:23:58] AT: Yeah. Actually, I think it's important not necessarily to predict how a model will behave. Because, like you said, it's non-deterministic. And they surprise you. But I think it is important to understand at least a little bit about, "Okay, what should I expect? When can I expect it to really be non-deterministic, to do something that I don't expect? How can I control that at the model level?" Understand that, for example, temperature zero is going to give you the same output for the same input every single time. That's determinism. But sometimes you want to raise the temperature higher because you want to kick it out of the loop that it's in, or anything like that. Understanding when you can expect non-determinism is important. Developing an intuition for that is important. And then I think at least some practices for mitigating it are important. To understand, "Okay, yeah. It could be non-deterministic. But there's actually plenty of ways to steer it in a way that will reduce that for you. Or where the non-determinism won't matter as much." I think, yeah, interacting with them in that way, and interacting with them in that very raw way to see what's actually happening, I think is very valuable right now.

[00:24:57] SF: Yeah. Because to some extent, you kind of have to tame these models to do what you want in the context of the application that you're developing. If you don't know sort of what you can get away with, and maybe what sort of level of parameterization you should be doing or whatever it is, you need that essentially in order to get the best out of the models.

[00:25:16] AT: Yeah. And, again, when people talk about how they're building with these things in practice, most of what they're talking about is, "Okay, how do I control this to get it to do what I want?" Because there is some sense in which you have the feeling that it can do what you want, and you're iterating towards how to get it to do so reliably. That's the main part.

[00:25:34] SF: What do you think is the main barrier to entry, in your experience, for someone who's coming at this new?

[00:25:39] AT: Yeah. Like I said, I think the lack of unified educational resources that speak in a developer's language is really a hindrance. But to be more concrete about it, I think there are certain things that you kind of need to understand which are very not obvious unless you're already kind of versed in AI, right? I'll give you two examples. One, it's really counterintuitive to think about this thing as not something you're talking to. It's not necessarily reasoning in the way that you expect. It's just taking the context that you've given it, which is just a string.
And it's just predicting the next string. That's all it's doing. It's only outputting strings conditional on what you're telling it, right? And so, getting used to that idea is kind of hard. Because you're in between this kind of classical software idea where, "Oh. Well, the code is deterministic because I wrote it. And it's going to follow these procedures in this exact way," which the model is not doing. The model is outputting text conditional on natural language. But at the same time, you're also on this other side: "Well, I've interacted with this chat model. And now I've projected my theory of mind onto it. And why can't it reason in the way that I want it to?" Getting over that hurdle is hard.

And then there's a few things pretty specific to how these APIs are implemented. One gotcha that I think is really common and hard to reason about unless you know about it is, in between calls to the API, the model is completely memoryless. It has no history. It has no memory of the inputs or conversation so far. You need to send the entire history of inputs at once to get the next prediction out of it, including the model's outputs up until that point. Because behind the scenes, in order to serve these models efficiently, you're getting a completely different instance of the model every time. And it's whatever GPU you get routed to that happens to have the right weights loaded that you'll get processed with. And so, there's a lot of stuff. But those two things, as an initial hurdle to just really get used to, can be tricky and can be hard. And then all the consequences of that become more difficult without those core concepts.

And then I think it's like, "Okay. Well, I understand it's predicting text. How do I get it to predict text in a particular way?" And you go online and people will be like, "Oh, fine-tuning. And then in-context learning." You're like, "I don't know what any of this means." But if you understand, "Okay, these things just predict text, and what I'm trying to do to steer them is to get them to predict certain kinds of text," then having them see the kind of text that I want them to predict becomes a much easier way to reason about it. But because there's such a long literature, and there's so much hype around AI right now, it's easy to get lost in all these terms, and to think that learning those terms is the important thing, when the important thing is just playing with the model.

[00:28:08] SF: Yeah. It's, I think, hard to replace actual hands-on-keyboard time, just playing, testing, iterating.

[00:28:17] AT: It's not scary. Really, you can't break the computer. You can't break the model. You can just do stuff. And they're so cheap right now as well that your experimentation, even with the full-fat model that you have to pay for over an API, is going to be extraordinarily cheap. You can get a lot done for like 10 bucks just to learn about them.

[00:28:33] SF: Yeah. Absolutely. And I've heard you refer to AI as primitive. And I think we've kind of alluded to some of that thinking during the course of this conversation. But I also think a lot of the tooling is pretty primitive right now.

[00:28:47] AT: Oh, yeah.

[00:28:48] SF: What do you think needs to exist to support AI application development?

[00:28:53] AT: Such a hard question.
Because, again, when you're early in the life cycle of any new technology, it's too easy to get fixated on those first few use cases and then prematurely optimize the tooling for just those use cases, which kind of locks you into those use cases even more instead of exploring horizontally. I think, honestly, again, knowledge sharing is really important. And we need good sites for sharing what works and what doesn't at any given time. Because, again, the models are also changing over time.

I think from the tooling side, one thing to remember is the models are going to get better. And they get better all the time. Tooling that is designed to protect you from the models not being that good is probably not going to help you in the long run. For example, at the very start, there was a bunch of tools that would help you reduce costs. But the inference cost of models is going to continue to fall, especially as scale goes up. And so, I think that getting over-invested in that kind of tooling isn't great. I think, also, there's a bunch of tooling out there that hides what's really going on from you while you're trying to learn. And that stuff isn't great.

In terms of what is good, I think once you have an application up and running, the first thing that people build is a way to evaluate the performance of the application on the tasks that you would like it to do and how it's performing with real users. And I think that there's probably, I don't know, two dozen frameworks that are all competing for that right now because it's so important to the space. And I think it will continue to be important. And the reason that it's important is the flexibility of the model is a double-edged sword. It can do a lot of things. But it can also do things you don't expect in ways that are hard to detect. Because you can't just keyword search and see if it spat out an error. Because it's not going to do that. It might just say something bizarre in a way that you don't notice. Building the tooling that can help you understand if the model is performing the way that you want is, I think, important.

And I think that the ecosystem will eventually land on something that works. And I'm actually a little bit surprised that it hasn't come from any of the large labs yet. That there hasn't been anything open-sourced. And maybe Meta or somebody will do this. But I think even just a unified standard for that is probably the right thing. There's a few products out there. But I think I'm yet to see really wide adoption. But part of that, of course, is there's just not that many instances of production AI applications yet. No tool has come to rule them, because there's only a few. It's all noise right now. But that's the one I expect is going to be really, really important.

[00:31:11] SF: It seems like the large labs are primarily focused on building and pushing models. The big foundation models.

[00:31:19] AT: That's right.

[00:31:20] SF: Do you think that we're going to see a consolidation in models eventually?

[00:31:24] AT: I mean, the models are already fairly consolidated. The CapEx required to train a large language model from scratch is extremely high. And there's only a few organizations in the world that are capable of doing that. And they're essentially - it depends on compute availability. And then, can you manage data at that scale? And you have to eventually start building data centers for these things. I would say, right now, it's fairly consolidated.
But what I think is really interesting is we are starting to see frontier-level open-source models emerge. And so, even if you can't train them yourself, there's going to be a proliferation of open-source models that you can use and adapt. And some of the most interesting work in AI is happening with these open-source models, because unlike the models you can access over an API, you can really get into the guts of these things. Right? And you can really try stuff. And some of the most interesting experiments have come out of that - and people can look this up, and this is me veering a little too far into AI jargon - but, basically, being able to do brain surgery on these things while they're running, with a thing, for example, called task vectors, where you basically stamp the model - you change the model's weights in such a way that it's more likely to produce particular outputs. And you can do that kind of on the fly. That's stuff that comes out of open source and that you can actually use in open source as well.

While I think we'll see large labs continue to sort of run away with this, I do think we're going to see more capabilities emerge in open source. And I also think the utility of smaller models is being proven out pretty significantly. As the models get more efficient, as things get cheaper to use, you'll find that a small model is good enough. And the thing that you'll actually want is the control that you have over it. And things will kind of exist at both ends of that spectrum. And at the far end of the open-source spectrum, you're going to have all these weird hobby projects with locally running LLMs. And I think that's very, very exciting. I think those are really cool.

[00:33:06] SF: Yeah. And the small models are also sometimes necessary for edge device deployment too. Where, depending on the task, you might not be able to take the latency of a network call.

[00:33:16] AT: That's true. But that's also a hardware question. Because inference at the edge is never going to be free. I know that Apple is probably investing quite a bit in this right now. And other companies who have edge devices have the opportunity to invest here. I think, again, it's easy to think about this stuff in the abstract. So you think about, "Okay, LLM. LLM at the edge." But the real driver of this is applications. It's like, why do I want an LLM at the edge? What can I do at the edge that I can't do over an API? And, like in other domains, usually the reason to run something at the edge is lower latency or privacy. Those are the two things. Privacy-preserving local LLMs make a lot of sense. But from the latency perspective, it might actually be faster to just make the API call, just because they've got beefy inference hardware on their side and the network latency is much, much lower than the inference latency. It's like, what is the application? Why do we want this?

I've seen a few very early attempts at personal assistants that are aware of the environment and history that you're in. And they're in early days. But that could be an argument for why we want small models at the edge, just to perform that. Or another reason - and, actually, this is something that a lot of people already do for inference efficiency. You use a very small model to figure out which tokens are going to be hard to predict. And then you send the hard-to-predict tokens to the full-fat model. And that's something that models can do at the edge as well. But, again, the question is applications.
What is it you want these for?

[00:34:40] SF: Yeah. I mean, the one that I had in mind was around real-time translation. If I'm trying to speak and then I want it to come out in another language, then I might need to do that on my phone.

[00:34:50] AT: Look, it's very possible. We all want the Babel Fish, right? And real-time translation is a difficult problem, even with large language models, for a variety of reasons, not least because languages have different word orders. In order to make it real time, you have to predict the next word that you're going to say better than you're going to say it, which requires a beefier model. But there are applications like that, right? There are applications where the thing that I want is really, really real time, and I can get away with a small model for doing that. But it's - I don't know. I understand this is a frustrating answer, but I really do think it's early days. I think we shouldn't get too far over our skis in deciding what this is going to be.

[00:35:24] SF: You mentioned a few minutes ago how we're not seeing a lot of real-world AI applications yet. How far away do you think we are from that, and what needs to happen in order to get there?

[00:35:36] AT: Yes. I want to clarify that. I mean, there's definitely real-world deployments of AI applications. Today, I think most of them are fairly primitive, but we're making progress toward them. I have probably one of the more boring opinions on AI in San Francisco, despite running an AI company, which is I think that the real place these things are going to shine is in business process automation tasks, where previously you really needed a human to do them because you're dealing with unstructured information. You need to produce structured output, and the models are great at doing that.

The thing about that, though, is every organization probably has dozens to hundreds of tasks like those, right? The classic one is reconciling invoices from a bunch of different vendors, and everybody's invoices are in a different format, right? Every organization has dozens to hundreds of tasks like that. But the thing is, because those tasks are so onerous and because they take up real people's time, they get pushed towards the edges, and they get minimized. The ROI on using AI to automate any of those tasks individually might not be high enough to justify deployment. But collectively, all of those tasks together in an organization probably very much do justify an AI deployment. There's that interesting piece of dynamics there.

Your question was what needs to change. I mean, again, I'll push on this piece again. It's just experimenting, like trying to actually solve these tasks with AI, needs to happen. So we need to see if we're good at it or not. Then I think to really make this robust, we have to develop, as an ecosystem, confidence in the reliability and the repeatability of these systems. We're not there yet. The best places to deploy a lot of these models today are in places where it's okay if the output isn't 100% correct 100% of the time. Although in the future we obviously want them to do more than that. I think that a combination of experimentation, tooling, best practices, and then model improvements themselves will get us there. It's kind of interesting. What's important is to also let go quickly of common knowledge, so-called common knowledge, about what works and what doesn't, because everything is early and changing relatively quickly.
For example, right, even people building with AI today, most people aren't even aware of the tool use APIs on the models that you can call. So tool use is basically telling the model, "Hey, you can execute a little program. I'll execute it for you if you tell me to do that. I'll give you the result, and then you can continue processing, right?" That opens up the space of tasks that you can do, but also makes the tasks that you're doing more robust because, for example, the model's not good at arithmetic, but it can write a Python script that is, which you then probably should sandbox, et cetera. But just understanding that that will make the task you're trying to perform more reliable, and onboarding people into using those parts of the models, is going to help a lot in exploring these use cases.

[00:38:15] SF: Yes. I mean, going back to the sort of business process disruption, I think there's a tremendous amount of value in doing that because that's generally not stuff that people enjoy doing. If you think about the legal profession, there's whole teams that just do research, where they're just combing through papers and stuff like that. I think that's certainly an area that makes sense to be automated. But I do think that the number one challenge there is how do you guarantee the output or make sure that mistakes don't get made. I mean, you have the same thing with humans, but there's some level of, I guess -

[00:38:50] AT: Well, with a human, there's someone to blame.

[00:38:52] SF: Yes, exactly. Yes.

[00:38:53] AT: With a human, there's accountability. There's someone to blame. With an AI, there isn't. The model - it's kind of nobody's fault if it goes wrong. It's a real problem, and I think there's good ways to deploy AI, and there's bad ways to deploy AI. One of the worst ways to deploy AI is to have it output many things and then have a human check the results, because it requires the human to be constantly vigilant, whereas the AI can output arbitrary output forever. When thinking about what applications you should use it for, watch out for that anti-pattern. Don't have the human always only be responsible for checking the AI's output. There's other ways to involve humans in the loop and make sure that it's robustly performing the task that you want.

[00:39:32] SF: What's that look like, do you think?

[00:39:35] AT: Usually, it's iterative, right? Rather than sort of big-bang completing the task and then having a human check through all of it, it's sort of iteratively going back and forth with the human as the task is completed so that the human can, at the right point - first of all, the human understands each part of the task. Second of all, the human at the right point can say, "No, we need to go back and revise this." Then you can actually store that kind of correction that the human did and have the model pay attention to that next time it's trying to execute the same task, until you converge on being right most of the time or all the time.

[00:40:01] SF: Right. If you were asking it, I don't know, to write some legal brief or something like that and it needs to do a fact check, it's probably a lot easier to loop a person in and be like, "Hey, is this accurate," or something like that. Versus spit out the entire brief and then the person has to read the whole thing, because it's easy to miss mistakes when they're sort of hidden within a large bit of text.

[00:40:21] AT: It's asking people to do something we're not good at, which is be ever vigilant for everything.
It's not what humans are good at. It's what machines are really good at, though. There's ways to get the models to check their own correctness. One thing that I would really like as a capability advance at the model level, which is an open research question, is robustly getting the models to understand what they don't know, right? Like robustly getting them to say, "There's information here that's missing for me, and I'm not going to try to make something up. I'm going to reach out for new information and, in the future, even know where to find that information." Today, the way that this is accomplished is really unsatisfying. Today, we basically actively train the models to say when they don't know something. For example, when GPT tells you that its knowledge cut-off date is a particular date, that was deliberately trained into it. That's the only way it knows about it. It doesn't know what it doesn't know unless you tell it what it doesn't know, and that's a problem. We need the model itself to be able to figure that out. That's something that I think would make so many things more robust if we had that capabilities advance. I think it's also necessary for general-purpose reasoning as well. Knowing what you don't know is important to getting anything done, really.

[00:41:31] SF: They basically have some form of overconfidence right now where they just won't admit that they don't know something. It's like, I mean, if you had a team member and every time you ask, they never say, "I don't know." They always make up an answer, and they say it very confidently. It's like, "Oh, that sounds kind of plausible. I guess we'll just take Joe's opinion on this."

[00:41:50] AT: That's the thing. People call them hallucinations. I don't really like that term. But if you start thinking of these things as information processing elements, and if you think of them as just predicting text, it becomes obvious what this really is, right? Because if it's predicting text, it can only predict that it doesn't know something if it's already been conditioned to do so about that thing, right? They do generalize to some degree, and they are capable. They are capable to some limited extent. You can ask them about something that doesn't exist, and you might have luck in it responding, "No, that didn't happen," or, "I don't know that." But it's not clear when it will and when it won't. Knowing when the model doesn't know what it doesn't know is very important.

There's external ways to do this. Again, this is - when you're tinkering with it, I think of this as brain surgery. You can stick probes into the model's brain and kind of infer when it's making stuff up, when it doesn't know something. But that's not very useful if you just want to access this over an API and use it as an information processing element, which is why I'm hopeful that we'll see some capabilities advance in that direction. But even if we don't, again, there are so many tasks lying around that they could be doing today, even without that, that I think we should be picking them up and using them for.

[00:43:01] SF: I used to always ask people things like, where do you predict things going in 5 to 10 years? I think it's kind of not a reasonable thing to ask in this space. But, I mean, what are you, I guess, excited about, let's say, in 2024? We have half a year left. Where do you think things are going to be in six months?

[00:43:18] AT: Here's what I think.
I think open source will continue to make gains relative to sort of the APIs in terms of capability. I think that costs will continue to drop. I think that context lengths will continue to increase. Personally, what I'm looking for is closer to the things that Chroma is interested in, which is this retrieval piece, right? To give a perspective, the way we do retrieval today in AI is also very primitive, right? It's as if the code, which is the LLM, and the database, which is the retrieval system, don't know about each other. Not really, right?

[00:43:49] SF: Yes, absolutely.

[00:43:50] AT: The model doesn't know where this information it's getting is coming from. In fact, it's just presented to it as if it's already there. The retrieval system doesn't know that it's talking to a model, right? Of course, in traditional software, that's also true - the database has its area of responsibility, the application layer has its own area of responsibility, but there is the ORM, which kind of helps translate between the two. And I'm always on the lookout for interesting research in this direction - of which there is some - of making these two pieces aware of each other and actually function together, because I think that the model itself, even with the architectures that we have today, should be able to ask for help and should know where to get that data. Over the next six months, I think we'll start to see early prototypes of that type of architecture emerge. There's a few places that are working on it.

From the point of view of applications, which is what I think is really fun, I think that we're going to see at least one or two surprising applications where people thought that doesn't work yet, and then it just works one day. I think that's going to happen a couple of times. For example, I know a couple of people - I've spoken to them on our little web series - where the common knowledge right now is that agents don't work, right? Of course, the definition of an agent is very fuzzy. But the common knowledge is agents don't work. When in reality, there's plenty of places where the model is doing multi-step iterative reasoning to come to a conclusion, and it's totally working today. You can go and use it. I think we'll see two or three things like that emerge where people thought, "No, this doesn't work," and it's going to turn out that actually, no, it works fine. We just needed to find the right way to do it.

That's exciting only if people experiment. You need to go out there and hit an API. You need to go out there and try one of these models today. It's so cheap, there's no reason not to do it. Just throw some requests at it and see what it does. One of the things that I like to do when I have a bit of downtime, if I'm running some computation or my code is compiling or whatever, is I go on one of the chat models. I just throw ridiculous things at it and see what it's going to do. One of my favorite things to do is try to gaslight it about fake historical events and try to convince it that this was deliberately left out of its training data for some reason and see what it's going to do. Again, I'm very interested in that "knowing what it doesn't know" thing. Sometimes, it'll tell you, "No, I have no knowledge of this." You'll be trying to convince it that actually it totally does. That kind of experiment is really fun. But really, we're only going to get to any of those great, useful, exciting things if people go out and experiment today.
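To ground the retrieval point from earlier in this answer - retrieved text is simply presented to the model "as if it's already there" - here is a hedged sketch of that pattern using Chroma together with the OpenAI client. The collection, question, and model name are illustrative assumptions, not anything specified in the episode.

```python
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="./notes_db")  # illustrative path
collection = chroma.get_or_create_collection("notes")
llm = OpenAI()

question = "What did we decide about the Q3 launch?"  # illustrative

# 1. Retrieve: nearest-neighbor search over the embedded documents.
hits = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(hits["documents"][0])

# 2. Generate: the retrieved text is pasted into the prompt. The model has no
#    idea where it came from, and the retrieval system has no idea a model
#    will read it - the two pieces don't know about each other.
response = llm.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```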
People who have domain expertise are super underrated right now, because people think you need AI expertise to be really good at AI. But I think, actually, you need domain expertise to figure out how to apply this new technology to your use case. Again, one of the things that I really want today - roughly once a day, if not twice a day, I run into some software process where I'm like, "Surely, the model can do this for me. Why am I doing this? Why am I scrolling through these logs on GitHub to figure out where my test exploded, instead of the model just pinging me when the test has failed and telling me exactly what it thinks is wrong?" Instantly, instantly, that could be built onto GitHub runners, and someone could do that. I'm at the point where I'm so frustrated with it, I'll probably just do it myself. But there's a million things like that, right? But that's domain expertise. That's because I've been building software long enough to know that that's something that I need and want. That's the other thing that I would impress on people.

[00:46:58] SF: A lot of it kind of goes back to what we were talking about earlier, of you have to experiment enough to build up intuition so that when you're working on something that feels totally unrelated, you're actually able to connect the dots in a new way and say, "Oh. Actually, it would make sense to put this thing in here." Just like when it comes to conventional programming, when you first started out coding your first-ever programs, it's not like you're able to go from that to building something super sophisticated that no one's ever seen before. It took time of experimentation and learning, and probably multiple years of investment, to get there.

[00:47:31] AT: Yes. I mean, look, that's part of how we're building this company. Chroma does two things. Chroma makes retrieval, which is an important component of pretty much every powerful AI application, really easy to use and deploy. When I say easy to use, I don't just mean on your laptop or in a Jupyter Notebook. I also mean that when you need to deploy and scale it, it needs to be as easy as possible, because you don't want to be worried about this new piece of infrastructure while you're trying to figure out this other new piece of technology which is called AI, right? We want to make it as easy to use as possible across all scales and all use cases. At the same time, we want to make it as accurate as possible, because we want there to be as many experiments as possible and for as many of those to succeed as possible. That's how we're oriented as an organization.

Again, it's easy for me to think of things that I wish I had more time to try to play with and build, because another one is like, okay, a user reports a bug. The default AI use case here is like, oh, the AI is going to either give support to the user, or it's going to try to fix the bug. But there's a third option. Just have it generate a script for me that can repro. If you can do that, that's going to save me so much time - generate a little test case where that bug is triggered. And that's something where you can really use this interaction loop, because it lets the LLM call out to a code environment, and the code environment is deterministic. The code environment will tell the LLM if it's doing something wrong, right, until it creates a script that matches the error that the user reported. Then, great, I have this reproducible test case. Then I can go off and do that without me first having to spend all that time.
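Here is a hedged sketch of that repro loop: the LLM drafts a candidate reproduction script, the deterministic code environment runs it, and the output is fed back until the reported error is reproduced. Everything here - the bug report, prompt, and model name - is made up for illustration, and a real version would sandbox execution and strip any markdown fences from the model's reply.

```python
import subprocess
import tempfile
from openai import OpenAI

llm = OpenAI()
bug_report = "parse_date('2024-02-30') raises KeyError instead of ValueError"  # illustrative

PROMPT = (
    "Write a short, standalone Python script that reproduces this bug report. "
    "The script must print REPRODUCED if the described behavior occurs.\n\n"
    "Bug report: {report}\n\nOutput of the previous attempt (if any):\n{feedback}"
)

feedback = "none yet"
for attempt in range(5):
    reply = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": PROMPT.format(report=bug_report, feedback=feedback)}],
    )
    script = reply.choices[0].message.content

    # Run the candidate script; the code environment is the ground truth.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
    result = subprocess.run(["python", f.name], capture_output=True, text=True, timeout=30)

    if "REPRODUCED" in result.stdout:
        print(f"Got a reproducible test case on attempt {attempt + 1}:\n{script}")
        break

    # Otherwise, tell the model what actually happened and let it revise.
    feedback = (result.stdout + result.stderr)[:2000]
```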
There's a million things like that. But, again, it's like you said. You need to develop the intuition. I need to both understand, hey, the model can do this, and I have this problem - domain expertise combined with intuition. I think that intuition about AI is much easier to gain than domain expertise.

[00:49:07] SF: Well, Anton, awesome. This is really fascinating. Thanks so much for being here. I really enjoyed it.

[00:49:11] AT: Of course.

[00:49:12] SF: For everybody listening, you've got your marching orders. Go out there and make an API call if you haven't done it yet.

[00:49:16] AT: Just do it. It's so easy, and it's so much fun to play with them. Even when they're doing weird stuff, they're doing weird stuff in fun ways. I think that that's important - to play with. Again, we'll have our engineering explainer out in the next few weeks. I'll share it with you, Sean. I hope people get a lot of value out of it, because it's really oriented towards someone who's used to building software and would like to play with AI. I want to show you how easy this actually is. It's not as intimidating as it seems.

[00:49:40] SF: Yes, fantastic. Thanks so much. Cheers.

[00:49:42] AT: All right, Sean. Thanks very much. Bye-bye.

[END]