EPISODE 1869 [INTRODUCTION] [0:00:01] ANNOUNCER: A key challenge with designing AI agents is that large language models are stateless and have limited context windows. This requires careful engineering to maintain continuity and reliability across sequential LLM interactions. To perform well, agents need fast systems for storing and retrieving short-term conversations, summaries, and long-term facts. Redis is an open source in-memory data store widely used for high-performance caching, analytics, and message brokering. Recent advances have extended Redis's capabilities to vector search and semantic caching, which has made it an increasingly popular part of the agentic application stack. Andrew Brookins is a Principal Applied AI Engineer at Redis. He joins the show with Sean Falconer to discuss the challenges of building AI agents, the role of memory in agents, hybrid search versus vector-only search, the concept of world models, and more. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [0:01:19] SF: Andrew, welcome to the show. [0:01:20] AB: Thank you. Thanks for having me. I'm a big fan, so this is fun. [0:01:24] SF: Nice. Yeah, well, glad you could be here. Glad we could work it out. Always good to have a fan on the show as well. [0:01:31] AB: Absolutely. [0:01:32] SF: I wanted to start with a quick big-picture question. A lot of people are saying that 2025 is going to be this breakout year for AI agents. It's the year of the agent. There's a lot of hype going on in the market right now. We're moving beyond just basic chat. From your perspective, what makes building these more autonomous agentic systems hard, and why do memory and other components play such a central role here? [0:02:01] AB: Yeah.
Well, I've been thinking a lot about this, of course, and one of the reasons I think it's so difficult is that many of the tasks that we can put into a POC to show off what an agent can do backed by an LLM, they sidestep the weaknesses of the LLM, right? They draw on information in training, they use that generative ability, plus context engineering, to produce information effectively. LLMs can do that really well, and we've done a lot of work now to make agents be able to do that as well, right? The tricky part is, I think, when the agent has to integrate in an environment and do something, actually change something, and crucially, be able to predict the outcome of the change. That's the part that LLMs just don't model, actually. They don't model state transitions like that for environments. That's where they tend to break down, where agents tend to break down. [0:03:02] SF: Is that primarily that companies get stuck in that demo POC mode, where they just can't move to a productionization standpoint, because of that limiting factor? [0:03:14] AB: No. I think that's a whole other problem and is also a problem. I think it's more along the lines of, it's really easy to think about, let's build an agent for this problem, but it's harder to think about the problem and map that back to what the agent will actually be good at and what you will need to make it good at certain tasks. Tackling certain problems with an agent with an LLM requires more thought about the predictive component, beyond just chatbots and things like that. Then there's the whole side of just getting out of the POC phase and getting things into production in any meaningful capacity, and that itself is also a large problem for other reasons. [0:03:54] SF: Yeah. In terms of some of the challenges around memory, or the value of memory when building some of these agentic systems, I recall the models are stateless.
Then, of course, they have a limited capacity in terms of how much information you can be sending to them in the context. What are the components of memory you need to be thinking about, how do those influence context, and how do you work through some of the optimization challenges of feeding it the right context at the right time? [0:04:26] AB: Yeah, absolutely. Great question. I think, thinking about memory, even in your question, there are some implicit assumptions about what it is exactly, what type of thing it is. This happens all the time when I talk to folks about memory, and it's really interesting. At the most basic level, if we just - not even talking about agents, if we're just talking about a web application, I'm a user, I go to a web application, I do something, I come back, I expect it to continue where we left off, right? Fundamentally, these are stateful interactions. Most things are just a stateful interaction where it continues. The LLM, being this component inside of more complex applications, is itself stateless, right? We tend to think about that a lot, but really, it's just like, well, so are all of the web servers that we've been building applications on. What do we have to do to make that work? Well, we have to store data. We have to use a database. For me, that part is the one that's like, okay, this has to be a given. We have to start the process of building something with an LLM with the assumption that we'll have to store data. That is a form of memory. You can map this into the traditional - well, the 2025 version of what are the small pieces inside of that memory box, right? One of them is typically a message history of some kind, because obviously, if what you're continuing is a conversation, you need the messages that were sent in the past, right? Messages tend to be this lowest level, or fundamental part that's more about storage, because we don't have to think a lot about storing that.
We can just store messages. It's not a big deal. Until it is a big deal, because the person never started a new chat. They never actually started anything. They just started a new conversation within the same conversation thread, right? I think many of us have seen this, right? Many of us know the whole point that we talk about now, context engineering versus prompt engineering, which was last month, probably, is it's actually quite difficult at a certain point, when you have enough messages, to try to figure out what exactly is relevant to the incoming question, or the input. Then it becomes a question of, okay, this has gone on for some time, we know the limit of this model, but we also know what research suggests, right? Even with long context models, it's still important for us to send as little as possible, right? Because they still get lost. It's a matter of compacting that, or summarizing the conversation. Now, we're talking about a different thing. This is just my opinion. Messages are just data. We're just storing what we have. Summarization starts to become an engineering problem beyond just storage, where we have to actually figure out how to get this conversation history for this application and this user summarized correctly, so that we can reduce the amount of context we're sending. Then it goes from there, right? I'll wrap it up, right? But that's the next level. Beyond that, if you imagine a person, or a cognitive system, let's say, that's interacting with people all day, what's happening is they go to sleep and their brain, in a background thread, let's say, or a process, picks things out that are important and tends to remember them. The other thing that can happen is that somebody asks about tacos 57 times, or 57 people ask about tacos. The person also remembers the taco thing, even though maybe that's not that important, but it's so frequent that their brain usually stamps it in.
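The compaction pattern described here - keep recent messages verbatim, fold older ones into a running summary - can be sketched in a few lines of Python. This is a toy illustration, not a real integration: the `summarize()` stub stands in for an LLM summarization call, and plain Python lists stand in for a store like Redis.

```python
from dataclasses import dataclass, field

MAX_RECENT = 4  # how many messages to keep verbatim

def summarize(messages, prior_summary):
    # Placeholder for an LLM call that compacts old turns.
    topics = ", ".join(m["content"] for m in messages)
    return (prior_summary + " | " if prior_summary else "") + f"discussed: {topics}"

@dataclass
class WorkingMemory:
    summary: str = ""
    recent: list = field(default_factory=list)

    def append(self, role, content):
        self.recent.append({"role": role, "content": content})
        if len(self.recent) > MAX_RECENT:
            # Fold everything beyond the recent window into the summary.
            overflow = self.recent[:-MAX_RECENT]
            self.recent = self.recent[-MAX_RECENT:]
            self.summary = summarize(overflow, self.summary)

    def context(self):
        # What actually gets sent to the model: summary plus recent turns.
        head = [{"role": "system", "content": f"Summary so far: {self.summary}"}] if self.summary else []
        return head + self.recent

wm = WorkingMemory()
for i in range(6):
    wm.append("user", f"message {i}")
print(len(wm.recent))   # the verbatim window stays bounded
print(wm.summary)       # older turns survive only as a summary
```

The summarization step is where this stops being pure storage and becomes engineering: when and how to compact is an application decision.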
You get this thing that's extracting long-term facts and putting them somewhere in the brain and then pulling them back out later. That usually is how we think about the cognitive system of an agent, right? There's that third thing of taking pieces out of this conversation and putting them somewhere, so we can use them later, typically when the person comes back, the user, and wants to interact again. We know something. We don't have to look at all of the summarized message histories, we just know that they're vegan. Of course, we would give them a vegan recipe, right? Those are three big areas that I think about with memory. [0:08:41] SF: Yeah. Then there's also the reference data that serves as this potential long-term memory, whether I'm looking that up in a vector database, or some other type of data store. I might want to pull that in to form part of a particular context, which then gets factored into this compaction loop, summarization loop, that also might be used in a later cycle. [0:09:03] AB: Yeah, absolutely. It's really interesting to think about what's the difference between the knowledge base and long-term memory, for example. Because retrieval looks probably similar. We're going to use hybrid search, or vector, or keyword search to pull stuff out. It tends to be just a factor of time. Typically, long-term memory, and again, I'm not contradicting the given, I'm just like, this is just how I think about it. Long-term memory is stuff that we learned, the agent learned, at runtime, let's say. Whereas RAG, or the knowledge base, tends to be stuff that we, the developers of the agent, knew to include, or we dynamically injected, but still, we knew that we wanted to inject it. There's a slight difference, but the result tends to be very similar. We store it, retrieve it. [0:09:49] SF: Yeah.
Then even outside of the direct interactions of the model, thinking about the whole piece of software that's like an agent, because these are really full-blown systems. It's not like you're just interacting with a model; the model is telling you to go and call a tool, where the tool may be going to communicate with an API. Then there are other memory systems that are part of this, outside of the model: how do you maintain state to make sure that if the API call fails, or it succeeds, you're not making that call multiple times, these types of key characteristics of event-driven systems, or durable execution, and so forth, too. [0:10:24] AB: Oh, yeah, absolutely. Yeah, that's really fascinating, right? This is why I think people get tripped up thinking about this, because it's so much more complicated than it sounds. Personally, even knowing a lot of this stuff, I'll tend to think about, or talk about, an agent and be like, the model, the agent, the model, just interchangeably, right? Actually, it's neither of those things. It's a complex system that involves lots of different types of state. When folks think about building out an agent that does complex things, multiple tool calls, a deep research agent, let's say, often what they don't realize, perhaps until too late, maybe, or they realize and understand in the right amount of time, is that we're really talking about durable execution at some level. At some level, we're talking about a dynamic workflow that typically, in production, is going to die in the middle of something and then have to restart. Or the user is going to be like, "Whoa. Actually, I was wrong. It's not a dog. It's a cat that I was looking at." Just back up and start at the other point where we were at. Restarting from that other point, similar to the workflow crashed and we need to rerun it, but not re-execute every step. There's all of that checkpointing, as LangGraph would call it.
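The checkpoint-and-resume behavior being described can be sketched generically - a toy version of the idea, not LangGraph's or any other framework's actual API:

```python
# Each completed step records its result in a checkpoint, so a run that
# crashed (or was rewound by the user) resumes without re-executing
# finished steps.
def run_workflow(steps, checkpoint):
    executed = []
    for name, fn in steps:
        if name in checkpoint:      # finished before the crash; skip it
            continue
        checkpoint[name] = fn()     # durably record the result
        executed.append(name)
    return executed

steps = [
    ("fetch", lambda: "raw data"),
    ("analyze", lambda: "report"),
    ("notify", lambda: "sent"),
]

checkpoint = {"fetch": "raw data"}        # pretend we died after step one
print(run_workflow(steps, checkpoint))    # only the remaining steps run
```

In a real system, the checkpoint would live in durable storage such as Redis or a database rather than a local dict, and steps would record intermediate state, not just final results.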
It's been around for a while, right? It's been around in all these different workflow systems. Yes. A big yes. [0:11:47] SF: In terms of Redis's role in all of this, if we look at just starting even with short-term memory, which is where we first started talking about what are these different memory systems: LLMs have these limited context windows, you have agents that are maybe communicating over the same session or thread, if it's interacting with another agent, or interacting with a user that needs to be tracked, you have those message histories that you're passing back and forth. Where does Redis fit into that architecture? [0:12:18] AB: I view it as very similar to a web application that is not agentic, or a deterministic web application, let's say. You've got where does it fit from my perspective, and then you've got where does it fit from the perspective of a random production engineer. They tend to overlap, but there are some ways I tend to think about it differently, maybe. I'll give you my perspective first. First of all, we're talking about agents, so, of course, sticking closely to agents. There is this concept of - often, we'll call it short-term memory, but lately, I think about it more as working memory. Just like I need working memory to solve problems as a human being, we need somewhere to put the stuff that we're juggling right now. That tends to be, for these applications, a message history, or a message history and a summary of the past message history from six months ago that we summarized already. Redis is absolutely a great fit for that. It's super-fast. Whether you're accessing it through the direct data structures that we have in Redis that have been around forever, or through the query engine, which is new for the open-source version, I think, as a core component of Redis. It's in Redis 8 community edition. Actually, yeah. It's in Redis 8. Before that, though, it's still available in modules and things.
But really, what we're saying is data structures, or if you want to define a schema and make queries with a query language, you can also do that with Redis. In both of those ways, it's extremely fast. Faster than most things you're going to use for data. For working memory, it's a great fit, especially because you can also think about working memory as being key-value lookups. We know exactly what we need. We need that thing for that user, and we'll just pull it out, and that blob is what the agent's working with in working memory. However, as a vector database, it can also serve a purpose in retrieval, right? Retrieving stuff from the knowledge base that you can incorporate in the idea of memory, or from long-term memory. So, if you actually store these extracted facts over time in long-term memory, it's great at retrieval as well, also very quickly. That's not even all of it, right? Actually, just to finish where I view Redis: streams. I think streams are a really overlooked thing, perhaps, about Redis. In the production agent that I most recently worked on at Redis, it's all about background tasks and streams. We track the state of this workflow in streams with a background task library called Docket. Redis is right there, helping us manage, like orchestrating, the state of this dynamic workflow that is an agent. It's also there in short-term memory and in long-term memory, and in retrieval for knowledge base stuff. Now, of course, I work for Redis, so I'm incentivized to use it for everything as much as possible. I actually came back to work for Redis. I worked here once before. There was a day I was building out code retrieval for a RAG application, basically, and I was using PostgreSQL. I had been working with PostgreSQL quite a bit at that time, and at extreme scale levels of PostgreSQL, and it had been crashing in production a ton. I was really angry at PostgreSQL.
I realized that you could do everything I was doing with Redis, and it would scale out the way that I wanted, and it would be really fast. Anyway, we have all these things, and I view them as showing up in all these different ways. [0:15:38] SF: I mean, Redis has been around for a number of years. Were there specific things that had to be extended in the product to support some of these new workloads that we're seeing from agents, or other types of applications in the AI realm, that weren't already available in prior versions of Redis? [0:15:58] AB: Not in immediate recent history. Let's say, this year, I don't think we've been forced to introduce something new that unlocks one of these use cases with agents. I mean, in the fullness of time, the query engine itself was a module we added to Redis, not a core data structure. That's the big one for me. When I think about what Redis has added to enable some of these use cases, that's the number one thing. Being able to index vectors, query them dynamically with a query language. That's number one. More recently, as an accompaniment to that, or a different approach, a new approach that we're experimenting with, Salvatore, the creator of Redis, worked on vector sets. That's a core data structure that sits alongside things like sets, and lets you do similar things to what you can do with the query engine right now. [0:16:50] SF: In terms of some of the support around vector search, what is supported in terms of getting those vectors actually into Redis? Am I mostly doing the pipeline work that I would need to get those vectors in outside of Redis, and then Redis ends up being the landing point for indexing and allowing me to retrieve? Or are there other things where Redis owns part of that pipeline? [0:17:11] AB: The answer is yes. It is mostly a client-side concern in Redis's current architecture.
If you want to put something into Redis as a vector, then you're going to be generating the embedding yourself, and it's landing in Redis, as you said. We also have a new product this year called LangCache, in private preview, I believe. It's a slightly higher-level abstraction over using Redis for what we call semantic caching. We do have plans to experiment with including the embedding model within that pipeline: just send things to that product, and it will do that part for you. Core Redis, and what we're talking about today, as far as what's available, you do it yourself. That's the first answer. I forget your other point there. [0:17:55] SF: Is it primarily a landing zone where the pipeline is being built externally? Or is there, essentially, functionality in Redis that allows me to build some of that pipeline directly within Redis, not having to leverage some other third-party tools? [0:18:07] AB: Got you. Yeah. Then I think that is the answer to your question, right? Most of the stuff, you're building to produce the vectors, and then Redis is helping you to store them and retrieve them quickly. [0:18:17] SF: You mentioned semantic caching. What is semantic caching? [0:18:20] AB: Semantic caching is what we would call caching based not on a deterministic pattern, like a key, necessarily, which is how we've all done caching for many years, right? Instead, on similarity. The similarity to an input being the thing that lets us pull something out of the cache. That's really helpful for questions that we have the answer for already, from the LLM. We've gotten the exact same question, or something very close to it, many, many times; we don't have to keep paying a vendor to produce the same answer, so we can use semantic caching for that. [0:18:55] SF: Right.
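Semantic caching as just described can be illustrated with a toy lookup: embed the prompt, and return a cached answer when a stored prompt is similar enough. Everything here is a stand-in - `embed()` is a fake bag-of-words vectorizer rather than a real embedding model, and a Python list stands in for a vector index like the one Redis provides:

```python
import math
from collections import Counter

def embed(text):
    # Fake embedding: normalized bag-of-words. A real cache would call
    # an embedding model here.
    counts = Counter(w.strip("?.,!") for w in text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def cosine(a, b):
    return sum(v * b.get(w, 0.0) for w, v in a.items())

class ToySemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # (embedding, cached response)

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def check(self, prompt):
        vec = embed(prompt)
        scored = [(cosine(vec, e), r) for e, r in self.entries]
        best = max(scored, key=lambda s: s[0], default=(0.0, None))
        # Only a sufficiently similar prompt counts as a hit.
        return best[1] if best[0] >= self.threshold else None

cache = ToySemanticCache(threshold=0.9)
cache.store("what is the capital of france", "Paris")
print(cache.check("What is the capital of France?"))  # near-duplicate: hit
print(cache.check("how do redis streams work"))       # unrelated: miss
```

RedisVL's semantic cache component follows a similar store/check shape, backed by a Redis vector index with a configurable distance threshold.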
If I'm building out an agent to solve some particular problem, and I'm using Redis for, basically, the service, the memory for my agent, am I building that myself? Because we're writing the code, the business logic, to leverage Redis for some of the stuff. Let's say, I want to make sure that I'm not going and overspending on tokens, because the same question has come in multiple times, or the same task has come in multiple times. I can leverage the semantic caching. Am I essentially building that hook into Redis, or is there something where I can hook Redis into, I don't know, an existing framework, and that's abstracted away? [0:19:30] AB: Do we pay you to ask this question? Because this feels like a great plant. You don't have to write that code yourself. You could, though. It's quite fun to write it now with Claude Code. You can just let them go, or let it go, and do a lot of stuff. However, if you're using many of the popular frameworks that exist for creating agents, like LangGraph, or you're just doing LangChain stuff and not even using LangGraph, then we have drop-in components. Some of those are available through external, or I should say, separate repositories, like langgraph-redis, which has some of our open-source LangGraph parts. There is also a library that we maintain - a team at Redis that I'm on maintains - called Redis Vector Library, or RedisVL. That's available on PyPI. We've got a lot of components in there that are drop-in and a little bit more general purpose. You might not necessarily be using a particular framework, or you've got a Frankenstein of them, and you want a message history, or a semantic cache object, to drop into a Python project. You can do that with that library. We've got a few different options for you. [0:20:36] SF: Do you think we'll end up moving into a world with a plethora of agent frameworks that are out there, where memory will be more of this pluggable thing, where it's like, hey, I want to use Redis as my memory provider.
I can just plug that in and these frameworks support it. Almost like, some kind of open standard for memory? [0:20:55] AB: Open standard for memory. I think it's a really interesting topic, because I don't have a good answer on that. However, if you look at the way that frameworks like LangGraph work, I think you can see what I would expect to shape up, more or less, right? Similar to how LangChain has a vector store interface that standardizes just dropping in a vector store into a chain, you can also swap out the memory provider for your LangGraph agent. I think that makes a lot of sense. Having worked with those abstractions, I can tell you it doesn't always work, because all databases have these slightly different personality quirks and traits. Especially if you think about things like filtering the vector search. Of course, vector search, that's fine if we're just talking about just do a similarity search. When we're talking about do a similarity search and include these 25 different faceted search points around that, that's where it starts to be like, okay, well, this database doesn't even support that at all. Or this one has this particular query type for that, but it's different. It's not always quite as smooth as it could be. That's where a standard could be useful, for sure. I guess, what I'm really interested in, though, just to be totally, brutally honest, is stuff like Cognee and Mem0 and agent memory products, agent memory frameworks. Thinking about that as like, what does that look like? Because I've worked on systems like that now, and it's just a really interesting area where you take more of that memory-related work out of the "just drop in a component and use something" place and put it somewhere central. Anyway, it's a really interesting area of innovation, I think. [0:22:45] SF: Yeah. I mean, in a lot of ways, you do need, essentially, a layer that's above the storage that's agent specific.
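One way such a "standard" could look - purely hypothetical, not an existing spec - is a small provider interface the agent codes against, with Redis or anything else supplying the implementation:

```python
from typing import Protocol

class MemoryProvider(Protocol):
    # Hypothetical minimal contract for a swappable memory backend.
    def save(self, session_id: str, item: str) -> None: ...
    def search(self, session_id: str, query: str, limit: int = 5) -> list: ...

class InMemoryProvider:
    # Trivial stand-in; a Redis-backed provider would keep the same
    # signatures but use vector or full-text queries under the hood.
    def __init__(self):
        self.items = {}

    def save(self, session_id, item):
        self.items.setdefault(session_id, []).append(item)

    def search(self, session_id, query, limit=5):
        # Naive substring match stands in for real retrieval.
        return [i for i in self.items.get(session_id, []) if query in i][:limit]

def agent_turn(memory: MemoryProvider):
    memory.save("s1", "user is vegan")
    memory.save("s1", "user lives in Portland")
    return memory.search("s1", "vegan")

print(agent_turn(InMemoryProvider()))  # the agent never sees the backend
```

The friction described above shows up exactly here: the moment `search` needs backend-specific features like faceted filters, the common interface starts to leak.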
Like, you have a query engine, but you need some, I don't know, memory engine for agents that understands the summarization and compaction needs of an agent, that go above and beyond those things that are typically supported by a conventional database. [0:23:07] AB: Yeah, absolutely. I think where that falls currently, and maybe what we'll probably see, is agent frameworks are going to try to carve that out, I think. You can see LangGraph already trying to do that with LangMem, where the storage is a swappable component. Then the intelligence and the engineering around managing all that memory stuff, and the cognitive system aspect of it, is more part of the framework's job, or the job that it's taking on trying to do. It is a big job. Having now worked on that, it's quite a lot. I think you look at these products that I mentioned before and, I mean, you can see why it's a whole system separate from the workings of the agent, which tends to be more durable execution, or workflow execution, which itself is more complicated than it sounds, too. [0:23:59] SF: Yeah. Yeah. Although, you can rely on at least, I don't know, a decade or so of engineering that's been poured into some of those topics. [0:24:07] AB: Yeah, absolutely. [0:24:09] SF: In terms of long-term memory, are there different classes of how we think about long-term memory? It's not just, here's the data I want to keep track of, but do you think about those as being in different buckets that you need to be able to manage and treat differently as agents progress through their work? [0:24:28] AB: Yeah, absolutely. For whatever reason, when I'm testing out an agent and its memory, particularly, I use apples. I really like apples, I guess. I'm constantly talking to agents about apples. Everybody has their own weird thing that they do. But I think the apples thing is an interesting one. I can tell an agent that I like apples, right? That's just a fact.
I did tell it at some point that I like apples, but you could look at it as a fact about me. In fact, if you were to look at whether or not it's stable over time about me in particular, you would find it's pretty stable. It's probably just a general fact. It might be a semantic fact that I like apples, in the words of some of the papers that break this down into different types of memory. Whereas, if you were talking to my wife about me and apples, she would tell you, "Well, he likes apples, but which type of apple this week?" Because it changes every week, which apple I prefer. That is more a fact that is very tightly bound to time, or duration. Yeah, last week, I did like Honeycrisp again. This week, it's more about Cosmic Crisp, and a little bit more tartness in the apple. That's more what you would consider an episodic memory. Or, again, if we were to break these different facts that you could store down into types, that could be one type, a type of memory that is time bound. The time-bound aspect is really important, because when the agent goes and retrieves - like I said, we're really talking about this process of context engineering, where by the end of the process of our engineering, and the runtime portion of it, we arrive at the right context for the LLM to give the user of the application, or the agent, a decent response, right? When we think about how to get to that outcome - how do we get to the outcome that the agent that I'm asking to buy my groceries, or whatever, is going to go out and buy the right apple? I don't want it to buy Honeycrisp; it needs to know that I want Cosmic Crisp this week, without me having to go in and tell it every single time, because that erases the value of autonomy. It needs to store the fact that I like Cosmic Crisp apples with the time, so that when it retrieves stuff to build that context, it can order them by time and overlay the most recent facts that it has that are episodic, perhaps.
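The time-bound storage described here can be sketched as facts tagged with timestamps, where retrieval lets the most recent value win. Field names and timestamps below are illustrative, not a real memory system's schema:

```python
# Stable ("semantic") and time-bound ("episodic") facts share one shape;
# the timestamp lets retrieval order them so the newest value wins.
facts = [
    {"subject": "likes_apples",     "value": "true",         "t": 100},
    {"subject": "apple_preference", "value": "Honeycrisp",   "t": 100},
    {"subject": "apple_preference", "value": "Cosmic Crisp", "t": 200},
]

def current_value(facts, subject):
    # The most recent fact about a subject is the one we overlay.
    matching = [f for f in facts if f["subject"] == subject]
    return max(matching, key=lambda f: f["t"])["value"] if matching else None

print(current_value(facts, "apple_preference"))  # this week's apple
print(current_value(facts, "likes_apples"))      # stable fact, unchanged
```

The grocery-agent scenario falls out directly: retrieval ordered by time surfaces Cosmic Crisp, not last week's Honeycrisp.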
Now, most people, I think, would say, "Well, what? Well, wait. What is episodic exactly?" I think you could also look at episodic in a different way, which is to say, last summer, I visited Orcas Island. Last summer is also a time-bound fact. I visited Orcas Island. They are very similar, because really, the only difference with the first one is we just didn't store the time. It's really general. It's not that useful. Anyway, those are some of them. I actually have a spicy take on procedural memory, which is another type of memory. We could talk about that separately, but that's my first response. [0:27:23] SF: Yeah. Well, in terms of what goes into modeling this stuff with an agent, if I need to keep track of when these types of things happen, do I have to basically go through the process of modeling this like a schema, where it's like, okay, well, I need to track the timestamps, and then I might need to factor in some sort of, I don't know, decay scenario on this knowledge over time, so that I'm not telling Andrew about something that happened six years ago that he no longer cares about and is no longer relevant? How, essentially, do you inform the agent to be able to take into account all these different things? [0:27:58] AB: Yeah, absolutely. Just a moment ago, you were like, well, with workflows and durable execution stuff, we have all these decades of knowledge about how to do it. What I love about AI stuff is pretty much every problem - in some way, almost every problem - boils down to something that we have decades of experience with. In particular, this one. The fact that context engineering and retrieval are really a form of information retrieval means that we have tons of experience with this problem, exactly this problem. We've been putting things into search engines and trying to pull them out, and pull out the most relevant things, for a long time. This is one that's been around.
If you've ever worked on any product with search, you'll know at some point, somebody is going to come to your desk and be like, "Hey, the thing is, videos are on sale this week on the site. I need this boost. The engine should just push videos up." It's a dynamic boost based on the type of content that's in the search index. Typically, we would also have a stable boost for recency that starts to essentially downvote stuff that's older. The same thing is true for retrieving information. The answer is yes. You definitely do. You definitely do want to store more than just the text. You want to store structured data, like the time. I would say, probably as much as possible, right? The time that the user referenced in the memory, if you can extract that into a time, then we can include that in queries later. Then you could order information by the times that users were talking about in the memory, which of course tends to be more important to them than the time that you created it in the memory system, though that's also important, right? Yeah, that's my excited response there: yes, schemas, dates. [0:29:52] SF: We have limited context. We're essentially going back and forth in some sort of session. Even the short-term interaction that's important to that session could eventually grow outside of the context window, and we want to trim it in some fashion, and so forth. Then at the same time, we also have to take into account what is stored in this long-term memory, where we're storing that, and we want to take that into account as well. How do you think about being able to prioritize the sharing of that information? Because presumably, just like a person, a recent conversation I had, I probably want to take into account at a higher priority than maybe something that is in my long-term memory store. That's still important.
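The stable recency boost mentioned above is often implemented as an exponential time decay multiplied into the relevance score. A minimal sketch; the half-life and scores are illustrative numbers, not tuned recommendations:

```python
import math

HALF_LIFE_DAYS = 30.0

def boosted(relevance, age_days):
    # Score halves every HALF_LIFE_DAYS; fresh items keep nearly full score.
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return relevance * decay

results = [
    ("old but highly relevant memory", boosted(0.9, age_days=90)),
    ("fresh, fairly relevant memory", boosted(0.7, age_days=1)),
]
results.sort(key=lambda r: r[1], reverse=True)
print(results[0][0])  # after decay, the fresher memory ranks first
```

The same scoring works whether the timestamp is the one the user referenced or the one the system recorded; which to decay on is the application decision discussed above.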
How do I figure out, in the construction of the context, the context engineering piece, how to actually force a prioritization of the short-term versus long-term? [0:30:45] AB: That's a great question, because I would say, I feel like the answer is still out there. Because I'm just trying to screw with the prompt until I get better retrieval, many times. [0:30:58] SF: Yeah. Still a little vibe prompting. [0:31:01] AB: Yeah. I mean, honestly, every time I'm manipulating the prompt for "context engineering," I do feel as if, at the end of the day, I'm writing English text to try to help someone understand, although that's not even what's happening, right? I mean, literally, the answer is I will tend to try to structure the prompt in different ways, so that I'm attempting to communicate in the prompt: this information is long-term, it's from long-term memory. We consider it durable, or we consider it very important. However, the user just said this stuff. This is the immediate conversation. You should consider this canonical if it overrides something else that they said. Whether or not that'll produce the right result, though - it's not always. Doing it exactly the way I just described, you just hope and measure the results. [0:31:54] SF: I think we're getting a little bit better at turning some of these things into more of an engineering discipline, but there's still a lot of these iterate-test cycles of just, for lack of a better word, figuring out how to manipulate the model to get the output that you need it to produce. In terms of using some of these various places where we're going to bring in additional information, whether that's a vector store, or some database - two years ago, or maybe it was a year and a half ago, everyone was always talking about RAG and vector databases and things like that. It feels like we've gotten to a place where we've realized that vectors and semantic search isn't the only thing that we need to do.
We also need to take into account some other things. Can you talk a little bit about why there was a gap with using, or relying solely on, vector databases, and how do some of these hybrid search techniques play a role within all of this? [0:32:54] AB: Yeah, this is a great topic as well. I think, again, getting back to what is the engineering that we're doing in context engineering? Part of it is this English language, or not even - language manipulation within the prompt, but that's not all, right? That would be prompt engineering. That's part of the challenge of writing a good prompt. The other engineering parts tend to be around information retrieval. That's how I think about this problem. I think if you just look back at what exactly happened back then - this is just my, like, I've-been-doing-this-too-long kind of take - I observed a lot of people trying to use a vector database for data that is fundamentally text-based, or where we're fundamentally going to get the input that we need to do an effective keyword search. However, it was very popular. A lot of this stuff, I think, feels much newer than it really is. We tend to think like, "Oh, right. Of course, we would need something new that not everybody uses, like a vector database, to do this search for AI stuff, because AI is new." At the end of the day, I think Claude Code is a really good example of this question, or how this really plays out. If you look at Claude Code and you look at Cursor - well, let's just say anything, really. If you just put anything on the other side and you say, well, okay, this team is building this thing and they chose vector search and they leaned hard into vector search. Vector search is great in cases where the input is going to be less specific, and you still want to try to find clusters of related things with vectors, because that's what it's good at.
Or it's good when the input is non-language, or when we want to do something like image search. Which of these is going to be more effective tends to be not so much application-specific, or, let's say, data-specific; it often tends to be task-specific. A really good example is I open up my editor, or my code agent, whichever of these two I'm using, and I'm exploring a new code base. What I really need is something to help me identify the clusters, the clusters of related things. Because in projects, they're often stored in all these files in different places, even though they're actually conceptually related. You could take a slice of that through the entire application, in a bunch of different files and different directories. Those are the things that are related, not the things in the same directory. Or they're related in a different dimension, and that's the one I care about. Semantic search, or vector search, can be very useful in that case: I'm exploring a new repo, or whatever. Then there's the time that I tell the agent to rename a variable. I definitely don't need fuzzy search for that. I just need it to use keyword search and do exact matches on that variable name. Like I said, it's the same data. It's a repository. I'm the same user. I'm using the same application, probably. But the task is different, so the demands on search are different. I think in some cases like that, you can really break it down easily into: this is the search type that's appropriate for this task. There are other cases where we're doing something like, our knowledge base has lots of different structured and tiered, or hierarchical, data. Maybe that sounds fancy, but I'm just talking about a book. A book is a book - maybe I indexed several books, and they all have chapters. Those chapters have paragraphs.
There's a paper called the RAPTOR paper, where they went through and recursively summarized each of these things when they broke them up to go into the index. When you look at that, then we're talking about, well, what do I really need exactly? Because - sorry, spoiler alert - by the end of the paper, what they found is, if you do it as a hierarchical search, after doing all that work, you've got everything in the database. In their case, what they have in the database is embedded summaries of each of those layers. They have an embedded summary of the collection of things, an embedded summary of each part of the collection, and embedded summaries of the leaves with the tiny paragraphs, or whatever. Or sorry - in their case, specifically, they have the actual paragraph at the leaf node embedded directly, and then the other things are summaries. When, as a user, I go and use an application that uses something like that - which I have worked on - the search that we want to do from the agent side isn't so cleanly one or the other, because one of those things is going to benefit from exact matches. That's going to be the text that's directly in there. It's usually directly in the database in a real application, not just as vectors, but also as text, so that we can do both types of searches and some kind of hybrid search. It tends to be effective, when you've laid the data out like that with summaries of some things and the actual content of others, to do both searches and fuse the ranks. [0:37:46] SF: In terms of making this choice of how to do the lookup based on a task, does that have to be an engineering choice that's essentially predetermined? Or is it something that you can rely on the model to be able to intelligently decide - that, okay, in this circumstance, I realize I can do a key-based lookup.
I'm going to talk to some tool endpoint that can do the key-based lookup, versus something that's more semantic in nature. [0:38:13] AB: It depends, I would say. We talk to people about this a lot on my team. This is why measurement is so important - it's the first thing that you should work on in a project like this, trying to measure the quality, or accuracy, the things that matter to you about this thing that you're building, so that you can experiment, because it will depend. My opinion is it also depends on the database that you're using. This is actually another reason why I was like, you know what? Redis is in this position where I could - this is a good thing. I'm going to go back and work on AI stuff. Because if I have to make two slow searches to do a fusion - or these days, a lot of databases will just do the fusion on their side, but their data is still on disk, right? The search is still going to be slow, even if they manage some part of that on the server side, on the database side. If that's true, it could be a problem either way, right? Then we could add latency by doing too many searches to try to improve accuracy. Whether that's, we do too many searches and we do them every single time, because we do it deterministically and don't let the model choose - that's a real problem, because it's every single time, every interaction between the user and the agent, that spawns off a series of tool calls from the model, or whatever happens. As a tool call, that often works really well, I think. It depends on the model, but I've had a lot of success with moving things into tool calls, so that you can break it up and it will often make good decisions. But you don't know unless you've measured how that performs over time. [0:39:47] SF: Right. Yeah. I mean, all of these things come down to, hopefully, you're not just putting your finger in the wind and measuring it that way. It's a more formal way of testing and actually evaluating it.
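One common way to "do both searches and fuse the ranks," as Andrew put it earlier, is reciprocal rank fusion (RRF). A minimal sketch, with made-up document IDs standing in for keyword-search and vector-search results (neither list comes from a specific database):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists. Each document scores the sum of
    1 / (k + rank) over every list it appears in (ranks are 1-based).
    k=60 is the constant commonly used with this method."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # exact-match / keyword search
vector_hits = ["doc1", "doc9", "doc3"]   # semantic / vector search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# doc1 and doc3 appear high in both lists, so they rise to the top
```

Some databases, Redis included, can compute this kind of fusion server-side, which avoids making two round trips; the client-side version above just shows what the fusion itself does.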
In terms of what you were talking about before around hybrid search, for any of these memory systems that we're talking about, do you really need to be thinking about what you're building from scratch and modeling it from the beginning? Or, if you had some of this data already somewhere in a database, can you leverage those systems? Or do you really need to be thinking about how to get that data into a form that can be easily consumed and serve the AI use case? [0:40:29] AB: I think you do. I think you do need to be thinking about it. Because, like with other data engineering-specific problems, data is almost never in the right form for anything. Application developers like me will just cowboy through and put a bunch of junk in the database, and you just expect that that's good. Operationally, it's fine. Then you try to use it for anything else and it's like, why? Why did you put all this stuff with commas in one field? It's just ridiculous. There's no way to split these up without knowing. It tends to be the wrong format, and that's particularly true with AI stuff, where, like I was describing earlier, we could imagine that I have this data set of books that I have licensed and purchased. They are not pirated books. I'm putting them in a database, right? I could just put them in like that. I could literally just put them all in one record, one document in my document database, or Redis, and it's all just text. That's not going to work for many, many different things, but especially not for AI stuff. You really need to look at what you're doing with AI. We would call it chunking, right? Like with the RAPTOR approach, where they break these things up into pieces and then recursively summarize the different levels of them - that's really the thing you need to be thinking about.
Even if the data already exists, taking the data that already exists and chunking it out like that, depending on the type of search that your agent is going to be making. [0:42:02] SF: Especially with vectors, thinking about what your re-indexing strategy is as well, where if you need to reprocess, let's say, a website, or a web page - the web page updates, then you need to reprocess it. What are the strategies for actually updating the vector store that has the chunks of that page? Do I need to blow away the original copy of it, inject the new data, and re-index? Or is there a better way of handling that update, or upsert? [0:42:34] AB: The answer is it really depends on the database. Even within the database, how are you storing the data? I'm thinking specifically of Redis, right? With Redis, you can use a hash and store multiple fields, in which case the vector is just one of the fields. Then, if you have the ability to, given a particular input, know that the representation of that in Redis - let's say, as a hash - is different from the source now. Sorry to reuse "hash," but if you can hash a set of inputs in a way that's stable, and will tell you that it's the same web page but the content has changed, then yes, you can just re-index; you can just change part of the hash. If you're using the query engine in Redis, it will re-index that hash with the changed value, for example. JSON is the same thing. Redis now has a JSON data type, and it works pretty much the same way. You can index specific fields in the JSON. There's that. Obviously, like I just said, there's a couple of things you have to think about. How do you map them? Usually, it's hashing stuff to get to the same record, and then whether the database can support that. Honestly, I would struggle to think of a database that couldn't support that strategy. [0:43:49] SF: What do you think is the next frontier around these memory systems for agents?
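The stable-hash idea Andrew describes - deriving the same record key for the same page, and detecting that its content changed so you know to re-embed and upsert - can be sketched like this. The in-memory dict, key scheme, and field names are illustrative stand-ins, not a specific Redis API:

```python
import hashlib

def content_key(url: str) -> str:
    """Stable record key per source page: the same page always maps
    to the same record, so updates become upserts, not duplicates."""
    return "page:" + hashlib.sha256(url.encode()).hexdigest()[:16]

def needs_reindex(store: dict, url: str, content: str) -> bool:
    """Compare a hash of the current content against the stored one;
    only re-embed and upsert when the content actually changed."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    record = store.get(content_key(url))
    return record is None or record["content_hash"] != digest

def upsert(store: dict, url: str, content: str, embedding: list) -> None:
    store[content_key(url)] = {
        "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        "text": content,
        "embedding": embedding,  # one field among many, as in a Redis hash
    }

store: dict = {}
assert needs_reindex(store, "https://example.com", "v1")   # never seen
upsert(store, "https://example.com", "v1", [0.1, 0.2])
assert not needs_reindex(store, "https://example.com", "v1")  # unchanged
assert needs_reindex(store, "https://example.com", "v2")      # changed
```

With a real Redis hash, writing the changed field would trigger the query engine to re-index that record, as Andrew notes; the content hash is just how you decide whether that write is needed at all.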
What is missing, besides some of the things that we talked about of just, hey, we've got to vibe our way through the context engineering? Are there core pieces of the data infrastructure that are missing, that would help us solve some of these problems with getting some of these agents to perform really well in production systems? [0:44:15] AB: Yeah, absolutely. There's a big missing piece, I think. There's a paper this year from Google DeepMind research that a lot of people probably have read. I don't know. I don't know how that works. Only some people stay up at night reading academic papers, I guess. If you do, it's "General Agents Need World Models." This paper is haunting me. I don't know why. Probably everybody at work is like, "Would you shut up about the paper? It's really not that long. We don't need to hear about it every day." It haunts me, because just for fun earlier this year, I was like, I'm going to make an agent that just plays text games, because those are fun. Oh, making a text game for an agent, and then being able to show people, oh, this is a text-game-playing agent and it uses memory to learn - that sounds like a lot of fun. It was fun, until the agents didn't actually improve all that much. This is what got me thinking about this, and then it aligned perfectly with what I see in this paper. That's the problem I actually started this conversation with, because it's always on my mind. I'm thinking about building an agent over the next few months that will interact with live infrastructure and make changes. That's very similar to an agent playing a text game. The thing that's similar is that to survive that environment, to do things successfully, the agent needs to predict how an environment will change. Predicting the language that will represent the environment afterward is fundamentally not the same as actually succeeding at predicting the next state change if it does something.
Agents that are playing these text games - you can do a lot of context engineering. You can improve their performance. But by golly, with the amount of work that you pour into these things, they should be able to pick up a game and generalize on what they've learned from a past game. Often, that's not true. That's the same problem that any agent is going to run into. [0:46:04] SF: It's almost like reinforcement learning, but for the agent, yeah. [0:46:09] AB: Right. Well, that's the problem. In fact, TextWorld, this framework I used for this agent, is for reinforcement learning. Because the thing is, if you know exactly what the agent's going to do and the environment it's going to work within - or in other words, for a game-playing agent, the specific game it plays - then you can generate interaction data. You can generate the state transition data to be able to do reinforcement learning. It will become better at that game, but not necessarily at other games. That's the "general agents need world models" point, because it's just not realistic to think that every time a general agent encounters a new problem, it's like, well, what are we going to do? Are we going to just shut everything down and spin up a new reinforcement learning cycle and teach you how to do that one thing? Because things change all the time. Even within one environment, they change a lot. I think we still have to figure that out. [0:47:04] SF: Yeah. Then, I think, also, going back to your management of infrastructure example, besides the learning that would have to happen, there are also probably a number of other memory challenges that are going to come up with that, where it's going to be this long-running continuous thing, where it probably has long wait cycles for certain things to happen. There's a lot, both from a distributed systems standpoint and also from how you're managing the state of the agent over these long-running processes, that you would have to figure out.
There's a lot of complexity in making that agent, actually - [0:47:37] AB: There is. [0:47:38] SF: - something that you feel confident unleashing within your infrastructure. [0:47:41] AB: Yeah. Somebody was talking about this agent, whether it sounds like a good project. I was like, "That sounds like a project that absolutely is going to fail." I am all in, because that's what's exciting. I don't see the path yet to that working really well. I'm very excited to find it. Yeah, I agree with you. [0:48:01] SF: Yeah. Well, awesome, Andrew. As we start to wrap up here, is there anything else you would like to share? [0:48:06] AB: No, I think we covered a lot of things that are just on my mind, a lot about agents and memory. We had a really good conversation. It was awesome to chat with you and dig into the specifics and talk about my fears and my dreams. [0:48:17] SF: Fantastic. Well, thank you for sharing your expertise. I enjoyed the conversation as well. Cheers. [0:48:21] AB: Excellent. See you. [END]