EPISODE 1831 [INTRODUCTION] [0:00:00] ANNOUNCER: Contextual memory in AI is a major challenge because current models struggle to retain and recall relevant information over time. While humans can build long-term semantic relationships, AI systems often rely on fixed context windows, leading to the loss of important past interactions. Zep is a startup that's developing a memory layer for AI agents using temporal knowledge graphs, enabling agents to retain long-term contextual information. It was founded in 2023 and was part of the Y Combinator Winter 2024 batch. Daniel Chalef is the founder of Zep. He joins the show with Kevin Ball to talk about the challenge of contextual memory in AI, temporal knowledge graphs, ambient AI agents, and more. Kevin Ball, or Kball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow Kball on Twitter or LinkedIn, or visit his website, kball.llc. [INTERVIEW] [0:01:20] KB: Daniel, welcome to the show. [0:01:22] DC: Thanks for having me, Kevin. [0:01:24] KB: Yeah, excited to get into this. Let's maybe start. Do you want to introduce yourself and Zep and what you all are all about? [0:01:32] DC: Yeah. I'm a software engineer turned founder. This is my second startup and my first AI startup. Like many, we're just under two years old. And our focus is on enabling the agentic future. And the way we're going to do that - well, the only way we can do that - is to ensure that agents have the right information available to them at the right time. And so Zep is a memory layer for agentic applications. We focus on midsize companies and the enterprise. And it's a very exciting space to be in. [0:02:05] KB: Let's maybe define some terms here, because while some of our audience is familiar with all these different things, maybe not everyone is. When you say the agentic future, what do you mean by that? I feel like agent is a term that different folks mean different things by. How do you define agent? [0:02:22] DC: Yeah, it is so nebulous, right? There is a definition for agents. It's just become pretty loose. The way I think about it is that AI agents today have an LLM as their brain, but they also have the ability to autonomously interpret instructions and make decisions, which is a very important aspect of agents, and then take actions to achieve goals that have been set for them. Those are the three high-level attributes that I look at when I think about what agents are. When you dig a little bit deeper into it, though, there are several high-level components, several needed components, for an agent to be able to actually do what I just described. They have tools, for example, for taking action. And tools might be something like querying the web, searching the web for answers. A tool might be actually taking action in a line-of-business application, generating an invoice. And the other important aspect is, for an agent to be able to reason and understand what to do next and make a decision, they need to have a broad understanding of the environment that they exist in. What is the user's world if there's a human in the loop? Or what is the business world if they're purely reacting to changes in business state, or our personal state, like our home?
And they're going to have to have a very broad understanding of that, and so memory is important. Being able to recall what they did in the past, and being able to then plot a way forward based on this maybe new stimulus that they've received. Memory really enables planning, and that's the other big important aspect. And there's something called a perception-action loop when we talk about agents. Agents look at stimulus, maybe a human's request or some event that is coming from the environment. They look at their memory, they perceive this, and then they decide to take action based on that using the tools that they have. That's just a very high-level view of the way I look at what agents are. [0:04:49] KB: That's really helpful. Let's maybe dig a little bit deeper on what those pieces are. Tools, I think, is not going to be the big focus of what we're talking about here. But maybe first defining it a little bit: that's essentially function calls, right? Taking action using a computer somewhere. [0:05:04] DC: Exactly. Yeah. [0:05:06] KB: Okay. And then reasoning, LLMs, we're all kind of familiar with what these things enable. Memory, that's where you are focused. And I feel like it might be worth us defining that a little more carefully. Is this just context that you're dumping into your LLM prompt in some way? What do you mean by memory? [0:05:27] DC: Yeah. I like to think of memory as quite expansive. And there are multiple types of memory that we look at when we think about agents. There's short-term memory. Short-term memory is merely what is happening in the current - if you're talking about a human-in-the-loop agent - what is happening in the current conversation. What has the user just asked? What did they say previously? What do I know? You can think of it as a human's short-term memory. But we also need long-term memory. And if you think about how the human brain works, long-term memories are often processed in some way, and they're put into a big data bank or memory bank for recall. And there are various different types of long-term memory as well. We remember how to do things. There's procedural memory. But then there's also semantic memory, where we build relationships between different events in our lives or things that we've perceived and connect them all up, and those go into the data bank as well. Those are various types of long-term memory. There are more, but those are the big high-level aspects of long-term memory. If we double-click on semantic memory, this is where things get really challenging, because humans have an uncanny ability to draw connections between various parts of our lives and our experiences. Sometimes we get it wrong. I forget people's names all the time. I forget what I ate for breakfast. But most of the time, we're able to file it away and recall it over very long periods, which is a marvel. But agents are going to have to have something that approximates that to be able to work effectively. Not only that, but the promise of AI is the ability to process vast amounts of data and make sense of it, far more than humans are able to conceive of or understand, in a very short period of time. [0:07:41] KB: Okay, that makes sense. And I think I understand why you double-clicked on that particular area, because that ties into this concept of a knowledge graph, which is, as I understand it, what you all are building. Let's maybe talk about knowledge graphs. What are they? Is this the implementation of those semantic links?
[0:08:00] DC: Yeah. Knowledge graphs are data structures that allow you to semantically model complex relationships. And they typically contain something called a triple, where you have two entities, or what are called nodes in the graph. Those are things. You could say a person is an entity, or even a concept could be an entity. You have two of those. And then between them, you have a relationship, and this is called an edge in the graph. The relationship describes how one entity relates to the other. That's what the edge does. That's a knowledge graph in a nutshell. And why they're very useful from a memory perspective is that we can build very dense and well-described semantic data sets that are a very good fit for retrieval. Graphs allow all sorts of interesting approaches to retrieving data. There's actually a very mathematical approach that one can take to retrieving data from a graph and traversing the relationships that are available. And there are all sorts of wonderful algorithms that have been developed that allow you to interrogate the data, and do so in a very intuitive way. I absolutely love knowledge graphs. I think they're an amazing data structure for condensing information. The challenge with building knowledge graphs has always been defining the ontology, or the types of things that you care about and how the relationships between those things can take effect. What sorts of relationships are allowed in a knowledge graph? Defining ontologies has always been a very challenging endeavor, and then building the knowledge graph has previously had to be done manually, or in some sort of not very flexible way. But with LLMs, we have an amazing new opportunity to interpret data at scale and extract entities from unstructured text or even structured data, and understand their relationships in ways we've not been able to do before and, again, at scale, because previously it was hand-building knowledge graphs. [0:10:40] KB: Let's maybe dive into that. What does it look like to - let's talk about ontology, right? Defining your categories of things. When you say with an LLM you can just sort of automate that away, is the LLM defining your categories? How are you thinking about that piece of this? [0:10:55] DC: Yeah. Let me speak to Zep's open-source Graphiti library. What's unique about Zep's Graphiti library is that it has a temporal dimension. But let's not dive into that yet. What I will speak to, however, is how Graphiti is able to build an ontology on the fly for data, which is very, very useful when you're working in a domain that isn't necessarily as contained or bound as we've seen in the past. If you are building agents that interact with humans, humans say the darndest things. [0:11:42] KB: Yes. So do LLMs, for that matter. You get all sorts of bizarre things coming out of them. [0:11:48] DC: Well, yes, yeah. And so if you are, for example, trying to understand human preferences as an agent that maybe controls a home, building a well-defined ontology can be extremely challenging. And that's one of the things that has encumbered developers in the past. It's very challenging to fit your data into an ontology. And so Graphiti builds one on the fly by doing named entity recognition across the data that it is seeing. It is determining what the things are, and then intelligently understanding those relationships. Now, that could become very, very problematic at scale if you're not being clever about deduplicating the things.
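A minimal sketch of the triple structure Daniel describes, in Python. The class and field names here are illustrative only, not Graphiti's actual API:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """An entity in the graph: a person, a product, even a concept."""
    name: str
    entity_type: str  # e.g. "Person", "Brand"

@dataclass
class Edge:
    """The relationship between two entities; describes how they relate."""
    source: Node
    target: Node
    relation: str  # e.g. "LOVES"
    fact: str      # natural-language statement of the relationship

# A triple: two entities (nodes) joined by a relationship (edge).
daniel = Node("Daniel", "Person")
adidas = Node("Adidas", "Brand")
triple = Edge(daniel, adidas, "LOVES", "Daniel loves Adidas shoes")
```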
These two things are alike. They might be differently named, but conceptually, they're the same things. We should use the same type. Secondly, these relationships are very similar, and so we should use the same labels for the relationships rather than generating an entirely new label. Because, otherwise, your graph becomes very difficult to query into if there are many different types of entities that are very much related. [0:13:11] KB: Let's maybe talk about type, because I think one of the things I saw on Zep's website is you talked about pulling out strongly typed data from broad chat information. How do you think about that type extraction? And are those types themselves then morphable? Because I might have a conversation where I name three things about this entity, and then sometime later the agent is having another conversation and discovers, "Oh, there's a whole slew of additional data that's attached to this type of entity." [0:13:41] DC: Yeah. I spoke a little bit about Graphiti's organic development of an ontology. Graphiti also allows developers to define types, or custom types, that are well described, i.e., the developer can specify what the type is about. This is a person, and a person does XYZ. Or this is a customer, and a customer is defined as follows. And a customer has a company name, a first name, a last name, and several other fields. And that takes things to the next level in terms of being able to have a well-understood ontology. And you can do so very simply using Pydantic models - Graphiti is written in Python - which is something that developers have become very accustomed to when working with LLM APIs, such as OpenAI's structured output. And what that allows developers to do is build an ontology that is better structured for their particular use case, but Graphiti will still extract entities that don't match existing types. And so that allows you to get the benefits of this organic development and well-maintained ontology, but also have structured types that make sense for your particular business. [0:15:19] KB: Now, if you have, for example, a user type that you've defined, and Graphiti is interacting with an end user of some sort and it discovers, "Oh, there's a whole bunch of other contextual information that keeps showing up around users," that it thinks is probably useful, does it create a new type overlaying users? Does it extend your defined user? How do you think about this sort of fluidity in what you're interacting with? When you're dealing with the LLM-extracted things, it can come up with all sorts of stuff. [0:15:50] DC: Yep. It would likely create entities associated with the user entity that describe an additional relationship. For example, in the company type that I described earlier, if we hadn't put first name and last name of a particular contact at a company, it might create a contact entity type, and that contact entity type has a name. [0:16:17] KB: That makes a ton of sense. You have a set of types that you've defined as a developer, where you're pretty much guaranteed these are going to have these fields in this way. And then there's a set of types that Graphiti is interpreting, creating on the fly, relating to other types.
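A sketch of the kind of Pydantic model a developer might write for the customer type Daniel describes. The field set follows his example; the registration call in the comment is hypothetical, not Graphiti's confirmed API:

```python
from pydantic import BaseModel, Field

class Customer(BaseModel):
    """A well-described custom entity type: 'a customer is defined as follows'."""
    company_name: str = Field(description="Name of the customer's company")
    first_name: str = Field(description="First name of the customer contact")
    last_name: str = Field(description="Last name of the customer contact")

# Hypothetical registration; consult Graphiti's docs for the real entry point:
# graphiti.add_episode(..., entity_types={"Customer": Customer})
```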
If, for example, it's a Graphiti-owned type and it discovers here are some new fields, will it modify the underlying type, or will it do the same trick of, "Okay, this is a contact, and now we have a contact phone tree because we know there's a set of phones," or I don't know what the additional entities might be. [0:16:50] DC: Yeah. Probably, a thing would be a phone number. Phone number would be a thing. It would create an entity for the phone number related to the user or the company. It's pretty clever that way. You mentioned changing data, though, which is where Graphiti really shines. And I hinted at that a little bit earlier: Graphiti is a temporal knowledge graph. It's a little bit of a mouthful, but Graphiti is specifically designed to deal with temporality and dynamic data. And this is very unlike other RAG frameworks. In fact, I like to think of a major shift coming in terms of how we view building LLM-based applications. Well, it's actually here already. And that is, we've shifted from these Q&A chatbots - questions and answers over a document corpus, powered by semantic databases providing RAG and RAG frameworks. RAG frameworks work really well with static data. But the way I described agents, and the way I strongly believe agents exist in this new world, is in a sea of dynamic data. And I think we're in a post-RAG era now. Look, RAG didn't last very long - three years. But we're in a post-RAG era now, and Graphiti is a post-RAG framework. It deals with dynamic data, and it supports a constant stream of unstructured or structured text and structured objects like JSON. And it's able to integrate this data into the graph in a way where it understands whether the new knowledge it has created from the data is in conflict with existing knowledge in the knowledge graph. For example - just a very stylized example - I purchased a pair of Adidas shoes six months ago from an e-commerce agent, and the shoes fell apart. Sorry, Adidas. I shouldn't have actually used the brand. And I send them back using the return system. That's not part of this particular agent that I spoke to. It's a different agent that manages returns. And I send a nastygram back. I'm very upset that the shoes fell apart. My previous brand preference - because I'd had a conversation with the e-commerce agent and said, "Hey, I love Adidas shoes" - was noted as being Adidas. But now I've sent these shoes back with a nastygram saying, "I'll never buy your shoes again. I'll never buy Adidas again." My brand preference has changed. And so when we integrate that knowledge, the stream of JSON from the returns agent, into our memory store, we've got to update that brand preference. And that's what Graphiti does. It understands that the brand preference has changed and that it needs to invalidate that previous relationship, where Daniel loves Adidas shoes. The "Daniel loves Adidas shoes" fact is no longer valid. [0:20:26] KB: This is interesting, and there's a bunch of different pieces we can explore on this. I guess, first off, just at a very vanilla layer, under the covers, does this essentially look like timestamps on any piece of data for when it became known and when it became invalidated? Or how is this implemented? [0:20:47] DC: Graphiti - and I'm going to get a little bit technical here - has a bi-temporal model. And it has both episodic memory and semantic memory.
And actually, in Zep's implementation of Graphiti, we've also implemented procedural memory. Episodic memory is an event or a chat message or similar. You send it to Graphiti, and it has a created date, the date that you created that episode. And you can think about it as the episode might be a conversation that we're having today. It might be a single utterance in the transcription of this particular conversation. And it has a timestamp, and that becomes the created timestamp of any relationships that we add to the knowledge graph. But I also might have mentioned that I purchased a pair of Adidas shoes six months ago. And so there are two entities there, Daniel and Adidas shoes, with a purchased relationship. And that relationship has a created date from today, but it has a valid date from six months ago. And then, when we now send the shoes back and we say we're really angry, we create a new episode. That's a JSON data event coming from the returns agent. And we update the knowledge graph by integrating this new information. In Graphiti parlance, there's another date that we add to the edge or relationship holding the fact that Daniel loves Adidas shoes. We can invalidate that fact, and that's the invalid date. And this is how we capture the time dimension of changing state. And it allows Graphiti to reason with the new event data that it receives. [0:22:55] KB: I love this distinction between semantic memory and episodic memory. When you put those timestamps in place, are those also a timestamp and a link to the episode that resulted in this change? [0:23:05] DC: Correct. Yeah. [0:23:07] KB: Okay. Okay. Interesting. [0:23:08] DC: In the graph, you have an episodic node, then you have entity nodes that are related to that episodic node. And so what you end up with - it's really interesting - you have multiple episodes that link to the same entities. And you can see, longitudinally over time, state changes. And very importantly, Graphiti allows your agent to reason with state changes: "Oh, Daniel used to love Adidas shoes, but he's back again and he wants to purchase some shoes. I can still see that he is a road runner, pronates, and has wide feet. But the fact that he loves Adidas shoes is now invalid, so I won't recommend those to him. And he just mentioned that he thinks he wants to try out the Puma shoes." We can add another entity to the graph - that there's potentially a preference for Puma shoes. [0:24:06] KB: Yeah, yeah. Really interesting. Okay. Next question related to this is, how do you conceptualize multiplayer or things like that, right? In this case, let's go back and just say you haven't sent the nastygram yet. We know Daniel said, "Adidas is great. I love Adidas." Maybe Kevin said, "Adidas is terrible." Is that tied to each individual? How do you consolidate knowledge? And is there a way in which, if you have specific user graphs, they can be consolidated or shared in some way across a set of users or an organization? How would you deal with those layers? [0:24:40] DC: Yeah. In Graphiti, you would, in the data that you provide, ensure that Graphiti understood that there was a different speaker or that the data was related to a different user. You could do that in the JSON, if you're passing a JSON event in, or you can do it in unstructured text in a transcription format. Maybe this is a good segue to speak to what Zep is versus Graphiti and how things work in Zep. Graphiti is a framework for implementing temporal knowledge graphs.
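A sketch of the bi-temporal bookkeeping Daniel walks through, assuming simple dataclass names; Graphiti's real schema will differ in detail:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class TemporalEdge:
    """A relationship whose validity is tracked separately from when we learned it."""
    fact: str
    created_at: datetime                   # when the episode arrived
    valid_at: datetime                     # when the fact became true in the world
    invalid_at: Optional[datetime] = None  # set when later knowledge contradicts it

    def is_valid(self, at: datetime) -> bool:
        return self.valid_at <= at and (self.invalid_at is None or at < self.invalid_at)

now = datetime.now()
# "I purchased Adidas shoes six months ago" arrives in today's episode:
loves_adidas = TemporalEdge("Daniel loves Adidas shoes",
                            created_at=now,
                            valid_at=now - timedelta(days=180))
# The angry return episode arrives: invalidate the fact rather than delete it,
# so the agent can still reason longitudinally about the state change.
loves_adidas.invalid_at = now
```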
It can be used as a memory layer, but it's a very generalized framework. Zep has first-class support for projects, users, sessions or chat-history threads, role-based access control, data governance, and privacy functionality, all layered on top of Graphiti, plus SDKs for Python, TypeScript, and Go. It's an end-to-end memory service that is framework agnostic. We have the ability to implement Zep within LangGraph agents, within AutoGen, or without any agent framework. In Zep, you can have user-based knowledge graphs. And by default, if you have single-user agent interactions, the data would go into a user knowledge graph, and it's well contained and ensures that user data is managed appropriately from a privacy and security perspective. But Zep also has the concept of group graphs that can be used for multiplayer scenarios. For example, you can stream Slack messages into a group graph and query the group graph independent of specific users, and get results back from multiple users. And so there's a lot of flexibility there. My answer is yes. [0:26:52] KB: Let's maybe talk a little bit, then, about what this looks like from a developer standpoint. Because I think one of the things that was interesting to me was this idea about automatically inferring changing things. What am I sending to Graphiti or to Zep? What do I get back? How do I query? What does that actually look like for a software developer? [0:27:13] DC: Yeah. In Graphiti - and I'll speak to Graphiti because I love the fact that it's open source, and it's easy for developers to implement as well. Let's talk about Graphiti quickly. Any unstructured text, JSON, or even transcriptions, you can send to Graphiti. It's a very simple API. And you can create graphs on the fly. You can have multiple graphs in Graphiti. You can approximate what we do with Zep in terms of putting a firewall between users by namespacing your user graphs. And what it looks like on the query end is - let me speak to a design decision that we took. Developers don't have to learn Cypher or some other graph query language to query Graphiti. There are two reasons why we did this. One, we think that we can provide developers with a very powerful search framework without requiring immersion in a language like Cypher, which is a common query language for knowledge graphs, originally developed by Neo4j. And the other design decision that we made is we're not going to have an agent or LLM in the query path, because that's super slow. GraphRAG can take tens of seconds to get a response back. And GraphRAG is that static RAG framework that is built around a knowledge graph and was developed by Microsoft Research. It is slow because there are all sorts of things that it's doing in the query path with LLMs. And that doesn't really scale in production. It means that you can't use something like GraphRAG for voice agents, and those are becoming more and more common. We don't want to be sitting at a keyboard and typing. We want to talk to a device, for example. And so Graphiti offers a number of different ways to retrieve data, and we do so in a way that's super scalable. The entry point into a graph is typically through an index, not through a graph search query where you have to then traverse the graph to find things that you're looking for. And these indexes are semantic and full-text indexes. And so what we've done in Graphiti is, every time you add new data to the graph - I mentioned you have entities and you have edges between those entities.
We place a fact onto the edge - "Daniel loves Adidas shoes" - and that gets indexed both semantically and from a full-text perspective with BM25 indexing. We also generate summaries for the entities in the graph. And those summaries are basically summaries of all the relationships that a particular entity has. Daniel loves Adidas shoes, Daniel pronates, Daniel has wide feet, Daniel is a road runner. And we index the summary. And so that allows you to search edges and nodes semantically, with full text, or both. And your entry into the graph is in near-constant time as a consequence. And so you're querying subgraphs rather than the entire graph. [0:30:52] KB: Got it. Okay. Let me make sure I understand this. First, data entry. You throw over your stream of messages, documents, whatever. This doesn't have to be fast. This is when you use an LLM to do extraction or do whatever other things that you're doing there. [0:31:06] DC: Exactly. It's done asynchronously. Yes. [0:31:09] KB: Done asynchronously. You can do a lot of essentially pre-compute of all these different things. You pre-compute a set of summaries, you put things into an index, and the index also then points to the nodes. Then at query time, you go straight to this fast index. It loads up for you a summary of a set of things that you can throw into your agent's context right away, and a subgraph if you want to do more searching that's there for you. [0:31:34] DC: Yep. Well, it's not just the summaries, it's also the facts on the edges, which are pretty well defined. And then you can also do graph search based on that - things like retrieving the nearest nodes to the node that was a hit, using something called breadth-first search, which allows you to get a more comprehensive view of the relationship that you have retrieved from the graph. You can see adjacent entities and their relationships. [0:32:10] KB: And that's done explicitly? I'll get results back and I can then do this? Or is this done as a part of the initial query? [0:32:15] DC: Graphiti has a number of different recipes that have been developed that are kind of like best-practice pipelines. But you can also create your own pipelines. And the types of things that you can do are pull together BM25 and vector search, use something like reciprocal rank fusion to join the two search results, and then use an LLM re-ranker to re-rank the results. Or you can use - we have graph-based re-rankers as well. You can, for example, re-rank results by distance from a centroid node. If you want to get Daniel-specific facts back, you can say, re-rank all the facts by semantic - well, looking at the semantic search results, re-rank by distance from the Daniel node. And so these are all baked in as simple recipes in Graphiti. And they're super powerful. In Zep's implementation of Graphiti, where we run our own embedding and re-ranking services on our own GPUs, you can search a graph in under 200 milliseconds, and the most expensive pipelines have a P95 of 300 milliseconds. This is super cool. [0:33:35] KB: I'm doing some pieces of this type of work right now, and as I was listening to what you're saying, I'm like, "Holy smokes. I've got to get my team on Graphiti. This is cool." [0:33:43] DC: Yeah. We actually benchmark. We published a paper last month that describes how Zep works and deep-dives into Graphiti. It's available on arXiv, and I think we can probably link to it in the notes of this podcast.
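A sketch of reciprocal rank fusion, the standard technique Daniel names for joining BM25 and vector search results; the fact IDs are made up for illustration:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

bm25_hits = ["daniel_wide_feet", "daniel_loves_adidas", "daniel_road_runner"]
vector_hits = ["daniel_loves_adidas", "daniel_pronates"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# A re-ranker (LLM-based, or graph-based such as distance from the Daniel node)
# could then reorder `fused` before it goes into the agent's context.
```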
And we benchmarked Zep and Zep's implementation of Graphiti against the prior state of the art in memory, which is MemGPT. We actually found that the MemGPT evaluation suite was too trivial. And so we used that to compare Zep to MemGPT, and Zep was the far stronger contender there. But we also selected a far more comprehensive and larger evaluation benchmark called LongMemEval. And as the name implies, it's for long memories and evaluating performance of retrieval across long memories - recall across long memories. And by long, I mean 100,000-plus tokens of memory, filling up a context window. And Zep outperformed the baseline of putting the entire conversation and business data into the context for both GPT-4o and GPT-4o mini. It outperformed GPT-4o by 18%, and mini by a larger margin, across a battery of 20 or so evaluations. And in some evaluations where temporal understanding was required, it outperformed recall over the entire context window by almost 100%. [0:35:38] KB: This makes total sense, because you can essentially pre-compute the relevance, and you're only feeding in what is actually going to be really useful and relevant to the LLM, so it doesn't get distracted with all the other noise and all these different - no, it makes perfect sense. [0:35:53] DC: Yeah. The needle-in-the-haystack problem has not yet been solved beyond - look, I don't like betting against models and the future of LLMs or any other architecture, whether it's diffusion or some other unknown architecture. However, today, the recall problem is not solved. And even putting aside improved recall, what we found is that, with stronger models, Zep performed better. Having your agent running with GPT-4o versus GPT-4o mini improved its understanding of the context that Zep provided. There's the promise of being able to offer far denser sets of information to the LLM and for it to then be able to make more cogent decisions based on that data. [0:36:41] KB: Absolutely. And you can load up only the relevant parts of the context for it. [0:36:45] DC: Exactly. [0:36:46] KB: I agree about not betting against models. But, I mean, if you think about the way that we work as humans, we get distracted when you put too much in front of us as well. I don't expect the models to be that much better at it. [0:36:58] DC: Exactly. In fact, there are mental health conditions where, if you have a storm of memories come back and other things flood your consciousness, you also have issues. The other dimensions that are very important to recall are latency and cost. And so, as you described, if you're only pulling out the most important and relevant information that the agent needs at that moment, you can reduce cost and latency dramatically. One of the other dimensions that we benchmarked in the paper was latency: reducing latency by 90% and reducing token cost by 98%. [0:37:44] KB: This makes a ton of sense. All this makes sense. I love it. I actually am really excited learning about this. What are the big problems you still see in agentic memory? What's not solved yet? And what are you working on looking forward? [0:37:58] DC: Yeah. We're very focused on production and not research. And when you're building systems for research, you don't have to worry about the real-world consequences of putting services in the wild. And so what the benchmarks demonstrated to us is that there's still work that we need to do on improving Graphiti and Zep on a number of different dimensions. There's ongoing work there.
We also recognize that in large enterprises, in particular, pushing data into memory is fraught. In production, you have all sorts of other things you need to wrap around a service like a memory service - things that touch privacy, data governance, and security, and many different parts of the business that you need to get buy-in from. And so the real-world consequences of building comprehensive memory for enterprise agents aren't just in the technology itself, but in being able to demonstrate that the technology works along many, many different dimensions. Not just that it creates the right memories, but that you ensure that the data gets removed when it's supposed to get removed, that it's secure, that it is safe, et cetera. [0:39:35] KB: In a lot of ways, what I'm hearing is your focus right now is navigating the gap between cool prototypes and production. [0:39:42] DC: Exactly. Yeah. And so we have a lot of earlier-stage, pre-IPO companies as customers. Many of them have compliance requirements as well. We're SOC 2 Type 2 certified. We're working on HIPAA certification as well. We've seen a lot of interest from the healthcare domain. There is so much opportunity there for automation. But in the enterprise, we're still seeing folks moving from that prototype phase to stack selection - building out reference stacks that can be rolled out globally across these large enterprises. It's pretty early for large enterprises in agentic adoption. [0:40:29] KB: Yes. This highlights one of the reasons it takes a long time for all these amazing things that we're seeing with LLMs to trickle out into impacting the whole world. Now, with those customers that you have right now, you probably have an inside view on what's happening in the agent world. Can you share anything about the domains in which we're really seeing cutting-edge agentic applications? [0:40:53] DC: Yes. Something that I'm finding most exciting is the rise of ambient agents. We've spoken a lot about human in the loop - Daniel wants to buy some shoes, et cetera. Ambient agents are super cool because what they're doing is just monitoring their environment and taking actions based on changes in their environments. You can think about an agent that is monitoring telemetry from a car, being able to understand that telemetry, and providing active support to the driver around what they should do when something goes wrong. You can think about, in a household environment, home automation: learning intelligently from occupancy sensors, learning intelligently from your personal calendars as to when you're home and when you're not, coupled with presence sensors for pets; being able to turn lighting on and off, adjusting the AC, in the future making food on time, responding to events like presence when there isn't supposed to be presence, et cetera. Ambient agents offer these amazing opportunities for human agency, because they're taking over a bunch of work that we previously needed to do, and they can extend our consciousness as well, understanding what's happening in our environment. But they're also one of the biggest concerns when it comes to what the future holds from an AGI perspective. A lot of the time in the popular media, when people talk about AGI, they're actually talking about ambient agents - agents taking actions without necessarily having human oversight. That's a lot of fun and very interesting, and we've actually been talking to some larger companies thinking about ambient agents.
Then we've also seen a lot of development, as I mentioned, in healthcare, around really automating things that were very, very expensive - processes that were very expensive to run - things like insurance claims and understanding the coding of insurance claims. [0:43:47] KB: I don't know if humans understand that part, right? Much less. [0:43:50] DC: In a prior life, I investigated medical billing, and it's an incredible environment - such an adversarial relationship between healthcare providers and insurance companies that they actually have teams in competition with each other. But that's a different discussion. We're seeing a lot of use of agents across B2B and B2C, in everything from mental health applications for consumers to e-commerce applications. You name it. In the B2B world, there are some really exciting companies that we've been working with. This, again, is one of the frightening things, but they're building AI analyst tools to comb through very large amounts of data for Fortune 500s and the national security industry. So, basically, analyst work suites that operate autonomously to develop reports and to take actions based on vast amounts of both open-source and proprietary data. That's really interesting as well. [0:45:03] KB: Yes. Wow. All right. We're coming close to the end of our time here today. Is there anything we haven't talked about that you think would be important to touch on before we wrap up? [0:45:15] DC: What I want to circle back on is the reframing of what agents are, getting a little more precise about that, and in particular talking about memory again. We have a lot of folks come to us thinking in terms of RAG and static document corpuses. I think that's a solved problem already, and there is so much thinking that needs to be put into a post-RAG world where we're dealing with more dynamic data. There are both technological challenges there as well as human challenges - some of which I mentioned in terms of compliance, others in terms of technology. There are areas that we're still pretty uncertain about in terms of how we offer really cutting-edge capabilities in production environments in a post-RAG world. As humans, each of us perceives the world very differently, so our semantic understanding of the world is very different. That's why having different people's opinions is so helpful in decision-making - because we see the world with different perspectives. How are we going to help agents understand the world if we ourselves have different perspectives on what reality is? That gets into some of the philosophy of reality, and that's an area that I'm very excited to see people explore. I think it is incredibly necessary, and it ties into AI safety. It ties into the impact on labor, et cetera. [0:47:13] KB: Yes. There's something really interesting there, and it reminds me of work I was pushing forward at a previous job that didn't end up getting there. When you're driving these - I was going to call them facts, but they're not - these knowledge entities, and we start talking about trying to perceive reality, we need to move from a single-user to a multi-user view. You almost want a Bayesian approach, where we have these different supporting factors that lead us to an 80% belief that this is likely, and here are the things that might validate or invalidate our priors on this - kind of having, essentially, probabilities and support structures associated with our facts. [0:47:52] DC: Yes. I actually like framing it as a Bayesian problem.
If one looks at the Graphiti architecture, it is conceptually doing something similar to that when it does reconciliation, where it is looking at priors to understand how to form new memories or facts. But a formal Bayesian layer over something like Graphiti would be very interesting - adding some sort of Bayesian reasoning. Probability is a very challenging thing for humans to conceive of. [0:48:35] KB: Absolutely. [0:48:36] DC: As a consequence, LLMs are incapable of perceiving probability. But there is an architectural aspect of how LLMs work that can help us understand why an LLM produced what it produced. That is looking at things like logits - the probabilities of the tokens that were produced. There's opportunity there. It would be interesting to explore. [0:49:09] KB: Yes. There are kind of two sides there, right? There's the probability on knowledge, based on multiple views of a thing coming in, multiple interpretations. Then there's even just the probability on the evidence that we've extracted, because the LLM itself is probabilistic. [0:49:29] DC: Exactly, yes. Honestly, it's interesting. We see that with customers. "This looks wrong." And you sit down with them, and you say, "But this is what it says. This is the input data, and this is what is built." "Oh, yes. Yes. No, I see it now. So then how do we get it into the structure that you'd like?" That's why we've done things like build custom entity types, et cetera, because we each perceive our worlds so differently. [0:49:57] KB: There's a piece of this, too. One of the core ways I think about LLM applications, because they're probabilistic, because they have all these things, is that, if you need reliability - or even if you don't - it's useful to think of them in terms of: there's an inference step, and then there's a validation step. Then you go to another inference, or something like that. And that validation might be a human in the loop. It might be some sort of formal checker. It might be a second LLM or some other thing. But what does that look like in a probabilistic world? How do you insert these validation steps as you build? [0:50:29] DC: Graphiti does implement reflection on things like entities that are extracted, conflicts that are identified, et cetera. We've done this because we started building Graphiti before reasoning models. We actually don't think, at this stage, that reasoning models are necessary for Graphiti. They're costly and slow. Again, we look at how we scale things in production, and at cost. I think folks in your audience who have worked with knowledge graphs and LLMs, and maybe have worked with GraphRAG, will know that it's very costly to produce graphs of any size. That is something that we're working on, because we want to be able to commercialize this at scale, not just for large enterprises but also for startups. There's a lot of work that goes on there. [0:51:20] KB: In terms of the inference that you're doing - you have a stream coming in, you're inferring out these knowledge graphs of different sorts - can that be done on a range of models? Do you need the top cutting-edge models to get good results? What does that look like? [0:51:34] DC: Yes. We use frontier models for our work. We've not fine-tuned any models. We might do so if we have some very domain-specific use cases. Frontier models have been able to perform well enough for our use case. We do run our own inference infrastructure for some use cases - not for graph building, because of just the sheer scale of it.
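Daniel's earlier point about logits is something developers can poke at today: OpenAI's chat completions API, for example, can return per-token log probabilities. A small sketch, with the model name and prompt purely illustrative:

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Does Daniel still prefer Adidas? Answer yes or no."}],
    logprobs=True,
    top_logprobs=3,  # also return the runner-up tokens at each position
)
for tok in resp.choices[0].logprobs.content:
    # exp(logprob) is the model's probability for the token it actually emitted
    print(tok.token, round(math.exp(tok.logprob), 3))
```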
We do use Microsoft and other inference services for our use cases. You can use smaller models with Graphiti, depending on your domain and the complexity of your data. You could run, for example, GPT-4o mini, or you could use Anthropic's Haiku or Llama 3.1 70B or 3.2 70B, et cetera. If you have a pretty constrained domain, you may even get away with using much smaller models. I find the prospect of using smaller models very exciting. I actually think that we're going to start seeing more complex product architectures from inference providers, where what we perceive of as a model that we're using actually isn't just a single model. It is a set of different models combined together at different layers of what we would traditionally view as a model, solving different problems. [0:53:16] KB: Absolutely. All right, awesome. Well, this has been super fun. Thank you, Daniel, for joining me today. [0:53:21] DC: Thank you, Kevin. It was a lot of fun. [0:53:23] KB: And we'll call that a wrap. [END]