EPISODE 1923 [INTRODUCTION] [0:00:00] ANNOUNCER: AI agents are increasingly capable of reasoning and performing autonomous work over long periods. However, as agents take on more complex, longer-horizon tasks, keeping them supplied with the right information becomes the core engineering challenge. The industry is moving away from preloading context upfront toward a model where agents dynamically navigate and retrieve the data they need when they need it. Redis is approaching context management using a context engine, which is an architecture built around four pillars: on-demand context retrieval, data that is always current, fast retrieval, and a memory layer that improves over time. In practice, this means building materialized views of data with a semantic layer on top rather than giving agents direct access to production databases. A memory system sits alongside this, extracting and compacting information asynchronously as the agent works. Simba Khadder leads AI strategy at Redis, and he previously co-founded the feature store platform Featureform, which was acquired by Redis in 2025. In this episode, Simba joins Kevin Ball to discuss why context has become the defining challenge in agentic AI, how context engines differ from traditional RAG architectures, how materialized views underpin reliable agent data pipelines, how memory systems can improve through async extraction and compaction, and how engineering teams need to adapt their practices as AI-driven development accelerates. Kevin Ball, or Kball, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI In Action discussion group through Latent.Space. Check out the show notes to follow Kball on Twitter or LinkedIn, or visit his website, kball.llc. [INTERVIEW] [0:02:07] KB: Simba, welcome to the show. [0:02:09] SK: Thanks for having me. 
[0:02:10] KB: Yeah. Well, let's get started learning a little bit about you and your background, and then we can move into Redis and talking about that. So, how do you explain yourself to folks? [0:02:19] SK: When I grow up, I want to surf all day. But for today, I'm in AI. [0:02:23] KB: Okay. I mean, I can resonate with that. Surfing is a blast. [0:02:26] SK: Yeah. I started as a software engineer, technical. When I started my career, I was at Google, a software engineer at [inaudible 0:02:33]. I solved a lot of really fun technical problems. Worked with some super smart people, but I was always kind of itching to go and learn on a slope that I felt like I had been learning on before. So, I left Google. I started my first company. Kind of did well there. Started a company after, which was Featureform, which then was acquired by Redis. And I've always loved Redis as a product. So it's kind of awesome to become a huge part of the AI strategy here. But yeah, that's kind of the short. [0:03:01] KB: All right. Well, and can you quickly describe Featureform and what it is so that we have that as background as well? [0:03:07] SK: Yeah. Featureform, we were a feature store company. VC-backed. Kind of that style of company. The problem that we solved was around anyone building models and getting them to production. Every model - when you open Spotify, you get all these personalized recommendations. You could imagine that every time you open the app that it's looking up your favorite song, favorite category, favorite genres, all these concepts about you, these signals. And you can imagine that there's this whole team at Spotify whose whole job it is to optimize these signals, come up with new ones, etc. Those signals are called features. And what Featureform was and is, the whole team is here, and we're growing it. And hiring, by the way. But what Featureform does is it enables a data scientist to define their features as code. 
The name, even - I wanted Terraform for features. And it deploys those to production. It keeps them up to date. It runs the compute on your compute, whether it's Spark, or Snowflake, or whatever. It maintains like a view, like a materialized view in Redis so that you can get your features up to date, but you have your training data as well. What's interesting, and I'm sure we'll get into it, is if you swap "feature" for "context variable," it looks not too dissimilar, excluding the training part, from what people are building today for agents. [0:04:20] KB: Yeah. So that actually is a nice transition into kind of what I think a lot of the meat of this, right? Five years ago, you didn't have to worry about what was getting fed to your model or how it was representing things or any of this unless you were deep down in data science or integrating it and you were doing that. Nowadays, we're all having to learn about machine learning. How do you put things in there? What does context even mean? All these different things. I'd be curious to kind of get your take on the environment that we're in right now. As devs, we're at the bleeding edge of it. So maybe we start there of how this is changing the world of software development. And then we can kind of dive into how those feature pieces are becoming context and what that looks like. [0:05:00] SK: Yeah. The thing that's happening. There's a few things happening in parallel, and they're all really interesting. One is, it's even funny to say like "us as devs," because I almost feel like everyone's a dev now in an interesting way. Our jobs are changing. Not that like if you need to just build a website or an app, my uncle could go do that or my aunt could go put something together. Things have changed. Now, could they build a database? No. Anyway, just kind of an aside. 
The main thing that we're seeing in terms of the overarching landscape is a couple years ago, even last year, you could trust an agent unsupervised for maybe a few minutes. And that was cool. I mean, that was a huge change from before. Imagine 5 minutes of an agent running by itself on a software task. Now, I mean, I think the actual metric is about an hour. You can trust an agent to unsupervised solve a software task that takes up to an hour to complete, which is a rough proxy for complexity. And the interesting thing is, Anthropic says that that number is going to double every 6 months. If it's an hour now and it keeps doubling, in a year it's going to be 4 hours of unsupervised agents working. And so what's changed with that is if you do RAG, traditional RAG, naive RAG, what you're gonna find is you're not going to be able to get enough context to feed it for 4 hours. [0:06:24] KB: Right. The limit is less the model's ability to pay attention and more like how do you keep relevant tasks in front of it. [0:06:31] SK: Exactly. And my take and Redis's take has been that, for many use cases, the answer is to give the agent access to the context. Let it find what it needs and use it. Don't try to like put it all up front. It's not just a context window problem, but almost like tool-based context retrieval. And that's really the thing that's happening now. And if your agents don't look like that and your AI applications don't look like that, then you are not taking advantage of this reasoning wave that is happening. And so your really naive RAG app is pretty much capped out. It's not going to get better. Because it's not like the agents are getting smarter. They have more inherent facts, I guess. Maybe a little bit. But what they're really good at is they're able to stay coherent for longer. They're able to reason about more complex tasks that have a longer time horizon and be able to solve them end-to-end. That's the thing that's changed. It's almost like the RL. 
The post-training has gotten better, but the pre-training is about the same. That's, I think, the fundamental thing that's changing. And because of that, all that matters is context. [0:07:37] KB: Yeah. I think you're describing something that I've also seen, where we've shifted from this model of trying to pre-compute what are all the things that we can then do a single-step pattern match. To instead saying, "Hey, let's just give this thing - this thing can reason now. Give it the ability to pull what it needs, explore." And we're seeing this in the ways that things are being managed. It's no longer, as you say, naive RAG, where you're like do a search up front, dump it in the context window, run. It's, "Here's a search tool. Call it when you need it. Get some things back." We're even seeing that in terms of how you manage local stuff. Instead of like, "Here's all the things I want you to do," you're like, "Here's a set of skills that progressively disclose context when you need it, not previously." Yeah, that totally tracks. What I'd be interested in is where's the limitation area there? What is it that you can do at Redis or in another environment to facilitate agents' ability to do this? [0:08:30] SK: There's a pattern I'm seeing emerge. And the term that we use is a context engine. And what a context engine is, it has, I would say, four pillars. One, and I'd argue it's the most important, is that agents should be able to navigate and retrieve the context that they need. And this doesn't have to, by the way, be tool-based. Well, it's always tool-based in some way. But it can be a CLI. It can be MCP. Actually, we have opinions. And with our newer products, you typically do both MCP and CLI. But that's almost an implementation detail of a context engine. That's one, is that you need to be able to navigate and find data. Two is that your data always has to be up to date. Three is that the data should be fast. 
For many use cases, speed is really important to have the UX and everything feel natural. This is more obvious if you think of an extreme where every single tool call is a whole Spark query, which takes 3 minutes. It just fundamentally doesn't feel like an agent anymore. And then four is that the context should get better with time. This means a lot of things. It could mean personalization. It could mean that it's keeping track of decisions and errors it's made and keeping track of those things so it doesn't make them again. There's a lot in there. But in the end, what you end up with is I have a surface of context that I can go navigate and look through. I can retrieve the context that I need. But I will always feel like that context is either the source of truth or that it's a view. It's always up to date. And then four, that it's always just going to get better. This is the moat. If the reasoning is solved, not solved, but it's constantly getting better and it's doubling every year, the moat is who can build this context moat that really separates them out from everyone else. Arguably, this is literally the difference between Anthropic, and OpenAI, and everyone else is that, as they get better, they get more data, and their contexts get better and they're able to build better models. [0:10:26] KB: So, let's maybe break down those steps. Should we use the example? I think probably most of our audience is familiar with a coding tool of some sort. So we can use a coding tool and work through what does that actually mean in that context. Imagine we're building the Redis-based version of Claude Code or something like that. First step, agent should be able to navigate and retrieve the context that they need. What does that look like in a code setting? [0:10:48] SK: I think code - and we can go through code. I think the thing that makes code a little unique is that it's almost always going to just be on your local file system. 
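The four context-engine pillars described above can be sketched as a minimal interface. This is a toy in-memory illustration of the ideas in the conversation, not a Redis API; the class and method names are my own.

```python
from typing import Any

# A minimal sketch of the four context-engine pillars over a toy in-memory
# store. Class and method names are illustrative, not a Redis API.
class ContextEngine:
    def __init__(self) -> None:
        self._views: dict[str, dict[str, Any]] = {}  # pillar 2: synced views
        self._memories: list[str] = []               # pillar 4: improves over time

    def sync(self, view: str, snapshot: dict[str, Any]) -> None:
        """Refresh a materialized view from the system of record (pillar 2)."""
        self._views[view] = snapshot

    def retrieve(self, view: str, key: str) -> Any:
        """On-demand, low-latency lookup the agent drives (pillars 1 and 3)."""
        return self._views.get(view, {}).get(key)

    def remember(self, note: str) -> None:
        """Record a lesson so future runs see better context (pillar 4)."""
        self._memories.append(note)

engine = ContextEngine()
engine.sync("orders", {"o-123": {"status": "delayed", "carrier": "FastShip"}})
engine.remember("check carrier status before quoting a delivery date")
print(engine.retrieve("orders", "o-123")["status"])  # delayed
```

The point of the shape is that the agent calls `retrieve` as a tool mid-task rather than receiving everything up front, while `sync` and `remember` run on separate loops.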
The retrieval step and the sync step and a lot of those things are solved just using git. I think where maybe a good one might be like a customer support agent. With the customer support agent, what you'll find is, one, the information you need. Someone asks, "Why is my order late?" I use this example a lot. Because if you think of RAG and how you would build a RAG app, you would take a knowledge base, you would chunk it up, put embeddings. Someone asks, "Why is my order late?" And you would say, "Here are common reasons for delays," which is terrible. [0:11:25] KB: Exactly. Generic search against it. Yeah, totally. [0:11:29] SK: And that's what most people do. And what you end up with, the last iteration of AI apps were kind of glorified summarizers. We're pretty much just showing off what LLMs can do. The thing is that most people kind of know what LLMs can do now. I mean, not everywhere, but for a lot of tech-forward places like San Francisco, New York, people already get it. Now it's like, okay, when someone asks, "Why is my order late?" I need to go get the order. I might need to go get information about the order, the delivery. I need to look up our policies. All those things are going to be in different places. Firstly, I need to give the agent tools to be able to access all of those things. Now what you'll find is there's a major question that pops up there, which is like, how do we - do we just let our agent have access to our Postgres DB directly and just run queries? What do we do? [0:12:20] KB: What could possibly go wrong? [0:12:21] SK: Yeah, exactly. It's like, "Huh. Where is that prod database? Where is it?" It's like, "Did you delete it?" It's like, "You're absolutely right." Don't do that. What we see more often, and this is where the context engine architecture comes up, is that people are building materialized views of data, where they might have all these systems of record. 
And they don't want to deal with scale and all the other things that come up with a ton of agents hitting it. So it's like let's create a materialized view. And then on top of that materialized view, let's create almost like a retriever service. A set of tools that can go and access these things. At Redis, things that we are doing are more around putting almost like a semantic layer on top of Redis. And semantic layers are not new. But we've never seen a semantic layer on top of something like Redis because it would make zero sense before agents. It would be a very strange thing to do. Anyway, one piece is having almost an ETL synchronization layer. Redis has a product called RDI, which is pretty much an ETL. It builds views and maintains those views as information changes. Now I have a materialized view of context. I control what's in that context. I set rules around how that context is accessed. I don't have to worry about scaling a ton of random systems for agent use. And I own it. It's my thing. It's not like, "Oh, this part's in Salesforce. And this part's here, and this part's here." In the customer support use case, I might have some Postgres databases. I might have some different APIs, etc. I build a materialized view. I describe what this information is. Almost like a semantic layer on top, which we can compile into a set of tools, an MCP endpoint, or a CLI. Have the agent connect to that. And now I have a fundamental ETL built for context with a retriever step. [0:14:01] KB: A couple things I'd be interested to know on that. One, are you building a single set of materialized views that you're exposing to all these different agents? Or are you able to customize that down to this agent gets a materialized view with literally what the current customer could see or something? How fine-grained can you get this thing? [0:14:20] SK: Very fine-grained. You can use ACLs at the row level. 
You can do RBAC at the agent layer, so the different agents can see different things. You can mix and match different forms. Yeah, for sure, you definitely have to make sure that you nail it down. But it's a lot easier to nail down when it's in something that you built for context. The issue is if you give access to Postgres, you then have to try to nail down Postgres to work in a generic way. And the thing is that people don't typically build Postgres databases with the idea that any random person can query this. And so they're not well set up for that. And so then it's like, do you try to set that up or do you build a whole different API suite for it? And then how do we deal with all the new indices that we're going to want to create for all the kind of unusual search patterns that we expect agents to have? That's where we see the materialized view use case come up. [0:15:08] KB: Yeah, that makes a lot of sense. I'd be interested, you mentioned the semantic layer on top of it. And one of the things that I think has been fascinating to see with LLMs is how much semantics matter, right? The more you can shape your data into something that linguistically makes sense to the agent or is able to live in that part of the LLM's like training data that is well covered, the better it's going to be able to use it. What does that look like for Redis? And how do you see it used? [0:15:38] SK: Firstly, on the point of this thing working really well, I agree. Actually, you said for agents to understand. I would go and say for humans to understand, too. A lot of times we define tables in ways that are optimized for compute. And really what we need here is things that are optimized for the ability to grok it both as a human or as an agent. [0:16:02] KB: What? You mean you don't naturally think in perfectly denormalized tables? [0:16:07] SK: I would love to, but I unfortunately do not. And so I'm happy I'm not so deep in that world anymore. 
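The pattern discussed above - a semantic layer over a materialized view, compiled into agent-facing tools, with RBAC at the agent layer - can be sketched in a few lines. The schema format, role names, and entity names here are illustrative assumptions, not Redis's RDI or ACL syntax.

```python
from typing import Any, Callable

# A sketch of a semantic layer over a materialized view, compiled into
# agent-facing lookup tools gated by per-agent roles. All names are
# illustrative, not a Redis API.
VIEW = {
    "order": {"o-123": {"customer": "alice", "status": "delayed"}},
    "policy": {"late": {"text": "Refund shipping if more than 3 days late."}},
}

# The semantic layer: human- and agent-readable descriptions of each entity.
SEMANTICS = {
    "order": "Current order state, synced from the orders database.",
    "policy": "Customer-facing support policies, synced from the knowledge base.",
}

# RBAC at the agent layer: which entities each agent may read.
AGENT_ROLES = {"support-bot": {"order", "policy"}, "marketing-bot": {"policy"}}

def compile_tools(agent: str) -> dict[str, Callable[[str], Any]]:
    """Expose one lookup tool per entity the agent's role is allowed to read."""
    allowed = AGENT_ROLES.get(agent, set())
    return {
        f"get_{entity}": (lambda key, e=entity: VIEW[e].get(key))
        for entity in SEMANTICS
        if entity in allowed
    }

tools = compile_tools("marketing-bot")
print(sorted(tools))  # ['get_policy'] -- no order access for this agent
print(compile_tools("support-bot")["get_order"]("o-123")["status"])  # delayed
```

In a real system the tool descriptions from `SEMANTICS` would be surfaced through MCP or a CLI, and the view would be kept fresh by a sync pipeline rather than a hardcoded dict.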
But yeah, that's a big thing is that you need to build that. The other interesting thing is actually something - a lot of people, when they think of Redis, they think of speed. And they should. I mean, Redis is the fastest database. But from my perspective, the thing that always made Redis kind of special to me was these data structures that we have. They're kind of unique. It doesn't really look like any other database. There's not really anything out there that really looks like Redis, unless it's essentially a fork, or a clone, or - [0:16:42] KB: Let's actually dive into that. I have more recently gotten into using Redis for a few different things. And I'm using Redis Streams and other stuff. But when I first encountered Redis, I just thought of it as a key value store. That was it. It was just keys and values and nothing more. But there's a lot more. [0:16:58] SK: There's so much more. We have even statistical data structures, like HyperLogLog. We have sets, hashes. We have indices. I mean, vectors. We support vectors and vector search. But the structure is much more around data structures. And it's not trying to be - it doesn't look like anything else. Actually, Redis is almost its own brand and category. The thing that's interesting though is the reason why people loved Redis is because it kind of fit very naturally to a lot of problems and how people structure things in their heads, "Oh, this is a set of things." "Oh, this is a hash." "This is a JSON object." "This is searchable text." It kind of fits really nicely in that way. It's not like we have columns, and primary keys and foreign keys, and cascading deletes, and trying to kind of reason about all that stuff. The thing is that it's really nice for humans. And it turns out it's also really nice for agents. The only thing that was missing was the semantic layer. The semantic layer was essentially the application code before. 
The application that was writing it knew what was there, and the code was almost like self-documenting what the data was. But now, we need an agent to be able to look at something like Redis and understand what is this and what's in there. And the thing is, some people use Redis as their primary DB. But more often, Redis is used either as a cache or, very often, as a materialized view. In which case the actual data, the system of record, is actually somewhere else. And Redis acts almost like the delivery mechanism. And so in that way, it really makes sense to, on the context delivery mechanism, put a semantic layer on top. [0:18:36] KB: Yeah, that's interesting. And I think it points to another thing, which is it allows you to maybe not have to separately define to the agent what you can read or not. Because if the database itself is semantic, you can say, "Hey, go look at my database." You can figure out what to do with it. [0:18:53] SK: And that's kind of what you need. Otherwise, you end up building a million tools. Those tools keep changing. If MCP changes dramatically and you're starting to use tool search on top of tools, then actually now you need to completely change how your descriptions work and how things are defined. But most MCP servers really boil down to data delivery or just context servers. And it makes a lot more sense from our perspective to just define this as a schema. A lot of concepts went into what we're thinking about. There's really two concepts that we merge together. One is thinking almost like an ORM. Less the writing part, but more the mapping. It's almost like an agent would work really well with that sort of thing. The same reason humans use ORMs instead of writing SQL, because it's much easier to reason about. And then two is GraphQL. GraphQL is interesting because it didn't really take off the way some people thought it would. 
It's obviously still very popular and all, but it hasn't, like, killed REST or other ways of building APIs. And the reason why, I think, is because, having used GraphQL, there are a lot of axes of freedom in what you have to write for and how you can query. And in my opinion, I was like, "I just want things to have input and output." It's really simple. I know that this input creates this output. I can write really nice tests. I'm not going to miss anything. Performance is easy to control. But agents would actually love something that looks like GraphQL because they actually can take advantage of all that gray space to get exactly what they need and solve for a variety of different problems. The problem is that you don't really want to give them an actual GraphQL server because they don't write good GraphQL queries, one. And a lot of them aren't built to handle the scale that you would expect of an agent. But if you take those concepts and put them together, one, you'll be like, "Okay, we should cache this data, build views of data. We should build a semantic layer. We should enable this layer to hit the materialized view. But if needed, go hit the systems of record." And then you end up with this thing that looks like a context engine minus the memory loop, the loop that makes the context better. But that's kind of how we think about it. [0:21:06] KB: I was going to dig into that. How do you make the context better, right? Because so far we've talked about Redis in the form of essentially a materialized view is a good thing, right? It's read only. It's a view that has been transformed in some way or another, but there's no iteration in that. There are no writes necessarily in that model yet. What does that end up looking like here? [0:21:28] SK: One is I think a lot of it actually doesn't happen via the agent writing. Some of it will. But I think a lot of it happens async. I'll give you an example. If you're using a lot of agents to write code. I know I do. 
You've probably landed on something that sort of smells like spec-driven development. Whether it's exactly that or not. At the very least, you're using markdown files everywhere. You're keeping track of those markdown files. You're editing that way. In many ways, your prompt is almost really just pointing at a file. And one thing that's like annoying is that over time, you have to almost compact that stuff. Make sure it's still accurate. [0:22:06] KB: Prune, archive. Yeah. [0:22:08] SK: And the same thing you could imagine would be very true for like a customer support bot. It's like, "Oh, I learned this from this one. I learned this here. I learned that I should not go and call this tool. I should call this tool in this sort of situation." Over time, not only will it be too much, it will be conflicting. And so what you need is something that, as stuff is happening, is extracting information and then also compacting information. And these are typically LLM-style problems. And so in a way, I almost think of memory as like there's one piece which is personalization, which I think is how most people are using it today. But I think where it's going to go is around almost like this unique type of ETL. I joke that every vector DB is more or less a materialized view. You have raw documents. The chunking and embedding, it's a weird transformation, historically. I mean, it's just a transformation. Everything in the vector DB is projected exactly from your raw data. It's a projection. It's a materialized view. And I think what will happen is we're going to see the same thing where it's like, "Hey, this is everything that we've talked about and all the decisions that were made, and pretty much raw traces." And we are going to make these projections that can be tuned, right? I don't think that there's one way to do it. I don't believe that you can just say, "Hey, here's the memory server, and it just works for everything." You can tune it to do what you want it to do. 
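The extract-then-compact memory loop described above can be sketched as a small async pipeline. The "LLM" steps are stubbed out with plain string handling here; in practice each would be a model call running off the agent's hot path. The function names and the newest-note-wins conflict rule are illustrative assumptions, not a Redis implementation.

```python
import asyncio

# A sketch of the async extract-then-compact memory loop: extraction distills
# raw traces into notes; compaction deduplicates and resolves conflicts.
async def extract(trace: dict) -> tuple[str, str]:
    """Distill a raw trace into a (topic, lesson) note. Stands in for an
    async LLM call that runs off the agent's hot path."""
    await asyncio.sleep(0)
    return trace["topic"], trace["lesson"]

def compact(notes: list[tuple[str, str]]) -> dict[str, str]:
    """Drop duplicates and resolve conflicts: the newest note per topic wins."""
    compacted: dict[str, str] = {}
    for topic, lesson in notes:  # notes arrive oldest-first
        compacted[topic] = lesson
    return compacted

async def memory_loop(traces: list[dict]) -> dict[str, str]:
    # asyncio.gather preserves input order, so "newest wins" holds in compact().
    notes = await asyncio.gather(*(extract(t) for t in traces))
    return compact(list(notes))

traces = [
    {"topic": "refunds", "lesson": "always escalate refunds to a human"},
    {"topic": "refunds", "lesson": "use the refund tool for orders under $50"},
    {"topic": "carrier", "lesson": "the carrier API is rate-limited"},
]
memories = asyncio.run(memory_loop(traces))
print(memories["refunds"])  # use the refund tool for orders under $50
```

The compacted dict is the tunable "projection" of raw traces that gets fed back to the agent as context; swapping the extraction prompt or the conflict rule gives a different projection of the same raw data.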
And likely what we'll have is like different engines for different use cases. I know people talk a lot about decision traces or context graphs. And there's all these concepts that are really just extraction. There are different types of extractions from raw data that you then feed back to the agent as context. [0:23:44] KB: I want to replay a little bit of what I heard, but this is a meaty one. I want to dig into this. First, just what I'm hearing right now is you're saying, "Okay, take these historical traces, the conversations, the things that you technically kind of want to learn from and have this thing adapt from. Store those in some sort of raw layer." This is just history, log data essentially. And then you are going to apply the same tooling concepts that you've been doing here, where you say, "Okay, let's transform that in some way. Let's materialize that. And that is now part of what's feeding this agent." [0:24:12] SK: Yes. [0:24:12] KB: Okay, two things that I'm curious about. One is the transformation process, the materialization piece. Is that something that itself can be modified or evolved by the agent? For example, the agent is trying to answer these questions and says, "Hey, the shape of this materialized view is wrong for the types of questions that I'm asking." Maybe consistently I'm screwing this up and this is caught in an async analysis of like, "Oh, this is a consistent source of errors." Or maybe it's the agent itself being like, "Gosh, I wish I could ask this question." Can you change the way that views are being materialized as a part of your feedback loop? [0:24:50] SK: Today we have Agent Memory Server, which is open source. And how it works is you can define different types of - I forget what they call them, but I always think of them as transformations. But they're actually prompts. You're defining strings of what you want it to accomplish. Or the type of like, "Hey, I want you to keep track of decisions that were made. 
I want you to think of seasonality concepts. I want you to think of grabbing facts. I want you to be optimized for personalizing for users." There are different types of transformations you can write. And you can write custom ones. I do think it's really interesting to start - a funny problem with building agents is a lot of times you try to stack agents on top of each other to solve problems. A good example of this is like LLM as a judge. It's like, "Okay. Well, we have this problem where we can't tell if these are good responses. And there's too many of them for humans to read. So why don't we have an agent go and do it?" But what if the LLM judge is bad? Could we write an agent on top of the LLM judge to judge the judge so it gets even better? And then you just keep taking the derivative forever and put agents on top. I do think that we will have self-healing memory in a way where even the memory engine gets better over time. But I think where we're at today is that you can just control the memory engine. And this is part of building a context engine, which is you tuning those knobs. But yeah, for sure. I mean, the layers of abstraction are getting higher. We're already seeing - I find myself having to orchestrate - when I'm building, I'll be orchestrating 15 agents. And they'll be running. And then I use Codex today, and they ping. They literally like make a sound, which is kind of annoying because I feel like I'm a waiter. I feel like I'm running from like table to table to go wait on them. [0:26:27] KB: Well, we could talk about that because I think there's some meat there, too. The second question related to this is around controllability. Because I think one of my personal frustrations with most of the memory systems that I've used to date is they treat all memories the same. In the sense that like I'll use ChatGPT as an example. ChatGPT, they added memory. I've had many conversations with it. 
And now I'm asking for a perspective on this thing, and it's referencing some completely unrelated concept that I talked about 3 weeks ago. I have a lot of different interests. I would like to scope your memories, please. Don't tell me what you think I wanted because I previously talked about cooking, right? Don't make a cooking metaphor when I'm asking you about code. Just talk about code or vice versa. So I'm kind of curious how you all think about the control knobs for users on memory. [0:27:18] SK: Yeah. Actually, my story, which was funny, was I used it when I was planning my wedding. I was using it, as anyone would, for all the random details. How should I do this? And how should I do that? Whatever. And to this day, it still is like, "Since your wedding is coming up -" [0:27:37] KB: I know, right? And you're like, "No." LLMs are terrible with time. They can't deal with it. You're like, "I want this to be scoped. Keep it closed." [0:27:44] SK: There are a lot of concepts we have and are tunable around memories being able to fade away temporally. If a new memory comes in that contradicts an old memory, then the new one should be used. You also talked a bit about having separate categories and subcategories and almost layering these things up. Some of that stuff exists already in the memory server. Some of that stuff is clearly where we're going. There are always UX things, right? Because if you start adding those layers of like, "Okay, well, there's, let's say, team-wide memory, and then individual memory, and then there's org memory." And it's like, "Okay, well, how do you choose which one goes where? What happens if the agent remembers something that's really personal and puts it at the org level, and now everyone knows that you love Katy Perry?" It's those sorts of things that really make it tricky. These are problems that we have super smart people working on constantly. [0:28:34] KB: Well, nobody gets it right. Right? 
Spotify thinks I love Katy Perry because my 10-year-old does. [0:28:39] SK: Exactly. The thing that is funny is like I was in feature stores before. I feel like I was kind of broadly in distributed systems before that. I think I've always found myself gravitating towards problems that just don't have a solution. There is no right answer. And that's fun because you can keep pulling at it and you can keep learning more and you can be creative. And I think that's where we're at with memory is, at Redis, we're being really creative and creating unique solutions for it. [0:29:02] KB: On the subject of problems where there are no good solutions, let's go back to your point about orchestrating 15 agents because I feel like this is what every - not everybody, but most of us are trying to do is we're orchestrating all these different things and they're going in parallel. And I don't know about you, but it fries me. I get too many of these things going. And then by like 3:00 in the afternoon, I'm like, "All right, I'm done. I have no brain left." [0:29:24] SK: What's funny is I love it. It actually really fits into how my brain works. And I see it really frustrate one of my - he's like one of our best engineers. But some of the things that made him one of our best engineers are actually getting - he's learning how to pivot those things into this new style. Because the thing he's good at is he could kind of single-thread a problem really well. You could give the hardest problem in the world to him and he'd be able to like single-thread it. But now it's like, "Yeah, but I still need you to do that sometimes." But a lot of what I need to do is actually just do that in eight things at once and jump between them and context switch. And as a founder, that's all I did all day. So, I feel like I've been training myself for this day, not knowing it. 
But for people who have not had similar experiences, it's actually interesting: people who've been PMs, people who've managed engineers, tend to do better with agents on this problem than people who haven't. And it's because, in some ways, there are lots of skills you have to learn to manage engineers which are really similar here. You have to context switch constantly, whether you'd like to or not. Sometimes you can't sit down, take the keyboard, and just go do it yourself. So you have to learn how to help build a system, or kind of prompt your way into getting the results you want. Obviously, it's different, but there are skills that are transferable. The short of it is, I think everyone deals with it differently. I think it's fundamentally changing how we do things, even at Redis at the org level of how we build products. And I think it's only going to accelerate. [0:30:58] KB: Can I ask you a little bit about those changes? Because, certainly, having conversations - and it's interesting, right? Because I have lots of conversations with people who are pushing the bleeding edge. And then I'll talk with somebody, maybe not on a podcast, and they'll be like, "Is that really how people are doing all of it? Am I so behind?" I think it's really helpful to have real-life examples of how this is changing. What are you doing at Redis? You're not at OpenAI. You're not at one of these bleeding-edge labs, but you are adopting and changing everything. So how is it working? [0:31:27] SK: I think there are a lot of aspects to it. One aspect - and it's funny, because it's actually one of our predictions for the year, and then it just happened naturally internally, and we're definitely seeing the results - is that everyone will become a coder. Everyone can build. When I work with PMs, I don't want to read a PRD. I'm like, "You could take this PRD and turn it into a prototype in the same amount of time. Let's do that."
There are times when a PRD makes complete sense. But you can go build a prototype. And the other thing is features. It's very easy now - and this wasn't a problem as much before - to get feature happy. You're like, "Oh, I'll just add this. Just one prompt. I'll just add this thing. I'll add this thing." And so taste is really important. What are we trying to do here? What does good feel like? And good is not, "Oh, you have every single knob in the world," because you can do that now. It's that it all works together really nicely. And so the people who do really well are the ones able to move that fast. Because, fundamentally, you can just move so much faster now. You think of the user - it could be a literal customer, it could be an internal user - but you're thinking from their perspective. What does good feel like? And you're constantly asking, "Where can I use AI to make this better?" A good example of this is actually a story from my wife, where she had all these spreadsheets. She does marketing. She was trying to figure out which events to focus on. And before, she would try to scan them, or try to figure out how to use Excel in a way that was good enough to be able to get an answer. And she was like, "Wait, I have Claude." She actually went and built a website. She doesn't know how to write a line of code. She's never written a line of code before. She built an entire website and shared it with the marketing team, where they could click into things, and it had graphics. And I was like, "Wait, you built this?" It blew my mind that that was possible. I think everyone will become a coder. And that's one thing we're seeing: it's not just an edge thing. So that's one aspect, which is that everyone should be using these tools. And the thing is, actually, most of the value is going to come from beyond the engineers - the engineers, I think a lot of them get it. There are different layers, and there's a lot to learn.
But take someone who always has all these ideas and has always wanted to build something - or they're like, "Oh man, if only I had an engineer assigned to me so I could have them build this one-off thing." Now they can just go do it. And so that's where we're seeing a ton of change. That's one. Two, on the engineering side, our team writes a lot of specs, and we review those specs. We actually find that architecture - both at the software level and actually at every level - is the thing that matters the most. If we can define an interface, and you can define acceptance criteria for that interface, I'm like 98% confident that the agent will be able to build something that solves for that interface with the acceptance criteria. If it's wrong, it's because it wasn't clear - the interface wasn't good, or the acceptance criteria were missing things. In which case, that's a design problem. Anyway, I could go way deeper, but those are some things that are top of mind. [0:34:30] KB: Yeah, it tracks with what I'm seeing as well. I think the non-technical example is kind of interesting, because it does mean - I think for core product, the temptation when you can build everything faster is to build everything. And as you highlight, that's not necessarily actually a better experience. But when it comes to tools for yourself, it's phenomenal. And to go back to your customer service agent or something like that, maybe you have a real customer service person, but they're building themselves an agent to help them do their work. Kind of curious, are there ways in which the context engine that you want to expose to people should be different for those non-technical use cases? Right? We talked a lot about, "Okay, we're using this. We're building a product. Or we're building a customer service agent that's going to run autonomously." But is there an internal-tools context engine? What would that look like? How does that vary?
[0:35:19] SK: A lot of our internal Slack bots, especially the new ones coming out, are all built on top of context engines. We had one example of a bot - actually, this is a personal bot someone built for themselves. They would ask questions about their calendar, on their Google Cal. And there were all these API calls they had to make. They'd ask a question like, "Ah, what are my most important meetings this week?" or, "Oh, when did I meet with this person last?" They'd ask these questions, and it would take 20 minutes, because of the response times - the Google API is not made for the sets of queries you would ask. So they had the agent build a sync job, so that every time their calendar changes, within a few seconds it syncs into Redis. And then they put a context service on top, and they were able to connect to it that way. That's one example. For internal apps, I think we will see something, but it's going to look different. Because a lot of what Redis is today is much more oriented towards people building with some sort of scale in mind beyond a single user. I think you can use it - and a lot of people are using it - in a kind of one-off way: "Oh, I have my own Redis database." Because the data structures are so nice, because it's so easy to reason about, because of the interface - you can just go write commands. You can go write the wire format in English. It makes a lot of things easier and really fast. I think people are doing it that way, but I do think it will look different. I think the thing that will be really unique - and that we are thinking about - is that everything starts to look like a coding agent. OpenClaw is essentially a coding agent. That was the whole idea: if we give a WhatsApp interface to a coding agent, it's essentially a general-purpose agent. And really, then you start thinking, "Okay. Well, what do we need? We need workers, cloud workers.
We need a cloud file system. We need the ability to pull data from different places and make that data accessible." There's something there. But I think it's still early days. I get super excited just thinking about what's possible. [0:37:16] KB: Yeah. If we can solve the data sandboxing, data access problem. Because the big challenge with OpenClaw is like, "Okay, put this on your box. Go." Now it has access to everything. But an agent that looks like "I can write code, I can run code" is a general-purpose agent. It can do anything. And if we solve the data layer around that so that it can be safe, and somebody doesn't have to think too much about it, that's beautiful. With that concept in mind, then, are there particular frameworks for building coding agents that you see going forward? Or if everybody's going to build their own personalized coding agent, what are the tools they're going to use to do that? [0:37:51] SK: On the agent framework side, I think the ones that are going to be most successful are the ones that work best with the ecosystem. I think that's really what it comes down to. Because what we're finding is that, to make these agents successful, they need to interface with the world. And agent frameworks that make it easier to do that extensibly, or have built-in things to do that, are much more powerful. For example - and this isn't true, I'll use a contrived example to show something that would not succeed - let's say someone's like, "Ah, MCP sucks. We're just not going to support it. We're going to build our own version of MCP." They might be able to build something that is, whatever, more efficient or better in some way, but it doesn't matter. The thing that's unique and valuable about MCP is the standardization. Everyone's doing it. And so what matters more right now is standards. Because if we can build this fast, the only thing you want to avoid is having proprietary blocks in front of everything.
Make everything speak standards - and it doesn't even have to be standards, just make it clear how to contractually touch all the things - and then I can think of and build any agent I want. In terms of the frameworks themselves, I think the most extensible ones will be the most successful. We see LangGraph a lot. Obviously, super powerful, a very early player in the space. They have this Deep Agents framework, which I think is really interesting. We see ADK - obviously, a lot of the cloud providers are releasing their own solutions - and ADK is really good. I like ADK a lot, actually. There are some more off-the-beaten-path ones too, like one called [inaudible], for example. That's another agent framework. But in short, it matters, but I think it matters less than people think. As engineers, we get so caught up in our tools. I talk to people about their coding agent setups. They have these crazy setups they've put together. I'm like, "Yeah, man. I just have like 100 tmux windows. I switch between them." I have the most simple, ad hoc setup, because it's what works for me. In fact, I didn't use Cursor for way too long, because I didn't want to switch IDEs. I was like, "I use Vim." [0:39:56] KB: I was so happy when the CLI-based versions caught up. Because, yeah, I was the same way. I'm a longtime Vim user. And Cursor, you had to use it because it was at the bleeding edge. It was the best. And even though it sucked, it was still the best. But it's not the best anymore. And so I'm happy back down in my, also, tmux-enabled whatever. I have a bunch of worktrees and a bunch of tmux sessions. I'm like, "Go." [0:40:19] SK: Same. And it's funny. Because if you had told me - I also haven't written a line of code in maybe almost a year now, which is crazy for me to even imagine. But why would I? It's lower leverage. What's funny is I also feel like I've become a better engineer at the same time, because I'm no longer caught up.
I feel like the skill of an engineer was almost: can you Google-fu your way into some weird random library to solve this weird random error, or dig deep through the internals of a codebase and find this random bug? And I almost feel like that's not that important anymore, because the agent can go do that. The thing the agent's really bad at is maintaining a consistent architecture that scales well and makes sense across the codebase over time - especially against the backdrop of "I can put out 100,000 lines of code in a week by myself." Multiply that by the team size of Redis, and it's actually easy to almost overwhelm the system with productivity. It's not meant to handle that. And so the issues actually move downstream. [0:41:18] KB: And this comes back to this question of how you all are adapting, right? Because, once again, reviewing 100,000 lines of code in a week, times N engineers, is incredibly hard. So now your reviews maybe have to move upstream to the design or something like that. I don't know. How are you all handling review and the feedback loop there? [0:41:39] SK: There are a few pieces of this. One, it depends on the codebase, actually - different languages and different severities. An internal tool, you can get away with a lot more. And Redis core is different. It's a different beast. That said, the way I think about it is, one, we do design sessions. We do them every day. I don't like standups, so we just do standups async. But the design sessions I like doing sync, where every day someone comes up and says, "This is the system I'm building. This is the project I'm working on." They'll show me the interfaces. They'll show me how it's all going to work together. I make sure we really deeply understand what we're going to do and what the acceptance criteria are for all of it. If that's good, then I'm like, "Yeah, then just go manage some agents to do it."
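[Editor's aside: the interface-plus-acceptance-criteria handoff Simba describes can be sketched like this. The example - a RateLimiter protocol with a counting implementation - is entirely hypothetical; the point is only the shape: a human-reviewed interface and acceptance checks, with the implementation left to an agent.]

```python
from typing import Protocol

# Step 1: the interface, reviewed by humans in the design session.
class RateLimiter(Protocol):
    def allow(self, key: str) -> bool:
        """Return True if this key may proceed, False if throttled."""
        ...

# Step 2: acceptance criteria, written before any implementation exists.
def meets_acceptance_criteria(limiter: RateLimiter) -> bool:
    # 1. The first N requests for a key are allowed.
    # 2. Request N+1 for that key is throttled.
    # 3. A different key is unaffected.
    allowed = [limiter.allow("user:1") for _ in range(3)]
    return all(allowed) and not limiter.allow("user:1") and limiter.allow("user:2")

# Step 3: one possible agent-written implementation of the contract.
class CountingLimiter:
    def __init__(self, limit: int = 3):
        self.limit = limit
        self.counts: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

Under this workflow, the protocol and the acceptance function are what get reviewed; any implementation that satisfies them is acceptable, whoever - or whatever - wrote it.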
As long as we all agree on what we're doing and it makes sense, then your job kind of becomes, "Okay, now go manage the agents to go do it." The first piece is that the engineering rigor is much higher. Another example of this: it used to be that someone would make a PR, and it's not exactly what I want. It's not bad, but it's not how I would have done it, and I don't really like it. And I might even tell them, and they might be like, "Yeah, you're right. Do you want me to go rewrite this?" And you're just like, "We have so much to do. This is fine." That doesn't happen anymore. Now I'm like, "I mean, the code's cheap. Just throw it all away, change the spec, and regenerate it." Or, if I keep it, I'll say I want the same behavior but I want it to work this way, and then figure out the path to get there. So, that's one. Two is tests. Behavior tests are the most important thing. A lot of times when I start, I will start with: what is the golden path, both on the error cases and the happy cases, that would show with high certainty that this thing is going to do what it's supposed to do end to end? I write a lot of end-to-end tests, but those are the ones I manually look at. I make sure those are good on every PR. I'm like, "Okay, what do we know works? Did we miss anything?" And then, obviously, I still expect test coverage to be high. I expect a lot of integration tests, a lot of unit tests, a lot of fuzzing on parameters, stuff like that, obviously. But I think the flow has changed, where I would almost go as far as to say that if the behavior tests are correct and they pass, then the code is right. And if you're like, "Well, how about the speed? And how about this?" It's like, "Well, did you have a behavior test for it?" [0:43:49] KB: Yeah. If those are important, they should be captured in the behavioral tests, right? [0:43:53] SK: They should be captured. Because then it's doing exactly what it's designed to do.
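[Editor's aside: the golden-path behavior tests described above might look something like this - one happy case and one error case exercised end to end. The function under test, parse_ttl, is a made-up stand-in.]

```python
# A toy function under test: parse a TTL string like "30s" or "5m"
# into seconds. Hypothetical; only the test shape matters.
def parse_ttl(value: str) -> int:
    units = {"s": 1, "m": 60, "h": 3600}
    if not value or value[-1] not in units or not value[:-1].isdigit():
        raise ValueError(f"bad TTL: {value!r}")
    return int(value[:-1]) * units[value[-1]]

def test_golden_path_happy():
    # Happy path: typical inputs produce the expected seconds.
    assert parse_ttl("30s") == 30
    assert parse_ttl("5m") == 300

def test_golden_path_error():
    # Error path: malformed input fails loudly instead of guessing.
    try:
        parse_ttl("5 minutes")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

If a property like speed matters, it gets its own behavior test; otherwise, passing the golden-path tests is the working definition of correct.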
So all that matters is the behavior tests. [0:44:00] KB: I'm with you, but I'm going to bring it back to a slightly different thing, which is - you talked about the really challenging, really high-engineering thing here being how you keep a system that is architecturally coherent and continues to be maintainable and evolvable over time. I have seen agents build things that completely match my behavioral tests and yet create all sorts of entangling of concerns that are going to make it very hard to do things. How do you square that piece of it - the challenge of the codebase? [0:44:30] SK: Yeah. So it's almost like there are three steps. There's step one, which is the architecture review. And we do that synchronously. The other thing that really benefits us - which a lot of people don't talk about - is that really good engineers who embrace this fully are ridiculously productive right now. But there are a lot of people who are trying. I see them. They're embracing it, but they're not able to hit the same level. I didn't even realize it until I saw how big the separation can get. A lot of the design sessions are, one, to make sure that everyone's looked at the architecture, that it makes sense, and that we didn't miss anything. Two, they work as knowledge osmosis, which is really important. But anyway, the first piece is architecture. So I'm assuming, by the time we get to the behavior tests, that we've already decided on the architecture, and I'm expecting to see the same interfaces that we talked about in the spec - which we have written and looked at, and which is, by the way, in plain English and not super long. It's long enough, but no more than that. A lot of people end up putting the implementation in the spec, which is not the point. The spec should be readable by a human and reviewable by a human. And then, what's funny is, I used to still treat code review with the same rigor I used to apply myself. And then what I would find is that Bugbot and similar products would actually find things that I missed.
And then it just kept happening. And I spent a lot of time code reviewing. Some of the things it finds, it's like, "Oh, there's this race condition." And I looked at how it figured out what the race condition was, and I was just like, "I give up." There's no world in which I would have been able to put all those things together to catch this. Which, by the way, would never actually be triggered, but it's there. There is this like crazy - [0:46:05] KB: It is real. Yeah. No, I agree. Bugbot in particular is shockingly good. I don't use the Cursor IDE anymore, but I would not survive without Bugbot. It's phenomenal. But, yeah, I think that is really interesting. And I think one of the things that is really important there, that you're calling out, is the fact that these things doing the coding has not lowered the requirement on engineering. If anything, it's increased it. And you're doing these very synchronous discussions and debates. You slow down in some areas in order to be able to speed up and just generate code quickly. [0:46:36] SK: Yeah. The trick is how you do that without ending up in a waterfall-like system, which some people push back on. And the idea is that we come back and edit the specs. We don't always get them right, and sometimes, going through it, you can actually get better - you can iterate on them. A lot of times, the size of the work isn't one thing. It would take a few things, so we kind of break them up into almost sub-specs. And there's a whole process there. But the cool thing for us at Redis is, since we've gotten so good at this and we've embraced it fully, one thing that we do - and our leadership does, which is really amazing here - is that the default answer for AI is yes, in the sense that we try to enable everyone. I talked to a company where it's like, "Yeah, we're just getting access. Some people have select access to this."
And I'm like, "You don't even realize how behind you are if you're not letting your engineers use this stuff." I understand why people might feel that, "Oh, we need to control this more." But it's night and day. So, it lets us build amazing products really quickly. And because we've learned how to do it early on, we haven't let the quality drop. That's the other thing: there's a difference between agentic-powered engineering and vibe coding. And there's nothing wrong with vibe coding. I mean, a lot of my demos are vibe coded. But they're demos. I don't vibe code database code. [0:47:51] KB: Yeah. There are pace layers, right? There's like, "Okay, this thing can be vibed. And if you don't like it or it breaks, throw it away, do it again. This thing, you've got to keep working right." [0:48:00] SK: Yeah. [0:48:01] KB: Awesome. I think that's super helpful. We've covered a lot. We're getting close to the end of our time here. Is there anything we haven't talked about that you think we should discuss before we wrap? [0:48:11] SK: I think the big thing, and the biggest takeaway, is that in this next phase - which nowadays could be just another year or two - the big change is going to be the switch of systems from linear, RAG-style to context-based. Everyone's going to be building these context engines. And I think that that's just true. I mean, we at Redis are obviously enabling it and building a ton of awesome products to enable it. But I think that is the biggest shift. The "Attention Is All You Need" paper is what kind of got us here. And I think if you're building real products, you just assume that someone else is dealing with attention. For you, context is all that matters. [0:48:48] KB: Context is all that matters. Love it. [END]