EPISODE 1875

[INTRODUCTION]

[0:00:00] ANNOUNCER: The rise of language model coding assistants has led to the creation of the vibe coding paradigm. In this mode of software development, AI agents take a plain language prompt and generate entire applications, which dramatically lowers the barriers to entry and democratizes access to software creation. However, many enterprise environments have large legacy code bases, and these sprawling systems are complex, interdependent, and far less amenable to the greenfield style of vibe coding. Working effectively within them requires deep context awareness, something language models commonly struggle to maintain.

Augment Code is an AI coding assistant that focuses on contextual understanding of large code bases in enterprise settings. It emphasizes tooling to manage large development surface areas, while automating PRs and code review. Guy Gur-Ari is a Co-Founder at Augment. He has a PhD in physics and was previously a research scientist at Google, where he worked on AI reasoning in math and science. Guy joins the podcast with Kevin Ball to talk about Augment Code, its focus on full context for large enterprise code bases, code review as the new bottleneck in AI-driven development, and much more.

Kevin Ball, or KBall, is the Vice President of Engineering at Mento, and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc.

[INTERVIEW]

[0:01:53] KB: Guy, welcome to the show.

[0:01:55] GG: Thanks for having me.

[0:01:56] KB: Yeah, I'm excited to get to talk with you. Let's maybe start with just a little bit of background about you and about how you came to start Augment.

[0:02:04] GG: Sure. I was a researcher at Google working on AI, trying to open the black box of language models and other AI models, trying to understand what makes them tick, and trying especially to improve their reasoning capabilities. I was very interested in getting them to solve math and science questions. Then, as I was watching these models get better and better, I felt that we were crossing a threshold of language models actually becoming useful. This was before ChatGPT came out. I think once ChatGPT came out, it was clear to everyone that these things are actually really useful. To me, the other big reasoning task out there besides math and science is code. I and my co-founder, Igor, felt that there was a big opportunity here to take these models and productionize them, and build AI coding assistants that would help enterprises working on large code bases get the most out of AI models. That's why we started Augment.

[0:03:04] KB: Nice. Well, a lot of different things to dig into there. I actually want to start with the reasoning around math, because one of the things that makes math very interesting is you've got a way to formally validate on the other side, right? You've got a formal checker. I'm curious, does that apply as you move into code? Or, how does that reasoning approach need to differ as you move into something a little fuzzier?

[0:03:26] GG: Yeah, so it's interesting. In math, formal verification is still a big area of research, right? The promise is that you can formalize theorems and then automatically verify the output from language models.
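As a concrete aside on what "formalize theorems and automatically verify" means in practice, here is a minimal sketch in Lean 4; it is not something discussed in the episode, just an illustration. A statement is written in machine-checkable form, and the proof checker accepts or rejects the proof mechanically, which is the kind of automatic verification being described for model outputs.

```lean
-- Minimal Lean 4 sketch: a formalized statement plus a proof term that the
-- kernel checks mechanically. If a language model produced this proof, a
-- checker could accept or reject it automatically, with no human judgment.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```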
With math, it's actually a bit complicated, because math wasn't constructed to be formally verified. It was constructed by people who were doing math informally, and then, as time went on, tried to put more and more rigor into it. Now, we're at a point where there are what I would call corners of math that can be formally verified, but most of math has not been formalized to the extent that you can put it through a theorem prover. That vision of, let's just verify all the math outputs, still remains a distant vision.

For code, it's different. Code was, in some sense, built from the ground up to be executed. It's not formally verified, but you can execute it. You can see what happens when you run it, because that's the whole point. I think that's why we're seeing AI for code really take off right now: it's the first domain where we can really close the loop between the model writing code, being able to execute that code, getting the feedback from that, and iterating until it gets the code to work. I think with code, this is why we're realizing this vision of really grounding the model's answers in reality now, and this is why we're seeing agents take off, and so on and so forth. It's a very exciting time to be working on AI for code.

[0:05:00] KB: Yeah. That ability to close the loop is one of the things that I've seen, as people have started figuring out AI coding techniques, that seems to be key. How much can you automatically validate, whether it's by type checking, or tests, or what have you? How much can you dynamically validate by giving the agent access to tools to test it? How much of that are you tackling within Augment itself, versus leaving to developers to do?

[0:05:27] GG: We try to do as much of that at Augment as we can. We try to provide the agent with all the feedback that's available to us to make it do as good a job as it can. Everything you mentioned, type checking, linter errors, then tests, these are all things that we try to either automatically provide the agent with, or nudge it in the right direction so it can go do things itself. For example, we don't want to run tests automatically. That's probably a bit of a heavyweight thing for us to do, but we do try to nudge the agent in that direction, because we know how useful it is when the agent actually runs tests. Yeah, it's a combination of all these things. I think we're really just scratching the surface. There are things like production logs, metrics, traces, which I expect to also be very useful signals for the agent. These will take more work to get to. I think at Augment, we're all about the context, right? The context that we provide the agent is the most important thing that determines how good a job it's going to do. It starts with the things we just talked about. We try to surface context from the code base itself, no matter how large it is, to help it get the lay of the land, what code it can use to do its job, and what patterns it should follow. Then, I expect that as time goes on, we will accumulate more and more context sources that we can give the agent, including these things that keep it grounded in terms of code execution.

[0:06:59] KB: Yeah. I think that's one of the things that seems to be key to making these things work: how are you managing that context?
Without asking you to disclose any secrets of what makes Augment tick, I'm curious, how do you think about those different pieces: implicit context, where you just surface it, versus giving the user tools for explicit context, discovery of what's the right stuff? Are there intermediate documents? How do you think about that context management problem?

[0:07:26] GG: Right. One principle we have is that we try to keep the agent as autonomous as possible. Models have gotten very good at being able to figure out which tools to call when. We try not to put things automatically in the context window, unless the agent asks for them. One reason for that is that, on the one hand, it's not necessary. On the other hand, developers use our agent to do such a large variety of things that we don't want to guess at what the best context is. We will not put things automatically in the context window from the code base, for example, unless we're really, really sure that this is what the agent wants. That's part of it. On the other hand, the agent doesn't always know what it doesn't know, right? Maybe there's a source of context that it's not even aware of. Linter errors are an example. Those are things we often can get our hands on automatically, but the agent doesn't necessarily know when to go and call a tool to give it linter errors. When we're pretty sure that there is a source of context that's important for the agent, then we will figure out how to slip it in automatically. I think there's a bit of a balance between those things.

There's another principle that we follow which we're calling infinite context. That's where we try to make sure that the user never has to think about the size of the context window of their model. It shouldn't matter how large your code base is. We should just be able to surface the parts of it that are important for the agent. That's where our context engine comes in. You should also not care about how big the context window is in terms of how long your agent session is. We want the user to be able to go on with their agent session, basically, forever. We will do the context management behind the scenes to make sure you never run out of tokens. That's another part of the experience that's very important to us.

[0:09:28] KB: Yeah, totally. I think about that situation of context rot, where you have this longer and longer thread and suddenly, your agent is not doing what you ask. It's maybe forgetting some of the things from earlier, or, what I've seen a lot, it'll stop and say, "Do you want me to do more?" instead of just actually proactively doing more. Those all seem to be signs. Yeah, if you can make that go away, that will feel magical.

[0:09:51] GG: Exactly. Some of it, we're able to make go away. Of course, those tokens need to live somewhere, right? You need to either retrieve from them, or summarize them, or do something else. Those are the tricks that go into how to make it indeed feel magical. This problem where it stops paying attention to things that were previously in the context, that one, I see it a lot. We still have not been able to solve that, but we have some ideas on how to improve on it, because yes, the longer the context gets, the harder time these models have figuring out what to attend to. That's definitely something we're seeing.

[0:10:26] KB: Well, and this gets into a topic area that I'd love to get your sense on, which is effectively using the tools.
Because, I mean, I agree, we're at a place where LLM-based coding tools are completely transforming the way that we write code. But they're not a drop-in replacement for the tools that we used to have. I think some of the widely varying results people talk about - Twitter will have somebody ranting about how these things are worthless, and somebody else saying, "How can you say that? I've 100X'd my productivity" - has a lot to do with how we approach them. How do you think about using AI coding tools in the day-to-day software development process?

[0:11:04] GG: Yeah. To me, it starts from a pretty simple place, which is prompting. What I find is that when users don't get a lot of value out of these tools, and we really look into it, it often comes down to prompting. I like to say that we put this prompt box in front of users as if it's this magical thing. The hype is so prevalent right now that people maybe feel like, "Well, it doesn't matter what I put in there. This thing is so intelligent that it should figure out exactly what I need." If it starts going off the rails, then it's just useless. Whereas the reality is, even in the prompt box, context really matters. It's funny, but you need to put yourself in the model's shoes and try to understand what context the model has. Let's say, the model with Augment, it knows all about my codebase, but it doesn't know what I'm trying to do right now. It doesn't know my intent. The more I can tell the model, or the agent, about my intent, and the more I can tell it about how I want it to accomplish the task, the better result I'm going to get.

One thing we added to help with this is a little feature called the prompt enhancer. It's a little sparkles button in the prompt box. You can write your half-baked prompt, because many people, me included, are just too lazy to write very long prompts. Just put your short prompt in there, and then click on the prompt enhancer, and it's going to turn your prompt into a mini spec with all the details and things that are needed to get the job done. Then you can look at that spec and see which details it got right and which details it got wrong, fix the ones it got wrong, and that all helps keep it on track. I'd say, it really starts with prompting. That's the most basic thing, and what context you put in the prompt really matters for the results you're going to get.

[0:12:55] KB: Yeah. I love that feature, because my flow right now, any time I'm doing anything moderately complex with these tools, is I will say, this is what I'm trying to do, write me a spec. Then we do exactly that iteration process. It is a process of taking that implicit context that's inside of my head and helping me turn it into something the model can act on. This raises the question of, what are they capable of? What is the current state of agentic coding? How complex of a feature can I click my sparkles button on, have a little back and forth with, and actually end up with something that's implementable?

[0:13:30] GG: Yeah. I would separate it into what you can do with some back and forth, versus what you can do in one shot. Because one thing we're seeing is that these agents are now good enough that you can do a lot one-shot as well. Not as complex a feature, but there's a bunch of stuff you can just automate away. I think in terms of feature complexity, let's say, my PRs typically get to hundreds of lines, or if I stretch it, low thousands of lines. I write all of them with our agent.
The fact that it understands the code base means I don't need to do that much handholding. I can focus on, what should the design be? What should the architecture look like? If it got it wrong, steer it in the right direction, or do it initially through a spec, like you mentioned, which is also something I do more and more these days. I'd say, I usually work on PRs with the agent. It's been a while since I encountered a PR that I couldn't just write with the agent with enough steering.

For doing things completely automatically in one shot, here we're talking about simpler tasks. Maybe there's a class of tickets. This is something I've seen one of our customers do. They use our CLI tool that we've recently launched to automate ticket-to-PR journeys with a prompt that they've developed over time and fine-tuned, so that there's a whole class of tickets they can just turn into PRs. There are other things that we use it for, like code review - automating code review and putting comments on PRs. These are, again, things that don't require a lot of steering initially. Simpler tasks you can do in one shot, but complex tasks, to me, it's up to roughly a PR level.

[0:15:08] KB: Yeah. No, that makes sense. One of the things you mentioned there is code review. I'd actually love to get a sense of what you are doing at Augment for code review. That's a big pain point I've seen, because as these tools make it easier to write more and more code more and more quickly, I at least have ended up reviewing - I probably reviewed 100,000 lines of code in the last three weeks. That is painful. How do you keep up with that? How do these tools help you navigate that?

[0:15:37] GG: Well, that's a lot of code. 100,000 lines is a lot of code for a few weeks.

[0:15:42] KB: I mean, that's what these tools enable if you let them just rock and roll, right? If you're doing some greenfield work, they will generate a lot of code. Maybe I'm exaggerating slightly, but certainly 50,000.

[0:15:54] GG: Yeah. No, that makes total sense. That's what we're seeing as well: as agents start writing 80%, 90% or more of your code, they write it so quickly that code review becomes the bottleneck. We're seeing that internally. We're seeing that with customers. I don't have anything to announce yet, but we are definitely trying to figure out how we can give users the best code review experience. There are a lot of interesting questions around that. What I can say is GitHub, which is where most folks do code review, was not designed with agents in mind, right?

[0:16:30] KB: The UI just starts to choke when they get this big. Yeah.

[0:16:34] GG: Exactly. Exactly. Also, with agents, you can do more. Agents can go and actually change the code, right? It doesn't have to be just the back and forth with the users. We're trying to figure out what code review experience we want to provide our users. With the CLI, we're shipping a GitHub Action that you can use to do code review, if you'd like to install that. Then the bot will comment on your PR, and that's something we're using internally. Maybe separating it into short-term and long-term: short term, I think there's a lot we can do in automating the first pass at a PR, trying to find all the low-hanging fruit there - maybe bugs, maybe glaring inconsistencies, all kinds of things like that.
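To make that CI-style first-pass review concrete, here is a minimal, hypothetical sketch of the shape such an automation can take. This is not Augment's GitHub Action: the `review-agent` command, its `--task` flag, and the `PR_NUMBER` environment variable are placeholders for whatever CLI agent and workflow wiring you actually use; the only real APIs involved are Python's standard library, git, and GitHub's REST endpoint for issue comments.

```python
"""Hypothetical first-pass PR review step for CI.

Assumptions (not from the episode): a CLI reviewer is installed as
`review-agent`, and the workflow exports GITHUB_TOKEN, GITHUB_REPOSITORY,
and PR_NUMBER. Swap in the real tool and flags you actually use.
"""
import json
import os
import subprocess
import urllib.request


def get_diff(base_ref: str = "origin/main") -> str:
    """Collect the diff the agent should review."""
    return subprocess.run(
        ["git", "diff", base_ref, "--", "."],
        check=True, capture_output=True, text=True,
    ).stdout


def run_review(diff: str) -> str:
    """Pipe the diff to a (hypothetical) CLI agent and capture its comments."""
    result = subprocess.run(
        ["review-agent", "--task", "Review this diff for bugs and inconsistencies"],
        input=diff, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def post_comment(body: str) -> None:
    """Post the review as a PR comment via GitHub's REST API."""
    repo = os.environ["GITHUB_REPOSITORY"]  # e.g. "org/repo"
    pr = os.environ["PR_NUMBER"]
    url = f"https://api.github.com/repos/{repo}/issues/{pr}/comments"
    req = urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
    urllib.request.urlopen(req)


if __name__ == "__main__":
    comments = run_review(get_diff())
    if comments:
        post_comment(comments)
```

The design point is the one GG describes later in the conversation: the agent is just a process, so CI gathers the context (here, the diff), the agent does the reasoning, and the surrounding script decides where the output goes.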
We're also using our agent to write PR descriptions, which is something I've found very delightful, because, especially with all the context awareness that it has, it does a very good job of describing what the PR is even about, to help the human reviewer. These are all short-term things, and I think we'll have something to announce around code review fairly soon.

I think longer term, it's very interesting to think about what the code review experience should really be like. If you think about it, if there's one agent writing the code, and then we say, okay, there's another agent reviewing the code, do we really need these to be two separate agents? Or can we maybe take a step forward and merge these things into one? Or maybe not. Maybe it does make sense for them to be adversarial in some sense, and maybe that's hard to achieve in one agent. I think these are the next questions that will be interesting to figure out for code review: in the age of agents, does it even make sense to have that separation between writing code and reviewing code? Or should it really be all part of one thing? I'm not sure what the answer is, but those are the kinds of questions that we're trying to ask.

[0:18:30] KB: Yeah. No, it's a super interesting one and one I've been thinking about, too. Because if you think about code reviews, they serve almost three purposes. There's the easy one, checking for bugs. Is this broken? Is this not broken? As you highlight, we can already do that pretty well with agents. Then there's architecturally, is this evolving our code base in the right direction? Is this fitting in with our architecture? Is this taking advantage of what's already there? I don't actually know how well agents do on that. As you highlight, they often don't know what they don't know. Maybe it doesn't know that there's a pattern over in some other part of the code base that it should be using. Then, there's just even keeping track of what's going on. How do I know what's in my code base? Does it even matter anymore?

[0:19:11] GG: Yeah, exactly. I think going back to the code base and maintaining it proactively is something where, again, it'll be really interesting to see how that pans out. I wonder how developers will react to that, because developers typically have their tasks that they want to focus on. If you go and tell them proactively - I know at Google, I used to get these pull requests, or change requests, from a tool I think was called Rosie, or something like that. "I ran and I improved your code. Why don't you review the stuff I did for you?" Like an early version of agents going through your code and trying to do stuff. I never liked that, because it always felt like a distraction from the task that I'm supposed to do, right? These are the kinds of things that will start showing up as we try to automate more and more with agents.

I do think the point about architecture is one I suspect will be hard for us to figure out, because that's a very common failure mode from what I've seen: agents will get to correct code, but code that has pretty bad design, or pretty bad architecture, and if you don't steer it in the right direction, you're going to eventually get yourself into trouble. You're going to get to an unmaintainable state of the code, basically. It's still unclear to me if other agents can go and find those problems, or if the level of intelligence of the models is just not good enough yet to do proper design review, or proper architecture review.
It's also fascinating, because these things are moving so fast. Maybe the models today cannot do it, but maybe three months from now, they will be able to do it. It's a bit hard to predict. I do think this question about architecture and design is key. I think the best we can do right now is probably help humans understand the decisions that were made, so that they can make a judgment call and decide if something was done properly or not. That can definitely be part of an agentic code review, just to explain how the code fits into the grand scheme of things in your code base.

[0:21:06] KB: Yeah, that makes sense. Well, and I think a lot of this gets to this question of the gap between vibe coding, which is all the rage, right? Let's vibe code this. Let's vibe that. Which, I mean, got to admit, it's fun. You have a script, or a greenfield project. You don't care about how to maintain it. Yeah, vibe it all out. Let the LLM make all the decisions. Professional software engineering, you're maintaining a code base over time, you're evolving these things. Yeah, some of these questions of maintainability and architecture become much more important. I'm curious, are you all looking at both of those use cases? How do they differ in terms of what you need to build into your tooling?

[0:21:44] GG: We are targeting professional software developers and professional software teams. That's who we're building the product for. You can use it for vibe coding, if you want. It's totally fine. If you have a code base that's not trivially sized, you're going to get a lot of value out of it, because it will understand your code base better. Everything we're building is with an eye toward professional software developers. Those folks usually deal with large code bases. And so, we build the context understanding into every part of the product, including, for example, the prompt enhancer. The prompt enhancer gives you a mini spec that also takes your code base into account. It's not just based on your prompt. It's based on everything it's seen. That is also why we're trying to be very thoughtful about how we do code review, because we know that code review is such an important function in professional software teams. That's what we're mainly focused on.

I would say, also, personally, yes, it's a lot of fun to do zero to one with vibe coding and not even look at the code, and just see that it works, and so on. I've seen this repeatedly: once you get to around 10,000 to 20,000 lines of code with your zero-to-one project, if you haven't looked at anything the agent has done, you're in for a rude awakening, because something's probably messed up in there, from what I've seen. It's really worth it to review the code as you go along with the agent, even if you're doing zero to one. Of course, if you're not doing zero to one, you have to, because you have an existing code base and you want it to respect the patterns, and so on and so forth.

[0:23:19] KB: Yeah. No, totally. I think one of the things that you all are solving for with the context is the fact that anytime you have a set of existing code, it may not match what's out in the world; especially legacy code bases often have their own special snowflake pieces. I think that's a place where many developers get hung up, because they say, "Hey, it doesn't understand my code. It's trying to write something that doesn't match at all." My intuition is that if you're doing the context management right, it will, but I'm curious, how do you navigate legacy code bases?
Are there differences in how the product approaches those?

[0:23:54] GG: Actually, no. We've come up with a solution that works for small code bases, large ones, new, old. It is very steerable. One thing that happens with legacy code bases is that sometimes there are parts of the code base that you want it to use, maybe the new stuff. Maybe I'm working on a new piece of code and I want it to follow the new way of doing things. Then sometimes I want it to follow the old way of doing things, and these can be different parts of the code base. Augment is incredibly steerable. Even though it has a broad view of the whole code base, you can just tell it. That's where the intent comes in, right? It can see the whole code base, but it doesn't always know what it is that you want. You can tell it in a few words which parts to follow and which parts not to follow, without having to go and point it at specific files, or specific functions, or anything like that. You can keep it at a fairly high level. As long as there's enough guidance there for it to figure out what you're talking about, you're going to get good results out of it.

Another thing I wanted to mention in the context of professional software developers is we try to meet developers where they are. That means that we don't actually have our own IDE. We've never developed our own IDE. We integrate into VS Code. We also integrate into JetBrains and Vim, and now the CLI, if you want to work in the terminal. That to us is all part of it: we want to give the agents the best context. We want to let the developer do as little as possible and also not have to change their work environment. If you really like using JetBrains, as many of our developers do, you can just use Augment in JetBrains. You don't have to switch tools to do that.

[0:25:35] KB: That is definitely a win. I'm a long-time Vim person myself. I briefly had to navigate into VS Code, or a VS Code fork, when Cursor was the bleeding edge of everything in this domain. I'm very happy about the trend back towards, okay, CLI tools and integrations everywhere. Tapping into something that you mentioned earlier and digging into this, you mentioned something about how, oh, the models today can handle this, versus that. How do you think about model selection? This is something I've seen some tools just expose: here's your palette of models, you do your own thing. Others say, "No, we're going to pick for you." How do you think about what models to use?

[0:26:16] GG: Yeah, it's constantly evolving. For the longest time, we did not show any model choice to users. We even went on record saying we never would. The reason was that there was really one game in town, and it was Claude. That was Sonnet 3.5, and then through 3.7 and 4; it was just by far the best coding model out there. Now things have changed. There are multiple models. I would say, using GPT-5, it's definitely right now, in my experience, on par with Claude. We're seeing good results. We're still trying to figure out the right reasoning level for GPT-5 and what gives you the best value. We're making improvements in how we integrate it into the product. GPT-5 is a very solid coding agent, I would say a frontier coding agent. There are others that come out. There are a lot of interesting open-source models that come out. Grok Code has a lot of people talking. I would say, some of those other models are in a different tier. I would call them low-cost models.
They're not as good as the frontier models, but they cost a fraction of the price of the frontier models. As often happens in this space, almost overnight, everything changed. We went from one model that you could use to having, I don't know, four, maybe five models that you could actually conceivably use as agents. We call all these other models viable agents. Maybe not the best of the best, but you can still use them. Maybe you can even use them daily. We now have a model picker. We put GPT-5 in there, because we felt that for the frontier models, if the model is good enough, and we've seen that it has a different style, we do want to give users the choice to go between them. Maybe you like GPT-5's answer style better, right? So, you should be able to use that.

[0:28:09] KB: They definitely have different styles. Claude will write buckets of code, and GPT-5 will think for 20, 30 seconds and then make a two-line change.

[0:28:18] GG: Exactly. I think one thing to say is that there's no longer one right answer for everyone on what model to use. That's a primary reason for us to introduce a second model and a model picker. Then besides that, we are looking very closely at the other models, including the low-cost models, and trying to understand how they fit into the product, because now that we have a model picker, it's easier to add them. We don't want it to be a slippery slope. We're not going to have 20 models in there. That seems overwhelming. I think, again, if you're a professional software developer, you probably don't have time to spend evaluating 20 different models. We want to still be very opinionated about it. We want to explain what the different models are good for and when to choose them. We're thinking to have a very short list of models; not every new model that comes out is going to automatically make it in there. It does seem like a time where it makes sense to have more model choice. We'll see. Maybe in the future, it will all get consolidated again into one big system, and we can remove the model picker. Right now, it's a moment in time where choice seems like the right thing, within limits. Maybe three or four, rather than 20 different choices.

[0:29:33] KB: I'm curious how much individual harness code you write around those different models. To use a slightly dated example, one of the things that has been observed widely is, for example, that the various Sonnet models react very well to XML-formatted prompts of different sorts, if you're trying to do structured things. Whereas, if you're trying to do structured things over in GPT land, Markdown might be a better answer. Those are things that you can swap out at different levels. I'm curious, yeah, how do you think about not just model picking, but the harness around the models?

[0:30:06] GG: Yeah. Every model we introduce, and every time we've evaluated a model, or upgraded a model even within the same model series, it required its own prompt. I think that most of the prompt is shared. By prompt, there are two pieces to it. There's the system prompt, and then there are the tools and the schema: how they're defined, what the descriptions are. Those are the main things that go in the prompt. Different models require different prompts, for sure. One difference between Sonnet and GPT-5 is that Sonnet is a more opinionated model than GPT-5.
If you want it to do things that it doesn't naturally do - like, for example, explore the codebase before it starts - because it's really eager to go and edit code.

[0:30:53] KB: Sure is.

[0:30:53] GG: Sure is. Yeah.

[0:30:55] KB: Please, fix this for me. I've reformatted your entire codebase.

[0:30:59] GG: Exactly. Exactly. It's now production ready, right? That's like, yes. And so, if you want it to go and explore a bit and collect information before it starts working, which is very important for us, for our target audience with existing large code bases, you have to really push it to do that. GPT-5 is different. It's a lot more steerable. The way GPT-5 reacted to those instructions was to do an excessive amount of exploration. We had to dial that back. That's an example of just different - I think it just goes back to different ways these models were trained, and different things that were baked into the post-training, basically. We have to make those changes.

Another common failure mode I can point to is the ability of these models to edit files. File editing is such a key function. They just get it wrong so often, and it always requires tweaks to get them to reliably edit files. The worst thing that happens with file editing is they will make a successful file edit, but drop a line, or add a line, and it just results in uncompilable code. The less bad thing is for them to make a tool call, but then not actually be able to edit the file - have the tool call fail, which is a bit annoying, but then models typically recover from that and make a correct tool call. Editing files is another area where we typically have to do some amount of prompt tuning to get new models to work. Yeah, but I'd say, the main thing is just the overall style. If it can behave as an agent at all, then you want to tweak the system prompt to get it to behave nicely.

[0:32:38] KB: That brings to mind something else. Something I'd heard of one of the players in the space exploring, because of exactly that problem you highlight around editing, is actually building a custom diffing model, where they would take whatever the model generated and pass it through this customized model to generate a good diff, or things like that. To what extent are you all playing with your own custom models? Is that something that you're investing in? Where is there space where that's needed?

[0:33:04] GG: Yeah. For us, we had a custom model for edits, but models have gotten good enough that, with enough prompting work, we were able to remove that need. And so, we don't have a custom model for that. The main place where we have custom models is for the code base understanding, for the context engine. Agents can navigate the code base with ls and grep and things like that. Ultimately, figuring out what part of the code base you need is a retrieval problem. Even with agents that have access to tools, it's still a retrieval problem. There is a basic problem in retrieval called the semantic gap. That's where, if you are trying to find things based on a given string, and the actual string in the code base is quite different, you're going to have a hard time grepping for it. Now, agents have made that a bit easier, because they can try out a bunch of different combinations, right? You sometimes see them hunting around for the right string to grep, for example, in a code base.
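As an aside, here is a toy sketch of the retrieval mechanics behind closing that semantic gap; it is deliberately generic and is not Augment's context engine. The idea is to index code snippets as vectors and rank them by similarity to the query rather than by exact string match. The `embed` function below is a hash-based stand-in so the script runs anywhere; a real system would use a trained code embedding model, which is what actually bridges a query like "parse the config file" to a function named `load_settings`.

```python
"""Toy illustration of embedding-based code retrieval (not Augment's engine).

`embed` is a stand-in: it hashes word features into a fixed-size vector so the
script runs anywhere. A real context engine would replace it with a trained
code embedding model, which is what actually closes the semantic gap.
"""
import hashlib
import math


def embed(text: str, dim: int = 256) -> list[float]:
    """Map text to a fixed-size vector by hashing lowercase word features."""
    vec = [0.0] * dim
    for token in text.lower().replace("_", " ").split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


# A tiny "code base" of snippets to index. In practice these would come from
# chunking real source files.
snippets = {
    "load_settings": "def load_settings(path): return yaml.safe_load(open(path))",
    "render_page": "def render_page(template, ctx): return template.format(**ctx)",
    "retry_request": "def retry_request(fn, attempts=3): ...",
}
index = {name: embed(f"{name} {body}") for name, body in snippets.items()}

query = "where do we parse the config file settings"
query_vec = embed(query)
ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
print(ranked)  # ranked by similarity instead of requiring an exact grep hit
```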
Still, if I didn't give it quite the right string, or if I don't even know what the string is called - let's say, I'm looking for a function and I don't even remember what that function is called - then agents with just file system access are going to have a hard time finding the right information. That's where we come in and train our own models to be able to close the semantic gap and surface the right information to the agents, no matter what. That's one area where we're always investing and iterating with our own models. Then, another area is the non-agentic features: completions and next edit. Those are all driven by models that we train ourselves and are, of course, still maintaining and iterating on.

[0:35:00] KB: Yeah. That's super interesting. I think a nice thing about that is it gives you a potential moat, where a lot of companies in your space are not doing that. I think there was a direct quote from somebody at one of these companies saying, there is no moat, I have no moat whatsoever. That idea has definitely been out there from a machine learning standpoint. I think there was a famous quote out of Google at some point, a leaked memo, like, we have no moat. Everybody can train these models. Maybe the moat is just capital. How do you think about this space of coding agents? Are we going to end up in a place where somebody develops a big competitive advantage and they run away with it? Or is there always going to be five different flavors?

[0:35:39] GG: We can talk about the model layer and the application layer separately, right? For the model layer, again, Claude was the clear leader for about a year. Now, others have meaningfully caught up, I would say, especially OpenAI with GPT-5. I'm excited to see how that evolves. Because of the recent developments, it doesn't seem like anyone is running away with it. There's healthy competition, from my perspective.

Then, on the application layer, I think that context understanding is a big part of the value that we can provide. I don't know if we have a moat there, but I think we are clearly differentiated in terms of the performance that our agent delivers on large code bases. For us, we intend to keep pushing in that direction. I mean, the code base is the most important source of context for the agent, but there are others. We recently released a new feature where the agent also has access to your code history - that is, access to commit messages. There's a lot more, of course; in professional software development, there are a lot of other interesting sources of context. I do think that as we're going from agents being used for interactive software development to agents being used more and more for automation, that's where there is a lot of room for innovation, and kinds of innovation that we haven't seen so far. Because we and others have all been so focused on working in the IDE and giving developers the best interactive experience, I now expect there to be a new wave of product development in terms of how you take these agents and start automating more and more of the software development lifecycle. Of course, it's not going to happen in one shot. We're not going to go in and just automate the whole thing. That will probably take years, if we ever get to that. But there's a lot of low-hanging fruit, like I mentioned before: ticket to PR, looking at production logs, incident response, a lot of other things that you can do to make developers' lives better.
That's where I expect the next interesting products to come out, while, of course, we're still iterating on improving the product for interactive software development. Because for the complex tasks, you're still going to want to be in the IDE; you're still going to want to supervise what the agent is doing closely. I don't think that's going away anytime soon. Just going back to your question, in the application layer, that's where I expect a lot of the competition to happen: which vendors can really go beyond the IDE and give you a cohesive software development experience, maybe a platform, not just for individual developers working on their feature development or bug fixes in the IDE, but really, how do you go and improve the way the whole software development lifecycle happens on your team? That's where I expect the competition to play out.

[0:38:32] KB: I love that, because I feel like, yeah, to your point, all these tools are focused on the individual developer. But team dynamics are shifting as well. I mean, I highlighted all the code I had to review, but that's just one example of the ways in which this is changing. You have a frontline seat with your customers as they're adopting these tools and how they're having to change. What are you seeing in terms of the ways that teams are changing their functioning in this era?

[0:38:57] GG: It's just starting. I think it's a bit early, because the model capabilities are fantastic, but I don't think the products have completely caught up to the model capabilities yet. Then, I think most developers are still in the phase of adopting these tools and trying to understand how to get the most value out of the existing products. It's all evolving together. Models are getting better, products are getting better, and developers are learning how to adopt and get the most value out of the products. I think, from what I've seen, the most forward-looking teams are looking very closely at automation. From our perspective, they're using our CLI tool to automate things. I mentioned ticket to PR before. You can do things like automatically scan your production logs for errors, then open tickets based on those, and correctly assign the tickets to individual developers. Definitely, people are looking at incident response. Code review we already mentioned. People are getting very creative with it. I think some folks are doing scanning for security vulnerabilities. These are all things where, once you have a tool like the CLI - which means you take your agent, the full-featured agent with full code base understanding and so on, but in a CLI form factor - you can easily put it into your GitHub Actions, or your CI/CD platform, and you can start automating everything, because you have this unit of intelligence that has escaped the IDE, and you can now put it anywhere you want and get intelligence in there, with a full understanding of your code base. It's all very early, but that's where I'm seeing things. The developers who are more forward-thinking, that's how they're pushing the envelope: they're using these tools to automate more and more things in their team. That's the main thing I'm seeing right now.

[0:41:00] KB: If you were to track that forward, what do you think being a software developer in, say, 2030, or even 2027, looks like as we map this forward?

[0:41:13] GG: It really depends on how fast and in what ways models are going to get better, right?
Right now, models, as we talked about, are very good at writing code, and writing correct code, but they're not good at looking at the big picture and making the correct architecture decisions. If we extrapolate from that and assume that that remains a limitation of models, but that they will get more intelligent, they will get faster, they will get cheaper, right? Then I think it looks like developers become tech leads. They manage probably fleets of agents, and then the challenge for developers is going to be how much context you can fit in your head in terms of what all the agents are doing, right? All the context switching that you have to do when you look at what different agents are doing, which is already true today. If you're pushing agents to the limit today, you're probably running a few of them in parallel, doing a few different things. That's already a lot, even if you have two or three of them.

[0:42:14] KB: My brain taps out at two. Sometimes I can manage three, but really, two independent threads is about all I can contain.

[0:42:21] GG: Exactly. I do expect things to improve, because one thing that pretty clearly will happen is that I expect we will go from a world where you have one agent with one agent loop doing something, to having multi-agent systems, where you have probably a top-level agent orchestrating sub-agents. That seems to be where things are headed. We will be able to tackle, I expect, more complex tasks with less human involvement, but still, you need a human to supervise things at a high level. Again, supervise the design, the architecture, make sure we're going in the right direction. That feels like a tech lead role. One potential way this can go is developers will become tech leads.

Then, what skills do you need in that role? What's most important? The most important skills for a tech lead are, first, you need to understand the technology really deeply. Again, that's very important, so that you can supervise decisions and steer things in the right direction. The other very important skill for a tech lead is communication. You have to know how to communicate well. Well, that's just another way of saying you need to prompt the agents well, right? When we say good prompts, that just boils down to communication. That's one way this could go.

If, on the other hand, model capabilities get so good that you can rely on them more for the technical decisions, then I expect to see the role evolve to be more product focused. I mean, I expect that to happen anyway, but it could be that the product decisions become the main decisions that you make. Then, if you have deep technical skills, I'm pretty confident you will have an edge, because I don't think agents will get that good within two years - so good that you really just don't even need to worry about the technical parts. Probably, you'll be spending more of your time, is my guess, thinking about the product, thinking about users, and making sure that the direction and those decisions are correct. In a two-year time frame, I think it depends on how quickly models improve, especially at their current limitations, which is really the deep technical understanding.

[0:44:32] KB: If I'm hearing you correctly, what the job starts looking like more and more, and this is already happening, is that it's about the decision making. It's not the hands on the keyboard typing out code anymore, which it hasn't been since, probably, Sonnet 3.5. It's about making decisions.
If that is the future, what needs to happen in our coding tools? What do Augment, and all of these other folks, need to do to support that decision-making process?

[0:45:02] GG: Yeah. One thing that we think is important, that we already talked about before, is that as we're tackling harder and harder tasks, more complex tasks, it's not an individual developer story anymore. It's really more of a team story and a team effort. We have to get a lot better at supporting whole teams, rather than just individual developers. Another thing we have to do, in order to get there, is automate a lot. We already talked about how code review becomes the bottleneck, right? There are going to be other bottlenecks that come up, like looking at what's happening in production. That's already an extremely painful thing in many companies: understanding what's going on and handling outages and breakages, and so on. For us, I think it means both developing features for the most common tasks to help developers become more productive, by taking away the toil of those repetitive tasks, like code review, I would say. But then, also giving developers - developers are extremely creative, and they're very used to automating things and building their own tools. From our perspective, we also want to give them the right building blocks so that they can go and automate tasks within their team, without us having to build every single feature for them. We constantly think about the balance of which features we should build versus which general tools we should provide, like the CLI and remote agents, to let developers build their own automations. All of that has to happen for developers to really become tech leads, right? To not have to worry about the day-to-day stuff. I mean, it's not just in the IDE, it's also all this other stuff you have to deal with. We didn't mention deployments. I mean, there's just a lot of stuff you have to do. Maybe at a high level, there's so much tooling that needs to be built to get to the point where developers can really just focus on the decisions. I think that's the large hurdle in front of us: building all of that tooling for developers, and then making it as easy for them as possible to get to that point where they make these decisions.

[0:47:12] KB: Yeah. No, that makes sense. Well, and one of the things you said there led me down a thread of thought. I know you invest a tremendous amount in context management. One of the things that I've definitely seen playing with all of the tools in this space is, I still know more about my code base than the tools know, no matter how good they are. How are you thinking about exposing to developers the ability to build their own tools that plug into your context system, right? And say, "Oh, for my code base, I know you should look at these docs, or you should search that, or what have you?"

[0:47:39] GG: Is the question, how do you take - the developer has a lot more context than the agent - how can the developer effectively steer the agent? Or, steering is one step, but you were talking about empowering developers to build tools.

KB: Can I build tools that plug into Augment?

[0:47:54] GG: Oh, I see. There are different levels to this. The simplest level is that all of our agents support MCP.
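MCP (Model Context Protocol) is the standard plug-in point being referred to here. As a hedged sketch of what that can look like, the example below uses the official MCP Python SDK to expose a custom context source as a tool that an MCP-capable agent can call. The server name, the `lookup_internal_docs` tool, and its canned data are hypothetical stand-ins for whatever internal docs or search service you would actually wire in.

```python
"""Sketch: expose a custom context source to an MCP-capable coding agent.

Uses the official MCP Python SDK (`pip install "mcp[cli]"`). The tool below
and its doc lookup are hypothetical stand-ins for whatever internal docs or
search you want the agent to be able to query.
"""
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-docs")

# Pretend internal knowledge base; in practice this might query a wiki,
# a docs site, or an internal search service.
DOCS = {
    "payments": "Payments flow through the ledger service; see docs/payments.md.",
    "auth": "Auth is handled by the gateway; tokens are minted in auth-svc.",
}


@mcp.tool()
def lookup_internal_docs(topic: str) -> str:
    """Return internal documentation for a topic, if we have any."""
    return DOCS.get(topic.lower(), f"No internal docs found for '{topic}'.")


if __name__ == "__main__":
    # Runs over stdio by default, so an MCP-capable agent can launch it as a
    # subprocess and call the tool whenever it needs that extra context.
    mcp.run()
```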
If you want to plug into Augment and give it additional context, you can write your own MCP server, or connect an existing one. I think the flip side of that is one thing we often get asked. People tell us, Augment has the best context understanding, but I have a more sophisticated, bigger system. How can I take Augment and the context understanding there and plug it into my existing system? Our answer currently is that we don't give you access to the context engine directly, but we give you access to the agent as a CLI program. That agent has the full code base understanding through the context engine. My recommendation is, if you want to use Augment, and especially Augment's context understanding, in a bigger setting, just use the CLI agent. Just use it. You can ask it questions about your code base, right? It's not just for writing code.

[0:48:54] KB: Treat it as a building block, essentially.

[0:48:56] GG: Yeah. Exactly. For us, the CLI is a building block. You can use it for interactive development. You can put it in your GitHub Actions, but you can also use it exactly as a building block inside your bigger system. Maybe you have a bigger multi-agent system already that does stuff, and you just need to put the context understanding in there. Just use our CLI, and use it to answer questions about the code base, or explore it, or come up with design docs, or specs, or whatever it is. It works really well for that.

[0:49:22] KB: I love that. You've got then, essentially, a built-in ability to do a multi-agent system. You spawn Augment and say, "Hey, here's your CLI tool for Augment. Go and do a thing."

[0:49:32] GG: Exactly. It's Unix, right? You just spawn another process. It just so happens that this process is this highly intelligent thing that knows about your code base, but it's still just a process. You can launch multiple of them. You can do whatever you want.

[0:49:44] KB: That's really cool. I think, to your point, the more we can build these Unix building blocks and start layering them and doing that, we're going to see compounding effects here. We're getting close to the end of our time here. I want to check in: is there anything we haven't talked about today that you want to make sure we do talk about before we wrap?

[0:50:03] GG: I think I would just reiterate that if you're working on a large code base, you've probably tried some of the other tools. If you haven't tried Augment before, I encourage you to go to augmentcode.com and download it. It doesn't matter if you're in VS Code, or JetBrains, or you prefer to work in the terminal. Any form factor will give you the full code base understanding. I think you will see, within probably 30 minutes of using it, just exploring your code base and trying to write code, the difference with our context engine. That's what we keep hearing from users. Try it out. If you try it out and have any feedback, we're always happy to get feedback - positive, negative, all of it. We're always striving to improve our product. Yeah, that's my recommendation. To me, it's, I would say, a life-changing experience to be able to go into a code base and just navigate it. It's a freeing experience to navigate it with pretty high-level instructions.

I think the other thing I would mention, which we actually didn't talk about, is that there's the use case where developers already intimately know their code base.
I think in your case, Kevin, you know the code base really well, and the tools still make you more productive. There's another use case we see a lot in software teams, which is, maybe I'm a new member of the team, or maybe it's a really large code base and there's a part of the code base I haven't gone into. That's where you see Augment really shine, because I can see it do things that would have taken me days, probably, to figure out, just in minutes. Like, "Oh, I didn't know the code is even structured this way. I didn't know how this works," and so on and so forth. I can just get to working code really, really quickly. Looking at unfamiliar code is another very important use case for us, and a very important use case for the context engine.

[END]