EPISODE 1901   [INTRODUCTION]   [0:00:01] ANNOUNCER: AI-assisted coding tools have made it easier than ever to spin up prototypes, but turning those prototypes into reliable, production-grade systems remains a major challenge. Large language models are non-deterministic, prone to drift, and often lose track of intent over long development sessions. Kiro is an AI-powered IDE that's built around a spec-driven development workflow. It's focused on helping developers capture intent upfront, translate it into concrete requirements and designs, and systematically validate implementations through tasks, testing, and guardrails. It aims to preserve the creativity of AI-assisted development, while producing software that is ready for real-world use.   David Yanacek is a Senior Principal Engineer and a lead advisor on the agentic AI team at AWS. Today, his work focuses on Kiro, Frontier Agents, Amazon Bedrock AgentCore, and AWS's Operational Agents. He joins the show with Kevin Ball to discuss the design of Kiro, how spec-driven development changes the way teams work with AI coding agents, and what the next generation of agentic software development might look like.   Kevin Ball, or KBall, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action Discussion Group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc.   [INTERVIEW]   [0:01:48] KB: David, welcome to the show.   [0:01:50] DY: Oh, thanks. Great to be here. Very excited to chat today.   [0:01:53] KB: Yeah. Let's start out with a little bit about you. Can you give me the quick rundown of who you are and how you got to where you are today working on Kiro? [0:02:02] DY: Sure. 
I'm a senior principal engineer who has spent my coming-up-on-20-year career exclusively at Amazon with a singular purpose in mind, and that is to make developers' lives easier. I've been just focused on that. It all comes from, I guess, how we build and operate software here. We do DevOps. Now, DevOps, of course, takes different meanings depending on where you are and how you use the term. It's how language evolves. To us, it means developers do the ops. There is no separate DevOps thing. It's just a way of doing dev more than it is a separate thing.   Anyway, because of that, that obviously puts a lot of work and responsibility on the shoulders of me, the developer. I've been moving from team to team over the years at Amazon, mostly AWS, trying to build the next thing that's going to help life as a developer be easier. For instance, I found it tedious on the first team I was on to operate databases, especially when they scale and when they need to be highly available. On one hand, I didn't like doing database ops, because it distracts from the thing that I'm actually trying to do. On the other hand, I loved it.   When I heard that we were going to make a highly scalable, highly available database called DynamoDB, I signed up. I was like, okay, I'll join and help build that. The thing that attracted me to it was that it was going to be large scale, and that I'd never have to do database operations again, because it's just managed. Then rinse and repeat with that pattern. I worked on Lambda, API Gateway, serverless stuff. Operations is a big thing. I also worked on CloudWatch, the observability tool that we use broadly and that a lot of people use. Anyway, that's been my whole thing. Most recently, the advent of LLMs has opened a whole new way of making developers' lives easier, and that's what brings us to Kiro.   [0:03:57] KB: Yeah. 
I have to say, I still remember manually sharding databases and having to bring up systems. I am very grateful for all of the work that got us to where we are today.   [0:04:06] DY: Oh, yeah, yeah. All of those things are conceptually easy, but then tedious. That's at 1:00 in the morning.   [0:04:14] KB: Which is somehow always when things go down, or when you're having to run those migrations. Yeah, it's such a pain, and less of a pain now. Let's talk about Kiro then, because we've been doing a whole sequence. I've been really diving into this for probably the last year and a half: how do we use these tools to code? They're incredible tools. They are non-deterministic. They have all these challenges. They also come with a lot of learning that we have to do. What is Kiro? What's the overview of how it works?   [0:04:42] DY: Sure. Kiro is an AI development environment that helps developers go from prototype to production-grade code, using a technique, an approach, we call spec-driven development. Spec-driven development keeps all of the agility and fun and iteration that you get with vibe coding, but adds just enough structure so that you can produce the actual result that you want, while it runs as autonomously as possible. It keeps the IDE focused with an end goal in mind, and keeps it focused on tasks that are in service of that goal, rather than one-off things.   The spec-driven development that's baked into Kiro is what makes it so people can produce production-quality code that's thoroughly tested with accurate tests, and produce the right thing that they need for production, without the frustrations that you can get from a wandering vibe-coding experience.   [0:05:41] KB: I love this. I mean, I've long been an advocate of things like document-driven development and this sort of thing. Spec can have a lot of different meanings, with different levels of formality to it and different levels of restriction. 
How do you, within the context of Kiro, define what a spec consists of?   [0:05:58] DY: I'm hearing you say, well, specs can be overly formal, if one has done documentation-driven development. They can potentially be a little bit in the way. But with Kiro, I find it actually isn't. It actually speeds me up and helps me. I found I just like it, and it adapts to whatever your way of thinking and coding is. A spec to Kiro is just three parts, but you start with pretty much the same prompt as you would have otherwise. You say, "Hey, I want to build this thing. Let me describe this thing for you, and let's build it." From there, it expands on that. Let's say, okay, well, you said you want to make a traffic light control system. I think that means that you're going to want it to keep track of when cars enter and exit the intersection. It expands on your prompt and produces more detailed requirements.   Then you can read this doc, it's a markdown file, and say, yeah, these requirements are in this EARS format, which just has some more "when" and "shall" kind of wording, but it's very clear, easy to read and skim, to say, okay, this is what I want, or not. Then you can chat and say, "Okay. No. Actually, I don't want it to do that thing. I want it to do this other thing." You just chat with it. You can either modify the doc, or you can just go back and forth, just like you would with any kind of chat-based LLM interaction. You chat with it about the requirements to agree on what we're going to build.   A spec can be a whole new project, it can be a feature, it can be an upgrade. Also, hey, let's go upgrade this code to the latest Node.js version, because that's some maintenance task that I need to do. 
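For readers unfamiliar with EARS (Easy Approach to Requirements Syntax), here is a rough sketch of what an acceptance criterion in a requirements markdown file might look like for the traffic light example. This is hypothetical; the exact layout Kiro generates may differ:

```markdown
## Requirement 1: Safe signal transitions

**User story:** As a traffic engineer, I want conflicting directions to
never show green simultaneously, so that the intersection stays safe.

### Acceptance criteria

1. WHEN a direction is granted a green light, THE system SHALL first set
   all conflicting directions to red.
2. WHILE emergency vehicle preemption is active, THE system SHALL hold
   all directions at red except the emergency route.
3. IF power is restored after an outage, THEN THE system SHALL enter an
   all-red state before resuming normal cycling.
```

The "WHEN ... THE system SHALL ..." phrasing is what makes these criteria easy to skim and, later, to turn into testable invariants.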
It'll come up with requirements, and that includes acceptance criteria to say, well, okay, I'm going to make sure that in the end, these are the properties that the system needs to have. That's all part of the requirements. From the requirements doc, once you say, "Yup, this is what I want," it produces a design, which is another markdown file that includes all of the technology and framework choices it's going to make, some more of the non-functional requirements if they weren't in the requirements doc, class diagrams, architecture diagrams, little snippets of code to say, here's how I plan on doing this or that.   Then I can skim that and see if I agree with that approach. It's nice that I get to see that. Those code snippets are nice, because I might otherwise be 20 minutes into this project and see, oh, I don't want you to go about it that way. I want to do it this other way. Then you have to throw all that work away. The design helps me make sure that it's going to take an approach that fits with my mental model for how this code base should evolve, or start. Then once I agree with that, it breaks the project into tasks. The tasks are actually a separate markdown file, a third markdown file, that just says, okay, let's build the framework. Then now we'll do some infrastructure as code to set up some stuff. Let's set up a test environment. It just breaks down the implementation.   Maybe, let's create a database schema. Okay, now let's create it. It just keeps going. Then it even splits out optional tasks, which are nice-to-have tasks that you can follow up on later. It focuses on, let's get something tangible for you to see working, and then let's go do the thorough part, all the tests, including property-based testing. Essentially, a spec is just these three things that you've been chatting about, seeing what Kiro plans on doing and how it plans on going about it. 
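As a rough illustration of that breakdown (hypothetical; the actual task list Kiro produces will vary by project), the tasks file is a markdown checklist along these lines:

```markdown
# Implementation Plan

- [ ] 1. Set up project scaffolding and the test framework
- [ ] 2. Create the database schema and infrastructure as code
- [ ] 3. Implement the signal state machine
  - [ ] 3.1 Handle walk-button and emergency preemption inputs
- [ ] 4. Stand up a test environment with something tangible running
- [ ] 5. Add thorough tests, including property-based tests (optional follow-up)
```

Because each task is a checkbox, both the developer and the agent can see at a glance what has been done and what remains.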
I really just like doing that all upfront, because then I can just say, okay, go do the tasks.   [0:09:30] KB: Well, this is fascinating to me, because the process you've just described is essentially what my team does, but we do it by hand, in the sense that we have a bunch of processes like, okay, go and analyze what's going on, or let's write a spec together, let's iterate going through this. It sounds like Kiro bakes that into the flow as you go. Now, those markdown documents, are they committed as a part of your code base? Where do those live? How are they managed over time?   [0:09:57] DY: Yeah. Ultimately, it's up to you, but the intent, I guess, and what I like to do and what I see everybody do, is commit them to the code base. They're actually pretty flexible in how you use them in the future. One approach is that you use the specs to add a feature, start a project, and they're one shot, then you're done. Now, I should actually add that during this process, I go back and forth a lot. It's not just a waterfall, like, let's do the requirements and then the design, then the tasks. Sometimes I'll go back and forth. When I see the design, I'll say, "Oh, I don't agree with this design, because it's actually missing a couple requirements." I forgot to mention that I actually want to use this framework, or something, or that, yeah, well, I don't want users to be able to do that, or something. It's nice to be able to go back and forth. Even once it starts implementing, I can go back to the requirements, or the tasks, or any part of the spec.   This is also true when you talk about where do I commit the spec. 
You can treat them as these one-off things where, okay, now that the implementation task is done, I'll put it there for posterity, so that later on, if the agent or I have questions on how we got here, I can see that, or ask questions about it, and the agent will read the specs that exist for that project, consult them when it deems necessary, or when I ask it to. Some people like to keep a spec up to date with the project, where they say, okay, this spec represents the overall architecture. Once I make a change, I want to go update some spec, and that's definitely a fine way to do it. Sometimes you might actually just say, hey, I'm going to have a design document also checked in, separate from a project-specific spec, and I can ask Kiro, once we're done, to update my authoritative design and just keep it up to date with what we just implemented. That's another nice way to use it.   I tend to use specs as this start-to-finish thing. Then that's archived for posterity. Then I'll use a separate document to keep that overall idea of the current state of the architecture and the system and how it's built.   [0:12:07] KB: That makes sense. Let's now move into something you mentioned a little bit, which is this idea of, I think you said, property-based tests, or things like that. I think at this point, pretty much everybody has experience using these tools. Depending on which ones you use, they are better or worse at actually sticking to the guidance, the spec that we've agreed on, things like this. How do you think about building in layers of validations, guardrails, and actually validating those requirements?   [0:12:36] DY: Well, I think it's really important. That's actually what the spec is super great at: it captures the actual intent of what you're trying to get done, and whether or not it has done that yet. That's just the breakdown of tasks. Okay, haven't written these tests yet. 
Okay, we're not done with this one yet; we need to make sure it actually does the things that you wanted it to do. In terms of the quality of the tests, a nice thing about capturing the intent in the spec, the requirements and the design, is that it gives a bunch of extra information that you wouldn't have otherwise gotten from just some prompt that you mentioned 20 minutes ago that's now floated off. To me, when I'm doing coding using an AI agent, whenever I type something into steering, or anywhere, that's the value that I'm adding.   I want to make sure that that's saved and consulted, and that's where it's really nice to have that all summarized into the spec, where I can see that work that I've been doing with the agent. From that, because it describes the actual intent of what the system is supposed to do and how it's supposed to work, we can write more thorough tests, or Kiro can write more thorough tests, than if we were just looking at some code and saying, "Okay, I need to write some unit tests now, or some integration tests."   Kiro uses something called property-based testing. It's not a Kiro-specific thing. It's something that's been around in the industry for some time. We realized that by having this spec, we could actually do thorough test generation using this technique. Property-based testing tests invariants. It writes tests to make sure that all those invariants are held during any kind of sequence of input. Rather than writing a test for a specific scenario, it tries to generate many scenarios and make sure that those invariants hold true.   Let's go back to that traffic example. Let's say that you're building a traffic light system. A really important invariant is that, at most, one direction has a green light at a time, right? It's obviously very important for a system to never have more than one green light. It's okay for it to have no green lights, but at most one. 
If I think about how to test such a system, I want to make sure that for every sequence of things, where somebody hits the walk button, or an emergency vehicle goes through - all these different things that can happen are the inputs to the system: different timings, power outages, power restores - the invariant always holds.   Property-based testing is where you describe a test in terms of these invariants, which can be generated directly from the spec. If the spec says that, hey, we want to make sure there's only one green light, okay, now we can generate property-based tests based on that. Property-based testing uses different frameworks that help drive these different inputs. You have the test that has the invariant. Okay, how do you drive a bunch of input at it? Well, that's where an input generator comes into play. We use one called Hypothesis. It's a Python framework for property-based testing. From the spec that describes the types of input going into each part of your system, it generates a bunch of permutations of that, and then feeds that into the test, or into all the other tests that might also want that input, to make sure that the different invariants are upheld.   Then when a test fails, ultimately, the agent can go and test its implementation against these, and keep adjusting the implementation until those invariants hold during all the tests. One nice thing about property-based tests is that they're very thorough with all the permutations of input. But that thoroughness can make it tricky to understand why a test failed, because it's not as straightforward as a unit test, where you fed it this specific input, and then this assertion failed. That's very easy to understand. Okay, let's run it again with that input. With property-based testing, you have a whole series of inputs in a sequence. 
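To make this concrete, here is a minimal self-contained sketch of the idea in Python. The `TrafficLight` controller and its event names are hypothetical stand-ins, and the random driver is a simplification of what Hypothesis actually does (Hypothesis adds much smarter input generation and shrinking of failing cases):

```python
import random

class TrafficLight:
    """Toy two-direction controller; a hypothetical stand-in for real code."""
    def __init__(self):
        self.ns_green = False  # north-south direction
        self.ew_green = False  # east-west direction

    def handle(self, event):
        if event == "ns_go":
            self.ew_green = False  # conflicting direction drops first
            self.ns_green = True
        elif event == "ew_go":
            self.ns_green = False
            self.ew_green = True
        elif event in ("walk_button", "power_outage"):
            self.ns_green = self.ew_green = False  # all-red is always safe

EVENTS = ["ns_go", "ew_go", "walk_button", "power_outage"]

def check_invariant(light):
    # The invariant from the spec: at most one direction is green.
    assert not (light.ns_green and light.ew_green), "two greens at once!"

def run_property_test(trials=200, max_events=50, seed=0):
    """Drive many random event sequences, checking the invariant at each step."""
    rng = random.Random(seed)
    for _ in range(trials):
        light = TrafficLight()
        for _ in range(rng.randrange(1, max_events)):
            light.handle(rng.choice(EVENTS))
            check_invariant(light)

run_property_test()  # raises AssertionError if the invariant ever breaks
```

With Hypothesis, the same test would be written with `@given(st.lists(st.sampled_from(EVENTS)))`, and the framework takes over generation and shrinking.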
To replay all of those can be a little confusing, to see which of them triggered it. Property-based test frameworks use a technique called shrinking to take those different permutations of inputs. The framework finds the large sequence that caused the failure, then it tries to remove steps until it arrives at the most compact explanation for why the code isn't holding the invariants true.   Property-based testing is a very powerful way of having thorough tests to make sure that the system does what we agreed upfront with the agent the implementation should do, because it tests so many boundary cases, rather than having to fish for one at a time. Because I've seen agents just - well, for those listening, you're smiling here, because you know what I'm about to say. You'll see an agent go, I can't get the test to work, so I'll just comment it out, the body of it, so now it passes. That's great. Okay, let's move on. Then it forgets that it ever did that. Having these thorough tests keeps the agent honest, where it has to prove its correctness.   [0:17:32] KB: Yeah. Agents, or LLMs, I guess, tend toward confirmation bias and self-confirmation bias. They'll try a thing and they'll get into this rut, and by mapping the space out with the full property range, you can keep them from getting into that rut. One of the things I loved about your description of the spec-based workflow is that Kiro is taking you through this best practice, right? I've seen everyone trying to figure out their set of things, and Kiro's like, here's our approach. We're going to be opinionated. Let's go. Does it do the same thing there for spec, where it's like, okay, I've done my implementation. Now, it's time to do property-based testing. Here we go. Or are you prompting it to do that?   [0:18:07] DY: It actually thinks about it. It says, okay, here are the properties that I'm going to be verifying in the end. 
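The shrinking technique described above can be sketched as a greedy loop. Real frameworks like Hypothesis are considerably smarter about this, and the `fails` predicate below is a deliberately artificial stand-in for a failing property test:

```python
def shrink(events, fails):
    """Greedily drop events one at a time while the failure persists,
    arriving at a (locally) minimal failing sequence."""
    changed = True
    while changed:
        changed = False
        for i in range(len(events)):
            candidate = events[:i] + events[i + 1:]
            if fails(candidate):  # still fails without this event: drop it
                events = candidate
                changed = True
                break
    return events

# Artificial "bug": the failure triggers whenever a sequence contains both
# an ns_go and a walk_button event (hypothetical, for illustration only).
def fails(events):
    return "ns_go" in events and "walk_button" in events

long_failure = ["ew_go", "ns_go", "ew_go", "walk_button", "ew_go"]
print(shrink(long_failure, fails))  # -> ['ns_go', 'walk_button']
```

The shrunken two-event sequence is the compact explanation a developer, or the agent, actually debugs, instead of the original long one.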
But before you just decide, yeah or no, it actually has a step where it's coming up with correctness properties. Whether or not I have it actually implement those tests right now is the optional part. It's also reflecting during that spec flow, asking, "Do I actually have enough clarity from you?" This is actually an interesting thing that Kiro is doing, to decide how much more it needs from you. I mean, you can obviously weigh in at any time, but it checks, it reflects on, are these requirements clear, or conflicting? That's an interesting thing it's doing behind the scenes, but it's also doing that around, what properties should I test? What are the key tests to make sure that these requirements are upheld? It's doing that reflecting during that spec flow. [0:18:59] KB: That's great. I bet that leads to substantially better specs, but that also means, hey, whatever environment I'm in, I just need to find a test framework that can validate these correctness criteria.   [0:19:11] DY: Right. It's pretty powerful.   [0:19:13] KB: Feeding into this concept, we've talked about what Kiro is doing in an opinionated way, but what mechanisms, hooks, skills, other form factors does Kiro offer for developers to customize it to their environments, to their particular preferences, to their team practices, etc.?   [0:19:32] DY: I think this is a really important thing, to get Kiro to understand how you and your team work. There are, essentially, three features in Kiro that help with this. First, I can write a steering file, or a series of steering files, that just describe my development environment. Maybe I say, "Hey, I'm always using this. My team is always using this for continuous deployment. We use these frameworks." Just set that, the stuff to always be keeping in mind.   These are, potentially - you can have many steering files. 
They do take up your context window, so you want to keep them relatively small, with pointers to where to go for more information about a particular thing. That's really useful, because by having multiple of them, you can have maybe a company-wide one, if you have many teams that try to do certain things the same way, but then you can have one for your own team to do things the way that your team does it, which is maybe different than other teams. Then your own steering file that says, here are my own preferences. The steering files help it stay focused and be able to do things the way that you're used to having them done, without you having to repeat yourself all the time.   Then there are powers. Kiro Powers is a feature we added relatively recently that just bundles up MCP servers with steering and with hooks, which I'll describe in a second. These powers are things that are loaded dynamically, depending on what you're doing. Supabase, for example, provided a Kiro Power for Supabase. If I say that I'm using Supabase, or my project clearly is, Kiro will load the Supabase Kiro Power. It suddenly shows up with these MCP servers and steering files and hooks that will help it when it comes to using that platform.   [0:21:17] KB: Ooh, that's really interesting. I want to dig into that, because one of the big things I think people are trying to grapple with right now is what I've been calling progressive disclosure of context, right? I don't want everything in my context window upfront. One of the challenges with MCP servers is it's hard to do that - they've got a whole bunch of different stuff. Skills were a step in the direction of, okay, we'll give you a little description and then you can load more if you want it. This sounds potentially even more powerful. How do powers work? What are the knobs and levers I have to say, okay, these are the situations in which this context is going to be relevant?   [0:21:51] DY: Yeah. To use a power, it's relatively simple. 
I mean, on the surface, you go to kiro.dev and browse the powers listed on the website. It's just a starting point. You say, add it, and then it'll load up the IDE and download that whole bundle of stuff. With Kiro Powers, I think you'll find that they're pretty similar to what you're describing with skills, in that they have a smaller amount of data that will entice the agent to use them in certain cases, just enough to pique its interest at the right time. Then when it is time to do a thing using that, it'll load up the MCP servers and steering files and everything for that part of the task.   I'd say, it's not magic, but it is extremely convenient. Keeping the context window small, it's convenient for that, but it's really convenient for just bringing in the expertise around a particular technology when it's time for that. These features that we build into Kiro, we're building from our own experience and other customers' experience. That's the nice thing about building tools for developers: we are also developers. It's a little more intuitive to imagine what other customers might want.   That's just where Kiro came from in the first place, actually. We were using LLM-based tools to do development and we found that, okay, it would wander off. We were like, well, let's build specs. Or you'd want to do something and it wouldn't know how to do that. Say, "Hey, I want to use this new Bedrock AgentCore feature to build a new agent, or use Strands." That's an open-source framework that we created for building agents. I want to use that. Well, on the day of launch of that new service, or feature, the agent has no idea what that is. I can either go and give it a bunch of links to say, "Hey, go read this documentation. Read this documentation. This is what I'm talking about. Here's how to find it." It points it in the right direction and saves a bunch of back and forth of like, "No, I meant this. 
I meant this." The power just keeps it focused.   From this experience of having to repeat ourselves to the agent - "Hey, no, this is what I'm trying to use right now. Here's how to find out more about it." - we found that this powers concept would help package up expertise, so that the agent will be good at everything.   [0:24:05] KB: Absolutely. Well, I think that is what that progressive disclosure allows you to do, right? You can say, there are all these things that are going to be relevant at some point. Let me give you access, but only when you actually need it. Let's talk a little bit about hooks, because that was another thing that I saw in Kiro that seemed like it was potentially very powerful.   [0:24:24] DY: Oh, yeah. Hooks are actually another part of this packaging of a Kiro Power, but they also make sense on their own. A hook is something that will run a prompt, or spin off another agentic loop, in reaction to a thing. Let's say I have an API, some kind of web service API. Well, every time I update my API definition, I might want to generate some things off of that. Just like with specs, where you can generate things like property-based tests off the specs, with API definitions you can also generate a lot of interesting things, like SDKs and API documentation. Really nice. Maybe that's a nice time, when I save my API. Whenever I make a change to my API definition, I want to go do these things as a result. I would write that as a hook.   It's pretty simple to make them, actually. You just write a prompt that says, "Update my API documentation." Simple. Really, that's it. You would say, trigger whenever this file is saved, or changed. There are different triggers that can kick off these different hooks - maybe run this code scanner, run this dependency check. 
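Conceptually (this is not Kiro's actual implementation or hook configuration format), a hook is just a trigger pattern paired with a prompt, where each match forks off its own independent agent task:

```python
import fnmatch

class HookRunner:
    """Toy dispatcher illustrating the hook concept: file event -> prompt."""
    def __init__(self):
        self.hooks = []  # (glob pattern, prompt) pairs

    def register(self, pattern, prompt):
        self.hooks.append((pattern, prompt))

    def on_file_saved(self, path, spawn):
        # Each matching hook forks its own agent loop with its own
        # context window, separate from the main agent.
        for pattern, prompt in self.hooks:
            if fnmatch.fnmatch(path, pattern):
                spawn(prompt)

runner = HookRunner()
runner.register("openapi.yaml", "Update my API documentation")
runner.register("requirements.txt", "Check dependencies for vulnerabilities")

fired = []
runner.on_file_saved("openapi.yaml", fired.append)
print(fired)  # -> ['Update my API documentation']
```

Here `spawn` is a placeholder for handing the prompt to a fresh agent loop; in the sketch it just records which prompts would fire.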
Every time I update my dependencies file, whatever framework I'm using, I want to check for security vulnerabilities, or for out-of-date dependencies, if there are more recent versions available that I could grab. It's a nice way to just do those things that I need to remember to do; it just makes things a little more convenient.   You can also manually trigger the hooks. I find that's actually how I mostly do it, just personally. Sometimes it's not exactly when a file gets saved that I want to do a thing. Then it's just easy - oh, yeah, I click this button and it's going to go off and do that thing for me.   [0:26:02] KB: Now, say you have an agent running on a thing and it's going, it's got its own context window, it's writing things. It touches one of these files that has a hook attached, and the hook runs. Does the output of that hook - well, first of all, if it's a prompt snippet that's getting injected, does that run in the main agent context window, or is it a separate context window?   [0:26:22] DY: No. Yeah, it pops out into a separate context window. Yeah.   [0:26:24] KB: Okay. Awesome. Then, does anything from that get fed back into the original context window? One of the things that I think is emerging as a pattern with a lot of these is you want to create feedback loops for your agent, so that it can self-correct, and linters and tests and all these things give those opportunities. Does that get piped back in some way, or can it?   [0:26:44] DY: I don't think it does. I'm pretty sure these are one-off independent tasks that just fork. I haven't done it that way, so I don't think it joins back with the main agent that you spawned it from. Actually, I could be wrong, because it's a neat idea.   [0:26:56] KB: Does it have a full agent loop, or is it a single inference, do a thing and come back?   [0:27:00] DY: No, it's a full agent loop. 
It's going to just keep working on a task, just like any other task.   [0:27:05] KB: It doesn't necessarily have to go back to the core one, because it could go and just do the fix itself.   [0:27:10] DY: That's right. That's right. Yeah. They just all branch off and go kick off these other tasks. I think, yeah, hooks are a nice, interesting convenience for remembering to go and do other things.   [0:27:20] KB: That pulls us into a world - now we're talking about, essentially, multi-agent patterns, right? Because a hook at its core, it sounds like, is a small agent. Or maybe not so small an agent. It could be a large agent. Who knows? How do you and Kiro think about coordination across those different agents, making sure they don't stomp on each other, etc.?   [0:27:38] DY: An interesting place where I see more of the agent coordination - when you're in an IDE, you're seeing the agents do the things, so it's not a huge amount of cognitive load to see what each one is doing and make sure that they're not overlapping, or anything. When you get into this other world of agents that aren't running right in front of you, this coordination becomes very interesting. And this world is actually already here. Recently, we launched a set of what we call frontier agents, which connect into Kiro in this interesting way. With Kiro, we found that with spec-driven development, it could run for longer. It could run independently. You can give it a larger, more ambiguous task and have it go do that, without having to pester you.   We decided to push that as far as we could into a set of new software development agents that we call frontier agents. One is what we call the Kiro autonomous agent. It's a non-IDE agent. You just assign it work; it meets you where you are, as part of your team. Whatever I'm using for my backlog for my team, you assign it a task. 
It'll go do that task and do its own loop, and test and refine and refine and test, and understand and expand on the more ambiguous task that you gave it, to make sure it fits with your team's working patterns, and implement that for you. Then it produces a code review, like a pull request: here is the result. Do you want to merge this?   The second of these frontier agents is a DevOps agent. We noticed that when you code, you also need to run that code. That's what I was saying with DevOps. We've encoded that into an autonomous agent that we call AWS DevOps agent. This does incident response. It'll triage and root-cause issues and recommend how to fix these issues in production. Over time, it'll actually look at your whole environment, your whole setup, to say, "Well, actually, I found these opportunities to optimize your infrastructure, or the way that you even deploy your software," and say, "Hey, I noticed you're getting this alarm that keeps going off, because you're doing bad deployments all the time. Let's update your CI/CD pipeline to add better tests and automatic rollback and better alarms," because maybe the alarms aren't even catching this early enough.   It just tries to prevent future issues by working all the time in the background to look for opportunities on what to improve. Then we have a security agent. It'll make sure that the code that you and agents write adheres to the right security standards, and it'll even do penetration testing on its own. The coordination - you asked about coordination - I think that really comes into play across these agents, where they're not running in front of your face. These are running all the time. Ideally, they run when you're sleeping, right? Getting deeper into the backlog than you would.   The coordination between these tends to actually be the coordination that you use across your team. We make it so that these agents are where you and your team are. 
You interact with these agents in Slack, or whatever team communication tool you're using, and in whatever backlog tool you're using. In the case of the DevOps agent, it's in whatever incident response tool you're using, like ServiceNow, and whatever observability tool you're using, Dynatrace, or Datadog. When it comes to coordinating agents, we find it's best to do that coordination where you're already doing coordination with your teammates.   [0:31:15] KB: That raises some interesting questions for me. Let's take one of those examples. The frontier agent working off of my backlog, say it's in Jira, or Linear, or something like that, sees a ticket. Does it then follow the Kiro process of, okay, I'm going to write a spec? Do I then as a human have to take a look at that spec? How does the human-in-the-loop piece of this happen, or does it? Is it completely autonomous? Is it going to come back and I'm going to look at it only when it's got a complete working set of code that may or may not match my intent, if I had a very poorly specified ticket?   [0:31:49] DY: Right. I mean, it has to have that judgment and learn. These agents learn about what your team preferences are and how they work. They learn based on your feedback. Just like I mentioned in the Kiro spec flow, Kiro is asking itself whether or not it has enough information to have a well-formed spec, or whether it has conflicting instructions that it needs your help resolving. Similarly, with property-based testing, when it's generating tests and sees a test failure: if Kiro, with the reasoning that we've given it, thinks it has the right implementation, and it also thinks it has the right requirement, but yet the property-based test is failing, it needs help resolving that.
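The property-based testing idea DY describes can be sketched in a few lines: instead of checking hand-picked examples, you assert requirement-level invariants over many generated inputs. This is a hand-rolled illustration (libraries like Hypothesis add input shrinking and smarter generation), and `apply_discount` is a hypothetical function under test, not anything from Kiro:

```python
import random

def apply_discount(price_cents: int, pct: int) -> int:
    """Hypothetical function under test: apply a pct% discount, in integer cents."""
    return price_cents * (100 - pct) // 100

def check_discount_properties(trials: int = 1000) -> None:
    """Check requirement-level invariants over many random inputs.
    If an invariant fails while both the code and the requirement look right,
    that conflict is the signal to pull a human back into the loop."""
    rng = random.Random(0)  # seeded so failures are reproducible
    for _ in range(trials):
        price = rng.randrange(0, 1_000_000)
        low, high = sorted(rng.randrange(0, 101) for _ in range(2))
        # Invariant 1: a discount never increases the price.
        assert apply_discount(price, low) <= price
        # Invariant 2: a larger discount never yields a higher price.
        assert apply_discount(price, high) <= apply_discount(price, low)

check_discount_properties()
```

The point of the sketch is the failure mode DY mentions: if both the implementation and the stated requirement look correct but an invariant like these still fails, the agent has found a genuine ambiguity worth escalating rather than guessing through.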
It realizes, okay, I have too much ambiguity in this case to be able to have that optimism that LLMs so often have about moving things forward.   Similarly, with the autonomous agent, a big part of it is realizing that it has a ticket assigned to it that maybe has some instructions in it that conflict with team practices it's already learned. Say this ticket says to use this logging library, but the team is using this other logging library. It might need to ask for clarification before it continues. A big part of it is deciding when to re-engage somebody before just burning a bunch of cycles on a best guess.   [0:33:09] KB: Can we peel back the cover a little bit and talk about how that learning works? There are a couple of different things I'm curious about. One is the pure guts of the implementation, right? This is the frontier problem right now with LLM tooling: how do you make it continuously learning, rather than train once and go? There are various approaches floating around. I would love to know, at least at a high level, what approach you all are taking. Then I guess, the other thing that I'm always fascinated with around this is, how is that made legible to humans? Because LLMs, as we know, get things wrong. Whatever mechanism you're using to derive "these are the practices we use," in what form is that bubbled back up to people, so they can say, "You know what? You learned that wrong. That's not correct." Or, "Yeah, this is great. Can we expand on this?"   [0:34:02] DY: Great. I'll give a couple examples about how these agents do their learning and how you see it as a customer of them, a user of them. One is in the DevOps agent, AWS DevOps agent.
It needs to understand what we call topology: your whole system in all of your test environments, your production environments, what your infrastructure is, how your code and interfaces overlay on top of that infrastructure, how your CI/CD pipelines push to it, how you as operators interact with it, and how you observe it. Given this part of my application, where are the logs, what provider do I use to keep the traces, whatever. That topology is a thing that you can see when you create an AWS DevOps agent, in what's called the agent space. It visualizes that topology for you, so you can see, here's the universe that we have learned about and discovered so far. I mentioned that part of the DevOps agent runs all the time looking for things to improve. While it's doing that, it's also discovering more and telling you more: hey, I was trying to figure out where your logs are for this thing, and I can't find them. You've told me I should be able to access this, but I can't. It surfaces these things that got in the way of it learning more. It's like, hey, you might want to resolve this.   As a result of that, you might ask, how does somebody correct that? Well, if we are right and we found a misconfiguration, they can reconfigure it. Or if we were looking in the wrong place, they can write what, in the case of the DevOps agent, we call a runbook, which is essentially a set of steering files that are progressively loaded. They include short names and short descriptions that tell the agent when to look at them and when to consult them. These runbooks can help it and say, "Okay. No, this is actually the observability tool that I'm using for this part of my system. You shouldn't be finding logs there anyway. You should be finding logs in this other place instead." That's one way that the agents learn: the topology, in the DevOps agent case.
You can see that visually in the application.   The other one I could talk about is a similar agent that we have called AWS Transform, especially its custom transforms. It's an agent that helps you do a longer upgrade or transformation project, or maybe a repetitive one. Let's say, you have the same kind of code transformation, or upgrade, or migration that you need to do, rinse and repeat, a lot of times. Maybe you're trying to re-platform, or move to a new framework, or upgrade to a new language version. That's a real rinse and repeat thing.   Sure, you could give that to Kiro and have it do it every time, give it some context, the best practices and everything on how to do that, give it a Kiro power specifically for it. That definitely works. We've made Transform so that it learns every time it does a transform of a particular kind and writes down tips: oh, this worked really well, this didn't work. You can write your own custom transform just for you, and it's not learning off of others. It's learning off of you, every time you run your custom transform. Then the way that it shows you what it's learned is it just shows you a bunch of what it calls knowledge items, which are essentially markdown files. It's showing you what it has learned. Then you say, "Yes, that is correct." Or, "No, that is not correct." You're accepting or not accepting those knowledge items. That's how that's shown to you and how you can control whether or not it's learning useful stuff.   [0:37:43] KB: Then, digging into implementation pieces, are these markdown files globally available in the context for this agent? Are they exposed progressively, the way powers are? How do you think about it as you accumulate learnings over time? The transform sounds like it's honestly very simple.
Maybe it has one type of transformation it's doing and learning, but maybe not. Maybe it's got a whole bunch of different things. Or if we're talking about a frontier coding agent that's learning all of my team's practices, it could accumulate a lot of documents over time. How do you control which ones the agent is looking at, and when? You had this great example that I loved: oh, you asked me to use this logging library, but I know that my team uses a different logging library. That's a detail, probably one of many details, about coding best practices on this team. How did it find the right one?   [0:38:34] DY: It's certainly not all loaded up into the context every time. Progressive disclosure, or resurrection, of knowledge is super important. Aging out of knowledge is important. It's a mix of so many techniques, from RAG to simple files with summaries. There are a lot of other agent loops, agents that are responsible for doing that reflection. I was describing in the DevOps agent, for example, how it's actually going back every day and looking at the last weeks of issues that it investigated. It's just reflecting on those. The way that you see it most as a user is either seeing recommendations on how to prevent the same recurring issue from happening, or patterns of issues.   It's also going and reflecting on whether or not it took the right path in that investigation. Where did it waste time in the investigation? The trick is, sometimes it's good to waste that time. You don't know; next time it might be a different issue. If you don't explore that branch of an incident response, then you miss out on the thing that actually was the problem this time. To your question of how the knowledge is organized, it's a mix of techniques, and we're always learning ourselves the best way to do it.
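The progressive disclosure DY mentions (short summaries always visible, full bodies loaded only when relevant) can be sketched roughly like this. The `KnowledgeItem` shape and the naive keyword-overlap scoring are illustrative assumptions; a real system would use embeddings, RAG, or a reflection agent, as he describes:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeItem:
    name: str
    summary: str  # short description the agent always sees
    body: str     # full content, loaded only on demand

def relevant_bodies(task: str, items: list[KnowledgeItem], limit: int = 2) -> list[str]:
    """Score each item by keyword overlap between the task and its summary,
    then load full bodies for only the top-scoring matches."""
    task_words = set(task.lower().split())

    def score(item: KnowledgeItem) -> int:
        return len(task_words & set(item.summary.lower().split()))

    ranked = sorted(items, key=score, reverse=True)
    return [item.body for item in ranked[:limit] if score(item) > 0]

# Hypothetical knowledge base echoing the logging-library example above.
items = [
    KnowledgeItem("logging", "team logging library conventions",
                  "Use the team's structured logging wrapper, not the stdlib logger."),
    KnowledgeItem("deploys", "deployment and rollback policy",
                  "All deploys go through the pipeline with automatic rollback."),
]
loaded = relevant_bodies("add logging to the payment service", items)
```

Only the logging item's body would be pulled into context for that task; the deployment runbook stays as a one-line summary, which is the whole point of keeping context small while knowledge accumulates.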
It's moving so fast that by the time I'm done with the sentence, we'll probably have a slightly different take on it.   It involves a lot of agents behind the scenes. Ultimately, agents with jobs of learning are pretty good at reflecting on what might be useful in the future and coming up with the right applicability.   [0:40:15] KB: Yeah. No, and I love that example of just having a set of things whose job is to look back over particular time periods, maybe look at particular types of aggregates, what have you, infer what they can, and then expose it, or make it useful when needed.   [0:40:30] DY: It's also important to do that reflection during a task, because the nice thing about background agents, like frontier agents, as we call them, is that they have time. During an incident response, we don't have time; the idea is to get that figured out as quickly as possible. But when working on a coding task, we have time. We can have a bunch of introspection points in the agent, where if it's working on something, it can say, okay, let's examine, one at a time, or in parallel, actually, all these different elements of what we've learned. That's a neat observation: we don't have time to go evaluate every knowledge item we've accumulated, but we can really ask ourselves, across different subject areas, whether we've considered everything.   I guess, it's having the background agent learn, but also having the agent that's doing a thing know what questions it should be asking of the knowledge, and when. When it has more time to do that, it has more time to reflect on its learnings.   [0:41:29] KB: Yeah. I like that. Can you share some of the different perspectives you might have to take? Because I think that is another very interesting thing with these LLM agents: you can have, oh, I have five different lenses through which I wish you to evaluate this thing, and then we'll come back and compare.   
[0:41:45] DY: There are so many different dimensions on this. One that we do make sure that we're always reflecting on is, AWS has this thing we call the Well-Architected Framework, to make sure that we're building things that follow the learnings over time of what's a good way to build on AWS, or just build systems and operate them in general. We reflect on the security, for example. Am I opening up anything that we shouldn't be opening up? Are we following the best practices that you've established as a company? Or even things like reflecting on whether or not we're following the testing practices, the observability practices. Just everything you can imagine, particularly if somebody says that something is important. If they say that this is important to them, like make sure you're doing this, then we should reflect on that to make sure we're doing that thing.   [0:42:35] KB: Are there ways for teams, for example, to set up one of these agents to say, okay, here's a perspective I always care about. In fact, I'm going to give you some additional resources, or things particularly for that perspective?   [0:42:46] DY: Yeah, exactly. Connecting knowledge bases, steering files, even writing powers. One thing we've done within Amazon, in our use of Kiro (we're back to the IDE at this point), is we have a steering file that gets bundled up. We have a separate VS Code plugin that connects to our internal build system anyway. We've had that for decades, in the different generations of IDEs over the years. When it's installed in Kiro, we'll install a default steering file. We've just said, okay, this is something that we always need to have. Make sure you're always following these things. Again, unless a team has overridden it with their own slightly different way of going about it.   
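The "different lenses" idea (reflecting on the same change from security, observability, and other perspectives, then comparing the findings) can be sketched as independent checks run over one artifact. The lenses and their string-matching heuristics here are deliberately toy assumptions; in practice each lens would be an agent prompt, a steering file, or a real analyzer:

```python
from typing import Callable

# Each lens independently inspects the same proposed change (here, a diff as
# plain text) and returns its findings.
Lens = Callable[[str], list[str]]

def security_lens(diff: str) -> list[str]:
    findings = []
    if "verify=False" in diff:
        findings.append("TLS certificate verification disabled")
    if "password" in diff.lower():
        findings.append("possible hard-coded credential")
    return findings

def observability_lens(diff: str) -> list[str]:
    if "except" in diff and "log" not in diff.lower():
        return ["exception handled without logging"]
    return []

LENSES: dict[str, Lens] = {
    "security": security_lens,
    "observability": observability_lens,
}

def review(diff: str) -> dict[str, list[str]]:
    """Run every lens over the same change and collect findings per lens."""
    return {name: lens(diff) for name, lens in LENSES.items()}
```

A team could register extra lenses for their own perspectives (testing practices, Well-Architected checks) in `LENSES` without touching the review loop, which mirrors the "here's a perspective I always care about" setup KB asks about.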
[0:43:27] KB: One thing that you said in passing, but I think is actually interesting to dig into: you mentioned that the best practices for how to do learning are changing. By the time the sentence is finished, they may have changed. It's funny, but it's also one of the big challenges of our current LLM coding era: things change so incredibly rapidly, it's hard to keep up with what's going on. I'm curious, is there anything that Kiro is doing, or that you've found effective at Amazon, for helping our limited human brains keep up with the changes that are happening in our software stacks?   [0:44:06] DY: A lot of what we do is educating each other, sharing what we do. We have long-running ways that we propagate and share practices and stories. We do a lot of storytelling: when I was building this thing and solving this problem, here were the important things that I learned, and maybe you should learn them, too. We have a lot of these internal talk series, and we do a lot of them externally, too, at our developer conferences like re:Invent, describing how we build things and how others can learn from what we've found works well, or doesn't work well. A ton of just sharing information with each other in different talk series that we make sure we advertise and market well.   People learn in different ways. Some people like talks, some people like reading, some people like podcasts. We try to do a lot of different mediums. One thing we released pre-AI era, before the most recent era of generative AI, is what we call the Amazon Builders' Library. It's a set of long-form articles that describe how we at Amazon do certain things when it comes to writing software, or operating it, or designing large distributed systems and operating them at scale.
Just sharing knowledge like that, getting into the nitty-gritty details: what's the right way to implement a health check when you have a server behind a load balancer? Something that seems super easy, like, oh, yeah, just respond healthy when you're healthy. Okay, but what does that mean and what are the downsides?   Actually, that's one of the articles I wrote for the Amazon Builders' Library. It's a long article, because a health check, and the automatic system responding to that health check, can have surprising behavior when it's running unattended. Anyway, to your question of how we keep up with all of the changes in AI: we talk about it a lot. We talk about what we do that works. We try to do that externally for everybody, so that you all can learn what we learn, and so we can learn from you all. We just try to participate in a community of practice about it.   Then we try to think of shortcuts for it, around things like Kiro powers, that are going to package up everything that somebody has learned, especially when a company, or a tool owner, or a framework owner, or a platform owner has an opinion about the best ways to use their framework, or platform, to be successful. They can then share that directly with everybody using Kiro by providing a power that packages up all that experience, that knows all of the power-user tricks and tips and everything.   Ways to share knowledge, and then ways to share packaged-up tools: we find that those are the best we can do so far. Then around learning systems and training in general, this is part of why we've been building things like Amazon Nova Forge, which is a service that helps you do model development from early model checkpoints. There are so many services that we've been launching recently that help you do that.
When it comes to different techniques, whether it's building a RAG, or a knowledge base, or actually training models, we do know that that's super important for agents, and so we're doing everything we can to provide services that you can use to just get started with something that learns, like Agent Core memory. Bedrock Agent Core is a service that helps you run agents. It's something that we use internally; the AWS DevOps agent uses Agent Core, because it's a handy way to build and operate and scale agents in a secure way, with isolation between tenants and everything.   One of the features of Agent Core is memory. It will look at the traces of the agent, all the agent and tool interactions, and it has different strategies available out of the box to compress that knowledge: okay, here's what I should store for later, session-wise, from reflecting on a particular agent run. Then, how do I promote stuff to long-term memory, where I can have long-term lessons distilled and available afterwards? Agent Core memory is one place that we've been trying to build that. How do agents learn? I know your question was more how do humans learn about how to use these things, but agents need to learn, too, so we need to be giving as many building blocks as we can, so that that complex task of training and learning at least gives a head start to everybody who hasn't tried it before.   [0:48:39] KB: On that note, you're trying to push forward this frontier and make it easier for folks. Where do you see the frontier in these agentic coding tools, and building agents, and this whole space? What's coming over the next, I don't know how far we can project now, but three months, six months, something like that?   [0:48:57] DY: I guess, it's funny. Everybody's going to have their own slant on this from their prior expertise.
One challenge that I see, which fits with my own background: it's tricky to tell whether agents are doing things successfully, how effective an agent is at the thing that you've built the agent to do. It's something we sink a lot of time into when we build our own agents, and it's something that we've been trying to build primitives to make easier. We've been trying to make observability and evaluation of agent trajectories more automatic and easier. It's funny how, when it comes to agentic applications, some of the techniques that were maybe not as exciting come back into play, because they're more subjective. For, say, a website: are your website users happy? That's actually tricky. A little more annoying than measuring whether my API is returning success or failure. Okay, that's a pretty easy signal. Still complex. But whether my website customer is happy is a tricky question to answer.   You can infer it, and there are a bunch of techniques around it. It depends on the domain. If you have an e-commerce site, are people buying things successfully? If not, there's probably a problem on the site and people probably aren't happy for some reason. You have to have a domain-specific business outcome. Same with agents. It's not an API. These are things that users are using to drive a certain business outcome. That's tricky. Some things that we see around agents are humans entering a thumbs up, thumbs down, which is a useful signal. It helps. But then, is the user really going to provide a really detailed description of why they weren't happy? We don't want to overcorrect on that.   I'm never a huge fan of it. I mean, it's important and necessary, but I don't want to just rely on the thumbs up, thumbs down. It's the airport bathroom cleanliness button as you leave the bathroom. It's like, okay, is it clean or not? I don't like that.
In fact, I liken it to that because people wrinkle their noses at the idea; it's intentionally a gross comparison. I think what we need to be doing is figuring this out more naturally. What is the objective that people have with an agent, how close is it to achieving that objective, and where it didn't achieve it, why?   Evaluation and learning are very related: how do you evaluate whether or not this is succeeding for customers? Then when it did, how do you make sure that we are good at that next time, if there's a new discovery? And if we didn't, how do we make sure we don't go down that route next time? Unless it's just a situational thing, where in an operational investigation, maybe we do need to still check to see if it was a bad deployment that triggered the event, even if it wasn't a bad deployment this time.   [0:51:39] KB: I love that. I recently set a challenge to my team. I said, one of our goals for this product is it should be delightful. You have to figure out how we measure that, right? These things are so nuanced and subjective now.   [0:51:52] DY: Yeah. It's a nice thing where we can borrow a page from website and mobile application monitoring, real user monitoring, what the industry calls RUM. I'd say, it was certainly appreciated by people who were doing website operations and development and mobile app development. I think the rest of the industry might not have really appreciated the need for that, maybe because it didn't apply to what they were doing so much. But it could have, and now this brings that technique into the forefront, with new technology to be able to do real user monitoring better.   Yeah, it's interesting. It's a nice shift toward understanding whether your customers are happy.
It's what has always drawn me to observability (I used to work on observability, on CloudWatch), because observability is a lens with which to understand your customers' happiness. I think that's very true with agents.   [0:52:47] KB: We're coming to the end of our time at this point. Is there anything we haven't talked about that you think would be important to leave folks with?   [0:52:54] DY: I'd say, coding tools are a really great starting point into rethinking how you're building and operating software. When you start with a tool, especially if you're new to AI coding tools, you're not going to get what you want from it on the first try. It's like when search engines came out. This is a little bit of a story time thing, but when search engines came out, you weren't just magically good at using them. You had to use the right syntax. Maybe the search engines were a little primitive when they first came out, too. You had to learn how to use Boolean expressions to be able to filter out things that were uninteresting to you. The more you did searching, and the more you really studied and introspected on why you didn't get what you were looking for from that search, the better you got at using the tool.   Of course, the tools get better over time, so you need less of that expertise, but introspecting on why you didn't get what you wanted still matters. I see that the more that people think about that, the more successful they are with a coding tool, any coding agent, like Kiro. Maybe: well, why wasn't it able to use this new framework that I made five minutes ago? Okay, well, you need to remind it of that. That's what steering is for, that's what powers are for, that's what MCP servers are for. Just think about why you didn't get what you wanted, because other people are getting it.   
We found that a team got a 30-people-for-18-months level of re-platforming of a service done with six people in six weeks. I'm just citing the statistic, but it's a significantly shorter time when teams are really using these tools. My advice for people is: these tools are extremely powerful, but they take you learning how to use them, incorporating them into your development practice, and then changing your development practices. Because these tools can accelerate coding so much, coding doesn't stay the bottleneck, and that forces the need to change the rest of the practices around the coding. This is why we've built these frontier agents to handle things beyond the coding, because we don't want to just shift the bottleneck.   When you're doing a bunch of production-grade coding, you also need production-grade security, pen testing, and review. You also need production-grade operations. You need to accelerate all these things at once. Distilling this into two things: first, keep using the tools and ask yourself, well, why didn't I get what I wanted? Because other people are getting it. Maybe there's some trick to using the tool in a better way. Second, look beyond that one tool, and look for the other bottlenecks that you can speed up and make your life easier with, like around DevOps and security.   [0:55:46] KB: Yeah, absolutely. Well, I think with the stuff we've talked about today, Kiro is baking in some of the practices that six months ago you had to learn, right? I had to learn to prompt the thing for a spec. There it goes. I love the property-based testing, because it does fit into this idea that I've been playing with a lot, of forcing the LLM out of its groove. Instead of jumping down a confirmation bias loop, it goes across all of those things.
As we talked about, that concept of what needs to be verified is just baked in there. We did check while we were talking: there are property-based testing libraries for all sorts of different environments, and we can just use them. It sounds like Kiro should be able to just generate tests in whatever environment. Super cool. It sounds like you all at Amazon are seeing incredible speedups. Excited to see where this goes.   [0:56:39] DY: Yeah, same. It's a new frontier. It's the next big speedup in making developers' lives easier, in this case, by a whole lot.   [END]