EPISODE 1879

[INTRODUCTION]

[0:00:01] ANNOUNCER: A common challenge in software development is creating and maintaining robust development environments. The rise of AI agents has amplified this complexity by adding new demands around permission controls, environment isolation, and resource management. Ona is a platform for AI-native software development and engineering agents. The platform combines autonomous agents with secure, standardized environments, with a focus on giving enterprises control, security, and productivity, so they can scale AI-native engineering without scaling risk. Chris Weichel has more than two decades of experience spanning software engineering and human-computer interaction. He is currently the Chief Technology Officer at Ona, formerly Gitpod, where he leads the engineering team behind the company's cloud-native development platform. Chris joins the podcast with Kevin Ball to talk about Ona, the impact of coding with parallel agents, the future of IDEs, choosing agent-friendly languages, code review as a new bottleneck in the software development lifecycle, and much more. Kevin Ball, or KBall, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc.

[INTERVIEW]

[0:01:43] KB: Chris, welcome to the show.

[0:01:45] CW: Kevin, thank you for having me.

[0:01:46] KB: Yeah, excited to dig in. Let's maybe start with you a little bit. Can you give us the TLDR on your background, how you got to where you are today, and a little bit about Ona?

[0:01:55] CW: Yeah. My name is Chris. I'm the CTO and co-founder of Ona. How we got here is, I've been writing software basically since I could read. That's a very long time. I've been doing that professionally for more than 25 years at this point. Throughout my entire career, I've been in, essentially, the dev tools and tooling space, a lot in automotive, large enterprises. At some point, I did a PhD in human-computer interaction on tooling for digital fabrication, as it was. Ona really, at this point, is the culmination of all these years and trying to solve a problem that I've seen show up repeatedly. With AI, we got a whole set of tools to help solve things that we couldn't even have dreamed of just a few years ago.

[0:02:46] KB: Let's talk a little bit about that problem set. What is the core problem driving Ona? What are you solving for folks?

[0:02:54] CW: Very fundamentally, it is reducing the time between having an idea and making it reality. I mean, that's the job of any good tool. Where we started, where we came from, is essentially Ona environments. It's the idea that if you want to write software, certainly in a professional context, you'll be spending a lot of time setting up a dev environment. Instead of writing code, you'll be faffing with pip and Node and nvm and what have you, trying to set up your dev environment. That's the first thing that we tackled and something that we spent a lot of time solving: how to make that work well, also for larger organizations. Then, building on top of that, all the primitives that we built turn out to be extremely useful, if not necessary, to run agents. Ona, really, is the mission control for software engineering agents.
[0:03:49] KB: Yeah, I remember running into you guys when you were purely doing the cloud environment, before agents were a thing. Already, it was like, "Oh, yeah, that's brilliant," right? Go to my GitHub repo and not just fork the code, but get a dev environment that works right there. Let's spell out a little bit of the implications there for agents, because I know, I try to run a lot of different agent things, but then I'm managing environments and copying env vars around and doing all this sort of thing. What is it that you do for someone who's running an agentic environment?

[0:04:19] CW: This plays out on different levels. First, there is instantiating a dev environment for a particular project, which typically means, one, getting a set of compute resources. If it's your laptop, it's your laptop, and that only has so much RAM and so much disk, and so much network bandwidth. Then, there are also the potentially different tools. If you're working on one and the same project, chances are you have a machine set up for that. Then, maybe different configuration, different keys or something, that you need to run multiple instances of that. It's definitely different working copies, potentially git worktrees, which is what we see a lot of folks do locally. All of that you now need to manage manually. At this point, there are heaps of tools trying to do that for you on your laptop. Very fundamentally, it's still bound to your laptop. There's a resource limitation and there's a variance limitation that one single machine can handle. The other is, running it on your machine really - on one single machine really limits the autonomy you can grant that agent. There's a reason Claude Code calls it dangerously-skip-permissions. They put that "dangerously" prefix there. Do I want an agent to believe it needs to reset the state and delete the root of my file system? Absolutely not. The thing that we do is, we talk about this as Ona environments, agents, and guardrails. Environments, really, are the development environments that I just spoke about, that we can instantiate many, many, many of, that humans and agents can interact with alike. Then there is Ona agent that lives in these environments and does autonomous work for you, and then there are guardrails, which give you control over what Ona agent can do. That's it in a nutshell.

[0:05:58] KB: That makes a lot of sense. I feel like I'm constantly walking some of these balancing acts locally, where I want an agent to be able to, for example, autonomously run tests, but those have to touch a database. Now, I have to give it network access, and who the heck knows what it's doing with that, and all these different pieces. Okay, how does this work? If I want to run an agent with Ona, what type of configuration do I need to do? How do I get that set up?

[0:06:26] CW: The cool thing is that agents are very, very useful for getting stuff set up. The fundamental primitive is essentially a compute box that is essentially a dev container in a VM, if we strip it down all the way, that has the code checked out already. In practice, the way this would look is, you come to Ona, you select the repository you want to work with, and put in a prompt of what it is you want to do. We obviously give you a good set of defaults. One of them is how you configure an environment. Then, it's going to go and do that.
Really, the baseline is what you're used to with most AI tools, where you can put something in a prompt box and it's going to go and do that thing. The difference here is that you can do five of them in parallel and the fans of your laptop don't spin up. If one of them goes astray, you just delete it and create a new one.

[0:07:18] KB: That makes sense. Okay, so let's look through a little bit of those implications then. It's now trivial for me to spin up agents, which is useful. I find personally on my laptop, which I'm still on, I start capping myself at, I don't know, three or four is about how many I can keep track of. If it's now essentially free to spin up additional agents, they each get their own environment, they're not messing with each other, they're not messing with each other's branches, how do you deal with that level of stuff going on?

[0:07:50] CW: Yeah. That's the thing that - with my HCI background, the thing I find really, really interesting is how we as humans come into this. If we take a step back and really look at software engineering as an industry and as a trade, the way we've operated for the longest time, certainly since I can remember, is this deep mono-focus work, where you focus on one thing at a time and we value this deep flow state. I mean, heck, we even tie our identities to this, like having that environment perfectly tuned to your own liking and keyboard shortcuts, and your IDE is your home, your sanctuary almost. That's how we've been operating for a very long time. It feels a bit, and for me also felt a bit, odd that, okay, now I'm asked to give this up. Agents get more and more autonomous. The only way we can turn this autonomy into productivity is by doing multiple things at the same time, which is the opposite of this deep focus work. As a result, the interfaces that we interact with need to change. The IDE is built for a world where every line is artistically handcrafted, save for autocomplete and code generation. Now, we have these machines that can write code for us, many in parallel, exactly as you say. Now, we need interfaces that help us find joy and flow in this parallelism, a new class of interfaces. We all haven't figured this out yet. We're still trying to understand what this is. There are many ideas being brought forth. We obviously have our own take and spin on that. What I will say is, the second key trait that we observe in software engineers is that we all have a mind that leans towards addiction. We're gamblers at heart. Where it's like, okay, this next change is going to fix my test. This next change is going to make it work. Just one more change. All of a sudden, it's 2:00 at night again. That's how we work.

[0:09:49] KB: I'm in this picture and I don't like it.

[0:09:53] CW: Exactly. Using agents is a bit like playing a one-armed bandit. It's like a slot machine, in a way. What agents do is they've made it incredibly cheap to pull that lever, and you can play five of them at once. Not only can you actually find flow in this parallelism with the right interfaces, it's even addictive.

[0:10:13] KB: Oh, it's deeply addictive. The ease of building your own tooling has just gone through the roof, right? It's like, oh, why not spend another five minutes building myself another tool?

[0:10:24] CW: I know, right? There's also a lot of interesting implications for startups there, because I would argue that so many startups were founded from, "I have this problem. I'm sure others have this problem.
Let's see if I can go and sell it." Now, you don't need to offset the cost of producing the tool anymore, because it has become so cheap to produce it. Chances are you're not going to bother generalizing it and trying to find someone to sell it to, because it already solves your problem.

[0:10:46] KB: Let's maybe go one step deeper. You highlighted, there's a set of interface changes that need to be made. Are there also mindset shifts that need to happen?

[0:10:56] CW: I think so. What we observe, also in our own team, is that there's a mindset gradient that aligns with seniority. The more junior someone is - and this is my own observation, an n of less than 50, so take that with a grain of salt - the more they identify with and love the game of writing the code itself. It's less about the overall problem, the business problem they're trying to solve. It's more about, "Okay, can I write this code? Can I figure out a really elegant way to express that?" or something like that. The more senior folks are - again, small sample size - the more they value solving the problem. The more you value solving the problem, the less the way you solve it is relevant. Obviously, you don't want to impose a lot of debt on yourself and get called out in the middle of the night because the code's not great. In reality, it doesn't have to be you writing the code. The mindset shift that needs to happen is that the identification as a software engineer is now more about solving the problem, less about writing the code.

[0:12:03] KB: What do you think are the user interfaces, coming back to the HCI, that are going to elevate that problem domain over the code domain? Because if you think about our generations of tooling, they're all focused around code. You have the IDE, you have the code review tools, you have all these different pieces that are down at that low level of granularity.

[0:12:22] CW: Absolutely. My hot take is, code and the languages that we have today will be with us for a very long time. Every once in a while, you hear folks going like, "Yeah, there'll be AI-first programming languages that humans don't necessarily understand." Maybe that is so. But if you look at the history of programming languages, the trend indicates the other way. We've come from languages very close to the machine, and we've subsequently added more abstraction and moved away from the machine level. I struggle to see why we would now take a hard turn. There's so, so, so much code out there in programming languages that date back many decades. I think we're going to have that with us. With that, we're also going to have those tools with us. Those tools themselves aren't going to go away. Some predict the end of the IDE. I don't actually believe that to be true. I think we will have IDEs going forward. The way they look will need to change. But fundamentally, interfaces towards tools will exist. An important consequence of that is that whatever system we have that lets us interact with agents needs to take that into account. Many times, it's so much quicker to go and change that hex code to the color I want than to try and tell an agent how to do it. Why should I be limited to a prompt box, when there's a so much richer interface and system and ecology out there that would let me do that?

[0:13:50] KB: Yeah, I think to your point, it comes down to precision. What level of precision are you wanting to engage in? The equivalent of a scalpel is editing the code directly right here. I can say exactly what I want it to be.
Whereas, when I'm interacting with an agent, it feels like I'm operating at a much higher level of abstraction, which sometimes is fine, but means the details get quite fuzzy.

[0:14:15] CW: Absolutely. I think there's a great analogy. It's a bit like trying to trim your hedges. You're not going to use a scalpel to do that. You're going to use a hedge trimmer. That's the agent. But if you're trying to sort out those two, three pesky bits that you didn't quite catch, trying to use a hedge trimmer is a pain; the scalpel might just be the right tool for the job. There is a different degree of, let's say, engagement with the problem, depending on what it is you're looking to solve.

[0:14:42] KB: Let's go back a little bit to putting these things in the cloud and how that engages it. I think one of the things I've seen in the IDE version of this is, when you have a hammer, everything looks like a nail. I've seen people who, once they get into the, "Oh, I'm prompting an agent," a change that would have taken them 10 seconds before, because they understood the code, they're like, "I'm going to tell this agent to do it and keep going." I feel like that becomes even more the case the further you get away from, "This is my code." It's off in the cloud somewhere. My agent's handling it. How do you facilitate those layers of tool use, the layer from, I've got my hedge trimmer, but sometimes I need a scalpel, in an environment that is set up for ephemerality, just have an agent go and do it?

[0:15:28] CW: I think this comes down to separating the interaction, and also the automation that surrounds this, from where compute lives. Just because my tools and my source code now live in this VM somewhere on AWS, or GCP, doesn't mean the way I engage with that code and those tools needs to be different. As an engineer, when I connect to one of these environments, it doesn't feel much different. If I use a desktop IDE - say, Cursor, VS Code, what have you - and connect to that environment, it feels very much like it's local, except there's much, much more bandwidth available. The way you interact with it really doesn't differ much, except that now, you can have an agent do work even when your laptop is closed. You're no longer bound to the lifecycle of that one machine that right now lives on some shoddy Starbucks Wi-Fi that might be disconnected at any moment. Really, what it does is it lifts you off the limitations of a laptop, but it doesn't very fundamentally change how you engage with the code.

[0:16:35] KB: That makes sense. Well, and it reminds me of, like, I was trying to figure out - I'm still running everything on my laptop, to be fair. I'm not on Ona yet. Though, who knows after this conversation? But I was trying to figure out, can I set up Tailscale on my laptop and my phone, so I can nudge my agents along while I'm out on my run, or what have you? In Ona, what would I need to do to, say, drive an agent from my run?

[0:16:59] CW: Literally, you would go to ona.com, you would log in, and you would talk to your agent right there and then. That works very well from the phone. We put a lot of effort into the mobile experience. The story I like to tell about this is, I have a four-month-old son, and I spend a lot of evenings with him falling asleep on my left arm, and then I'm sat there and I'm frankly too scared to put him down, so I'll just be sitting there. It means I can't use a laptop in that time either, but I can use my phone.
I've spent plenty of nights with my son in one arm and my phone in the other hand, frankly being quite productive. A lot of ideas that come in the depth of night, which otherwise would have been maybe a note in some tool, now they're a prototype. The next morning when I log in again, I actually have working code, instead of some half-stumbled words in Reflect.

[0:17:49] KB: Agents are great for turning half-stumbled words into working code. Okay, so let's go back a little bit to implications then, right? You can now generate all this code as you're holding your son asleep, or I can generate it on my run. The cost of software development is now substantially lower in a lot of ways. What happens?

[0:18:17] CW: There are interesting implications of all this. One, I'm not yet sure that the cost of software, if you look at the entire SDLC, is actually lower. I think the cost of software production, of change production, is lower. That has downstream effects that I don't think we fully understand yet. The one that's immediately obvious is that we're now turning code review into a hotspot, because we're producing so, so, so much code. Reviewing is still very much a manual, human-attention-intensive activity. I mean, the cynical take on this would be, we were all promised we now get to do the creative stuff. In reality, we've been reduced to line workers who now just review code, which isn't the most enjoyable thing to do. I think what's fundamentally happening is the economics and the way we scale software production change, but I don't think we fully understand the entire SDLC and its effects yet.

[0:19:14] KB: I've definitely been feeling that hotspot on the code review side. It definitely feels like that is one of the bottlenecks. What else do you think changes in terms of the lifecycle from, have an idea, to, this thing is actually production-ready?

[0:19:30] CW: I think this depends a whole lot on the context in which it happens. As a weekend warrior, someone who's building tools for myself, the cost of producing something really has gone down so dramatically that sometimes, instead of trying to search for prior [inaudible 0:19:44], I'll simply prompt an agent and get it done. It's cheaper to build something tailor-made that needs to work once or twice for myself than to go and try and adapt something out there. In a business, especially in a large organization, especially in a regulated industry, I'm not sure that's true. Again, we've brought the cost of production down. I'm not yet sure we've brought the cost of deployment and operations down.

[0:20:11] KB: That's fascinating, because it continues a shift that we've been seeing in the industry for a while, which is the cost of getting started has dropped dramatically. This started with things like SaaS, right? Or hosted environments. I no longer have to outlay 5 million dollars to set up my own server environment. I can get started with 20 bucks a month on Amazon, or what have you. Continuing that, starting becomes almost free. Maybe I have my ChatGPT subscription, $20. Now, my agent can build my software. I can deploy it on Amazon for another $20. I'm off to the races. To your point, scaling, dealing with security, dealing with privacy, all of those are just as expensive, if not more so.

[0:20:55] CW: Yeah, it's interesting to see how that - whether we're introducing new breaking points, or step functions, in the cost-utility function over scale. It does feel like that slope is getting flatter.
It's easier to get started and it's easier to scale. I think the effects, though, of that acceleration just aren't equally distributed, if that makes sense. There's a factor here that's a function of organizational size and the impact of compliance and regulations and standards you need to follow. That said, a lot of our customers are in the regulated industries: finance, pharma, massive organizations, Fortune 500 companies. What we see is that they see massive benefits from agents. It's largely fueled by the emergent complexity that exists in these systems. One use case that we see a lot is folks using agents to understand their own systems, simply because there's so much complexity, and so few people who can hold it all in their head, that having an agent who can crawl through that is immensely useful. Even if bringing something into actual production downstream might still be quite involved in that kind of environment, we're seeing a lot of use cases that aren't necessarily the production of code, where agents are immensely valuable and still bring the cost of the overall process down.

[0:22:23] KB: Yeah, that makes sense. That maybe actually goes into a domain that might be interesting to explore, which is the different ways people interact with AI agents, right? Obviously, this is a shift the entire industry is going through. Though, there are still trailing adopters. Not everybody is into the agentic world. What are the different ways that you have found people are wanting to engage in these? You just mentioned one, which is, explain to me my code base, explain to me this system. What else shows up?

[0:22:55] CW: Yeah. First, I think the point you made there is really quite astute. One observation that we repeatedly have is that there are almost two worlds. It's not quite binary, but it's definitely a spectrum. There is the top 1%, the folks listening to this podcast, who are really hooked in, who know their stuff, West Coast, East Coast, that kind of thing. Then there's the rest of the world. Many of whom, frankly, still believe that tab-tab autocomplete is the pinnacle of AI in software engineering today. Don't get me wrong, tab-tab autocomplete: amazing, killer feature. But we've moved a step further than that at this point. I think the old adage of, the future is already here, it's just not equally distributed, is more true now than it's been before. There are so many organizations. I think as an industry, we generally underestimate how many folks are on the laggard side, which is fantastic. It means there is so much potential still.

[0:23:51] KB: 100%. There's tremendous opportunity purely in scaling out what already exists.

[0:23:58] CW: In terms of how we've seen people interact with agents, we see it roughly along three broad categories. One is inquiry-style work. One we just touched on, which is, hey, tell me how this code base works. There's a special version of this, which is essentially onboarding, where AI becomes the onboarding buddy for a new engineer who joins a team or an organization. There are also checks for compliance. For example, hey, how compliant are we with this or that coding guideline? Generally, on the inquiry side, we ourselves use it a lot for design work, where we write design docs together with Ona agents. Essentially, we have /commands, global /commands, and we have our own for a design doc. The great thing is that it is very rooted and grounded in the actual code base, while you still have an interactive conversation.
It's almost like an interactive rubber duck that can also read your code base and helps you write good design docs. That's one class, the inquiry side. The second one is essentially your classic code change work, which first and foremost encompasses all the toil. Updating libraries to mitigate CVEs, migrating to a new version of this and that, adjusting to a new standard you want to drive, whatever it is; generally, lift-and-shift-type functionality. That's something that we see a lot. That's really a lot of the work that's happening. Then, of course, there's new feature work. There's also, eventually, getting things off the ground. Then the third class is really trying things out. The cost of experimentation has gone down so much, especially if you can scale out your agent - dare I say, agentic - resources horizontally, which you can with Ona agent. Of course, you can have so many environments, so many agents running in there, so the cost of doing these explorations has gone down. You can explore multiple paths at the same time. We prototype a lot more using Ona and Ona agent than we do using Figma. We still use Figma, of course, but so many of the ideas that go in there really have been created through Ona agent before. Those are the three classes we see a lot: inquiry, actual production of change, and prototyping and ideation.

[0:26:09] KB: Let's dive into each one of those a little bit. For inquiry, I think this is potentially a really nice entree for folks who are on that trailing edge, because you don't have to trust the agent yet to write code. You can say, "Hey, explain this to me." Then you can go and actually confirm the thing that was explained. Or, find this for me, or those sorts of things. Do you have any best practices you all have come to internally? For example, if you're doing that design doc, how are you prompting the thing? What are you loading as context? How do you approach this thing?

[0:26:42] CW: What we load is, one, a template for the design doc that we want out of it. It's a very simple template. Then, very explicit instructions to engage in a conversational style. We literally ask it: ask me three rounds of questions, three questions each. With additional instructions: whenever you're not sure, ask. We heavily try and counteract this excessive confidence that a lot of models have. With that, it really becomes literally a conversation. What's also working really well is using something like Wispr Flow, or Superwhisper, to have those conversations, so no one's typing that anymore. It's literally just talking to a microphone. The way the flow looks is, you use that /command that contains the template and the instructions. Also: hey, go and look at the APIs, go and look at the database schema. We have a set of engineering principles where we, for example, value consistency and integrity and pragmatic change over excessive abstraction. Those are part of the prompt, too. You load all that, you run Wispr Flow, and you - to put it that way - brain-barf into a microphone for five minutes straight, and you put all that in. Then the thing is going to churn on it; it's going to investigate the API and schema and what have you, and come back with a set of questions. You do that a bunch of rounds. The outcome is an 80%, 90% version of a design doc.
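[Editor's note: A minimal sketch of what such a design-doc /command could look like, rendered here as a Go constant for illustration. The template sections, wording, and layout are hypothetical; only the three-rounds-of-three-questions instruction, the ask-when-unsure instruction, and the engineering principles come from the conversation. This is not Ona's actual implementation.]

```go
package main

import "fmt"

// Hypothetical /design-doc command: a simple template plus very explicit
// instructions to engage conversationally, loaded as a single prompt.
const designDocCommand = `You are helping me write a design doc.

Template:
# Title
## Problem
## Proposed solution
## Alternatives considered
## Risks

Instructions:
- Ground yourself first: go and look at the APIs and the database schema.
- Engage in a conversational style: ask me three rounds of questions,
  three questions each.
- Whenever you are not sure, ask instead of assuming.

Engineering principles:
- Value consistency, integrity, and pragmatic change over excessive abstraction.`

func main() {
	// The dictated brain-dump (e.g. via Wispr Flow) is appended as context.
	dictated := "Here is roughly what I want to build..."
	fmt.Println(designDocCommand + "\n\nContext from me:\n" + dictated)
}
```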
[0:28:12] KB: A few different pieces of that I think are worth highlighting, right? You're not starting from scratch. You actually have a tremendous amount of material that you've put in there. I'm guessing you've actually engineered and iterated on those as well - on the template, on the guidelines that you give it, on the ways in which you guide it towards this.

[0:28:31] CW: Surprisingly little, to be honest. The template is the same template we've been using since AI was a research field with narrow applications. The prompt itself - the three rounds, or three questions - is entirely arbitrary. We frankly haven't iterated much on, okay, use different numbers, or any of that. That's just because it's been working well enough.

[0:28:54] KB: Yeah, fair enough. From that design doc, then, do you just hand that to another agent and say, build it? What does that look like?

[0:29:03] CW: Another agent, or the same agent. You could literally go: great, you have the design doc, amazing. Please put that into Notion - we have full MCP support - and it's going to put it into Notion straight away. Or you put it into a Linear issue and link it there, and then you go, "Hey, here's a Linear issue. Go do your thing." That, more often than not, is a really, really helpful starting point. Fundamentally, what we find is, in trying to get 100% of the way there with the agent, there are diminishing returns. It's the hedge trimmer versus scalpel thing. Or, the metaphor that we commonly use: it's the difference between highway and city driving. Making miles, highway driving, I don't need to be at the steering wheel. It's fine if my car does the driving. But if I'm in a narrow city, where there are a lot of things happening, and where I really need to make sure I get to the right number, or the right address, probably I want to be at the steering wheel. I mean, there are also fantastic organizations that have solved that, obviously.

[0:29:59] KB: I was in a Waymo the other day. It's absolutely science fiction. It's amazing.

[0:30:04] CW: It's incredible. One mental model that we found very helpful is actually a metric that Waymo optimized for in the early days - I don't know if they still do - which is time between disengagements. It's the time between the car disengaging and the human having to take over the steering wheel. Seconds between disengagements is essentially lane assist. Minutes, hours is the backseat of a Waymo. In software engineering, we are going through the exact same transition. Lane assist is your tab-tab autocomplete, Copilot, Cursor, and the backseat of a Waymo is where we're heading with agents.

[0:30:38] KB: Yeah. I think that's a really nice metaphor. The question then becomes, when should a human get involved? What level of engagement do you want? Then additionally, if your agent has gone off for a three-hour journey, how do you boot your brain up on all the changes that it made?

[0:30:59] CW: Yeah. This is where the interfaces come in, right? This is where we need to build systems that make that context switch really, really easy. The way we do that in Ona is that, for one, we have feedforward. Very early in the process, we, similar to Claude Code, essentially produce a set of to-dos, and they guide the agent and the human. For the human, it's really, really helpful, because it gives you trust that the agent's going to do the right thing. Then, throughout the interface, we've been very careful in how we display these and how we use them to tell you, "Okay, this needs my attention now, or in half an hour." There's that feedforward. There's also the feedback of what it has done, the summary at the end.
But also, the ability to jump into a full IDE right there, in the same context, that gives you a diff on the changes that have happened. We've designed - and continue to iterate on - the conversation itself in a bid to make it easier to reenter, switch back, and reconstitute that state in your brain. At the same time, what we find incredibly effective is literally reviewing the code changes since the last time you looked. For that, quite frankly, Git is very effective. We don't have to reinvent the wheel. Literally looking at Git diffs in VS Code, right next to the conversation, embedded in the same browser tab, works really, really well.

[0:32:24] KB: Well, having the conversation there is really helpful, and the history and understanding it. Do you link between the two, so it's like, "Oh, here's what it was thinking when it made this diff"? Can you see that sequence?

[0:32:36] CW: Yeah. You can go from the conversation right into the file. For example, it edited a file. You can click on that and it's going to open it up in the IDE and bring up that section there and then, so you get the full context of how it looks now. We have experimented with historical context, but it turns out it's actually confusing. It's more confusing than it's helpful. Because if you want to see the evolution, it's easier to take in the entire picture for a small enough change, right? If we're talking 20,000 lines - I mean, if you're making 20,000-line changes, then good luck.

[0:33:13] KB: This does highlight, right, that as you increase the time between disengagements, most likely that's because you're making larger change sets. How do you navigate that, right? Are you actually getting benefits, or are you deferring the cost to: now, I have a 20,000-line PR to review? Which, by the way, I've reviewed far too many of those since we got into this agentic world.

[0:33:35] CW: I merged one today. I don't think that there's a one-size-fits-all answer. For example, the PR I merged today: essentially, it introduced three new database entities, including their API. Our API is very consistent. It's essentially CRUD on entities. That kind of code is very reviewable, especially when structured into good commits. A 20K change might be okay now and then, every once in a while. Fundamentally, I think what you're doing is exactly what you point out: you're pushing the cost downstream, because it's now the poor sap, or the poor folks, who have to review that change that have to bear the burden. What this points to, I think, is that agents are tools at the end of the day. They're very close to magic, but they are tools. Much like any other tool, you need to learn how to wield it. I mean, a very long time ago, writing implements were very close to magic, and you needed to learn how to use them. The same thing is true for agents. Part of what it takes to learn how to use an agent, I think, is understanding what size of problem it's capable of handling and how you decompose the thing you want to achieve to fit that size. That's a skill. It's a learned skill. It's not something you wake up one day and have. It's something that requires experimentation. Here, we come back to the benefit of running agents in a system like Ona, because the cost of that experimentation is just much lower. You can try different sizes of this decomposition in parallel and observe which one works and which one doesn't, and the one that didn't work, you just throw away. No harm done.

[0:35:21] KB: Yeah, I love that.
I think that is a really key thing here, which is, this is changing the way we do software development. There are skills to learn associated with that. It's not: drop it into your IDE and suddenly you're 30%, or 100%, or whatever, faster. You have to shift the way you're thinking about developing software.

[0:35:41] CW: Absolutely. We can see that in our own statistics. For one, we see a very strong correlation between PR throughput, so PRs merged, and Ona contributions. We closely track how much of our own code is produced through Ona. We see a very direct correlation. The effect that we see is that the more senior the folks, the more likely they are to be able to break this down and to adapt to this way of working. That's what we see right now. It goes to your point: it's a tool that you need to learn how to use. On the skills that are needed, the other thing that you need to learn is good specification. A key challenge that we see, also with our customers, is under-specification. Agents are fantastic, but they're not magic. They can't read your mind, for better or worse. So, you need to learn how to prompt them well. I mean, the old thing is still true. It's less true now. I think we all remember, in the early days, like a year ago, the arcane prompting and prompting libraries and whatnot, and all the tips and tricks, and all caps, you must, blah, blah, blah, and all that fun stuff. That's obviously much less true today. But still, there is an acquired skill to prompting well and decomposing a problem.

[0:36:58] KB: Well, and I think some of what you've already described of your process: start with a design doc. Sure, the agent can help with that. But that gives you an artifact where you can look at decomposition, you can look at approach, you can look at all these different things and have a feedback loop, before you set this thing loose for an hour and a half to generate 20,000 lines of who knows what.

[0:37:18] CW: Absolutely. The other thing that we find is, with change sets of any size, but specifically the larger they get: the more deterministic control mechanisms you have that help the agent understand if it's doing the right thing, the more likely you'll get a good outcome.

[0:37:33] KB: This is a really good point. I think it also connects to your time between disengagements, right? The more you have a feedback loop, the more you are able to validate the outcome of the thing, the better it can do things. If it's a human in the loop for all of your validation, you can't let it go very far before you validate. But if you can deterministically validate a whole swath of things, suddenly it can spin on its own. What are the guardrails that you put in place for that?

[0:38:00] CW: We have a bunch of mechanisms where, for example, we hook into our CI system, and we have prompts. It's literally just a prompt where the agent behind the scenes will do a sleep 360, or something. It's going to sleep a while, wait until the CI system is done, and then go and look at the logs. It's a bit like how a human would operate. I don't necessarily get a ping when my CI system is done, but I'll check back every once in a while. We found that the same very simplistic mechanisms also work for agents.
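[Editor's note: A minimal sketch of the sleep-then-check pattern Chris describes: the agent sleeps, polls whether CI has finished, and only then goes to read the logs. The status endpoint, response codes, and 360-second interval are illustrative assumptions, not Ona's actual integration.]

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForCI naps, then checks a (hypothetical) CI status endpoint,
// repeating until the build is done - the agent equivalent of a human
// checking back every once in a while instead of waiting for a ping.
func waitForCI(statusURL string) error {
	for {
		time.Sleep(360 * time.Second) // the literal "sleep 360" trick

		resp, err := http.Get(statusURL)
		if err != nil {
			return err
		}
		resp.Body.Close()

		switch resp.StatusCode {
		case http.StatusOK: // build finished; time to go read the logs
			return nil
		case http.StatusAccepted: // still running; check back later
			continue
		default:
			return fmt.Errorf("CI status check failed: %s", resp.Status)
		}
	}
}

func main() {
	if err := waitForCI("https://ci.example.com/builds/latest/status"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("CI done; fetching logs...")
}
```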
The other thing is, really good engineering practice now really pays off: having a well-set-up linter, using a language that favors standardization. I'll make no secret of it, we use Go, and we use Go for good reason. It's a very opinionated language. Not everyone agrees with the opinion, but everyone likes that there is one. This turns out to be very, very helpful. Consistency turns out to be very, very helpful. The mental model that I've come to apply here is that agents, in a way, are like a jet engine that you can strap onto your plane. Either your airframe is rigid enough to withstand the acceleration and velocity, and then you're going to go very far, very fast, or you're going to come undone in mid-air.

[0:39:19] KB: I love that. I have a similar thing where I say, these things just speed up everything. If you have sloppy practices, they're going to speed up the slop. If you're accumulating tech debt, they will speed up the pace at which you do that. If you're doing good practices, they'll speed that up. I'd actually love to dig into the language choice piece a bit, because we too are doing a lot of development in Go. There are differing opinions on the team about it. I have definitely observed that it seems to be a language in which the LLMs stay on the rails much better than many other languages. I'm curious, what have you seen in terms of the characteristics of a language, beyond just being very opinionated, which Go is, that lead to good agentic coding?

[0:40:05] CW: Let's start with the simple things. Using white space and indentation for structure is just generally a bad idea. Sorry, Python. Having something that is more structured and that helps an LLM infer structure is helpful. Obviously, there is AST-based tooling that helps with that. For simple, text-edit-based modification, we find that these more C-esque languages work very well. Also, being idiomatic - having only two ways to shoot yourself in the foot is generally more successful than having five different ways of doing that. We come back to the consistency piece. Consistency can be enforced through the language, but it can also be enforced through practice. A well-configured ESLint goes a long way. In terms of languages, the other thing, and we've observed this many a time - no surprise here - is that there's a very clear correlation between public training data and the quality of the code changes. If you're writing COBOL, or FORTRAN, at a bank, I'm afraid the frontier models aren't going to do that great a job at editing your code. If you're writing Java, if you're writing any of the big languages, you're going to have a much, much better time.

[0:41:26] KB: Yeah, I think that makes sense. I have a hypothesis I want to bounce off you as well, which is, I think the fact that Go's dependencies, the way that it imports things, the fact it doesn't have a full-featured object inheritance model, also are helpful. Because in my experience, LLMs are very linear thinkers. Being able to linearly look from here to where the code that defined it is, and not have to climb up an inheritance hierarchy or anything like that, seems to be related to reliability.

[0:41:54] CW: I agree with that. I think it comes down to what tools you give the agent. One of the things that really surprised me when Claude Code first came out was that there weren't any specialized tools. It's all grep and file reads. That goes an awful long way. If you look at the generation of systems that came before, which invested so heavily in RAG and semantic indexing and got very involved in trying to build up a higher-level model - then Claude Code comes along and you're like, no, you don't really need all that.
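[Editor's note: A minimal sketch of how little a grep-style agent tool needs to be: walk the tree, scan each file, report file:line matches. The function name and output format are illustrative, not Claude Code's or Ona's actual tooling.]

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// grepTree is the kind of primitive agents get handed: a plain substring
// search over a working copy, returning "path:line: text" hits.
func grepTree(root, needle string) ([]string, error) {
	var hits []string
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, walkErr error) error {
		if walkErr != nil || d.IsDir() {
			return walkErr
		}
		f, err := os.Open(path)
		if err != nil {
			return nil // skip unreadable files rather than aborting the walk
		}
		defer f.Close()

		sc := bufio.NewScanner(f)
		for line := 1; sc.Scan(); line++ {
			if strings.Contains(sc.Text(), needle) {
				hits = append(hits, fmt.Sprintf("%s:%d: %s", path, line, sc.Text()))
			}
		}
		return nil
	})
	return hits, err
}

func main() {
	hits, err := grepTree(".", "TODO")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```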
The flip side of that, I think, is that where you do need it is exactly for those non-linear languages, and probably for code bases that are large enough that this reasonably simplistic way of navigating the code base doesn't work anymore.

[0:42:43] KB: I do wonder if this - coming back to how we talked about this changing the cost curve, we talked about it changing the cost curve for different types of deployment, or businesses. I also wonder if it changes your cost curve for code bases. Does this push us more towards smaller code bases, or towards separated code bases versus monorepos? Where do you see that side of the industry playing out?

[0:43:09] CW: We already see that, and we'll see more of that, where code bases get specifically adapted to work well with agents. Things like AGENTS.md and having rules in place. Also, I think we'll see values like consistency become more and more important. Because not only are LLMs linear thinkers, they're also very good at generalizing within the things that they've seen. If they see the same thing, they're more likely to produce something that looks like that. I haven't thought enough about whether this is going to push us to mono versus multi-repo. I personally have always had a very strong preference for monorepos. Because we also dabbled a bit in the heydays of microservices, we dabbled a bit in that. Boy, did we take many left turns that I wish we hadn't. I do feel like there are a lot of large organizations that famously run monorepos and do that very, very well. If you have infrastructure that can handle it, if you have a system that lets you instantiate and clone that monorepo and work with it well, something like Ona, then this is a really good way of going about things, because you have it all in one place and you remove an entire layer of lookup and navigation.

[0:44:25] KB: 100%. I had a problem split across two repos. I was trying to get an agent to solve it. I was like, how do I tell it? Oh, now I need tooling to say, okay, these are the repos you've got to pull, and you've got to do it this way. A monorepo resolves that, up to some level of scale. To your point, then the agent has to be able to explore everything.

[0:44:43] CW: Or at least navigate it. It has to be able to understand where it needs to look. Almost every day, I'm actually amazed by how well the combination of find and grep actually works.

[0:44:57] KB: Fair enough. On that subject of adapting code bases: you mentioned AGENTS.md. You mentioned linting rules, consistency. What other things do you think are useful adaptations for a code base to make it agent-ready?

[0:45:14] CW: Well-configured tools. Coming back to the deterministic validation that agents need to do: having standardized development environments, so that everyone uses the same set of tools and hence everyone's agent uses the same kind of validation, I think greatly helps in lifting overall code quality. If everyone uses a different version of whatever it is - Node, or a linter, or Java, what have you - you're going to have so much more variance. Fundamentally, it's a game of reducing variance. Generally, I think the overall theme is driving standardization across your team, or teams. This is my own take, but I've seen several cycles of this. As an industry, every five to seven years, we go through cycles on a spectrum that runs from, engineers are the kingmakers:
let them do what they want, they know best - to very rigid and centralized: we're now going to do it all one way, this is my way, this is how we're going to do it, and you all have to comply. We pendulate between those. Right now, we're pendulating towards standardization. We've had this phase of, let everyone do what they need to do, and all of a sudden, you end up with thousands of repos, each doing its own thing - that guy over there uses Erlang for some reason. It's like, well, we're a Java shop. Right now, we're pendulating back. I think this comes at the right time. I would argue that agents are an accelerant to that. In terms of what you need to do to your code base to set it up for agents: lean into that. Lean into the standardization, and find tools that help you drive standardization across your organization, that help you also lift this beyond the repo. If you're coming from this world where you have a thousand different repositories, each of which does its own thing, what you need now is a layer that helps you standardize across them. It's not config files in the repo.

[0:47:15] KB: Well, one of the through lines I'm seeing through this conversation is, we need to lift the level at which we're thinking about these things, right? You are not designing this line of code, this function, anymore. You're designing the rules that the agents are using to modify all of this. You're not designing your one repo anymore, because each one can't be different. You're designing, what is the system that these autonomous things are going to go off and do? There's a whole layer there of those decisions, and there's a whole layer of tooling that we need to deal with that. I want to come back to, right, you're building this agentic building block. You built the cloud IDE for the last generation of IDE. What is the development environment when you lift up one level of abstraction?

[0:48:04] CW: I think there's a version of this as it stands today, and there's a version of where we're going. The version as it stands today is being able to engage with the software that you're interacting with at the right level. Being able to go down levels of abstraction as needed, and also degrees of concentration as needed, if that makes sense. An analogy here is, even today, if you're writing embedded software, maybe you're writing that in Rust, but parts are still in C and parts may still be assembly - or the Linux kernel, for that matter. Same thing. I think the same thing is true here, where a good part of your specification now is English. I don't think English is a programming language, but you're going to use English to describe a lot of your problems. That's one level of abstraction. Then you go down into cursory glances at code, and then you go down into very deep engagement with code. This interaction that you have with your software, I think, needs to live on these different levels. That's where we're at today. That's also really the idea that Ona is built around, or at least the interaction with Ona is built around. As for where we're going: right now, essentially all agents focus on the production of change, the direct interaction with code, if we're honest. Then, of course, there are some who focus on adjacencies, such as code review. We all, at this point, have come to understand that this is a hotspot, something we need to look into. Hot take: I don't believe putting a bunch of comments on a pull request is the end-all of that. We'll see.
We already are seeing agents permeating more and more of the SDLC. Also, to the right of the commit, like post-deploy, we see agents coming in to help with operational work, and we see agents coming in on the PM side of things and on the planning side. I think we'll see them more and more. Obviously, there are some large players who try and play the entire SDLC, but in reality, it's still reasonably disjointed. I think what we're going to see over time is some form of consolidation, whatever that timeframe is. I mean, I'm not going to get too concrete here, but we will inevitably see that. With that, the level of abstraction for the entire SDLC will lift. Okay, I'll get more concrete: I'll say, five to 10 years.

[0:50:32] KB: You don't think programmers are going to be out of a job in the next six months?

[0:50:36] CW: Oh, no. Oh, no. Jevons paradox is so real. We're making it cheaper to produce software, so we're going to produce more of it. Whenever we've made it cheaper to do anything, we just did more of it. The same thing is true here.

[0:50:49] KB: Yeah, absolutely. I think there's something there. I'm really excited to see what that five to 10 years looks like. Because to your point, all of the layer of tooling that we're looking at right now is about producing code, pretty much. Evaluating that produced code and doing that sort of thing. We are still, as humans, having to make most of the decisions behind that code. As more code is created, there are more decisions to be made. How do we design an environment that elevates those decisions and makes it easier to make those decisions that are key and necessary, while abstracting away the lower levels, unless you have to pull out your scalpel?

[0:51:31] CW: Absolutely. What language do you want to use to specify the problem you need to solve? There are obviously entire professions whose job it is to translate between these different languages. To translate between: here's the business problem that we need to solve, and here are the hypotheses of how we're going to solve it. Eventually, that translates into code many, many hops later. The question that we will collectively solve is: what is the right language? What is the right form of expressing intent and expressing the problem we need to solve? One way we see that show up already today is that the more classic ways of limiting what you can do on a machine in a particular environment find their limits. Concretely, we see all the different agents having some denylist, or allowlist, mechanism where you can specify, in this or that syntax, what the agents are allowed to do or not. We have the same thing. You can specify that the agent is not allowed to run aws, because you don't want it to drop your production database - not that any agent would ever do that.

[0:52:38] KB: I have definitely talked to people whose agents have dropped production databases. Gemini, I hear, is particularly prone to getting rid of databases. Or was.

[0:52:47] CW: When you reach that state: "You're absolutely right. I shouldn't have dropped that database." We too have these denylists. They work today. They're okay. But they're not good enough. Fundamentally, the thing I want to avoid is it dropping my production database, not it running aws, the AWS CLI. The level of abstraction for how policies look, especially in a world where we have actors that don't care about getting fired, is going to be a really, really interesting challenge.
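[Editor's note: A minimal sketch of the command denylist mechanism described above - a guardrail that refuses any command whose binary is on a blocked list. The rules and matching are illustrative assumptions, not Ona's actual policy syntax; and as Chris notes, this blocks "running aws" rather than the real intent, "don't drop production".]

```go
package main

import (
	"fmt"
	"strings"
)

// denied lists binaries the agent may not invoke. Purely illustrative.
var denied = []string{"aws", "kubectl", "terraform"}

// allowed rejects a command if its first token is a denied binary.
// Note how crude this is: it guards the tool, not the intent.
func allowed(command string) error {
	fields := strings.Fields(command)
	if len(fields) == 0 {
		return nil
	}
	for _, d := range denied {
		if fields[0] == d {
			return fmt.Errorf("blocked by guardrail: %q is on the denylist", d)
		}
	}
	return nil
}

func main() {
	cmds := []string{
		"go test ./...",
		"aws rds delete-db-instance --db-instance-identifier prod",
	}
	for _, cmd := range cmds {
		if err := allowed(cmd); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println("ok:", cmd)
	}
}
```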
I think this is one of the first places where we're actually going to see this show up, this right level of abstraction.

[0:53:28] KB: Well, and you highlight something important there, too, when you say, actors that don't care about getting fired. Fundamentally, there still need to be humans responsible for the outcomes of this.

[0:53:37] CW: I guess. It's a bit like self-driving cars, no?

[0:53:40] KB: If I commit some code that an agent wrote, and it brings down production, who's responsible?

[0:53:46] CW: Probably you. But who says it's you committing it? If you look at the economics of it, I would argue it's reasonably inevitable that we'll have agents who commit code and agents who review code. There's regulatory work that tries to stem that tide, mandating that if you generate the code, humans have to review it, or if you write the code by hand, AI can review it. Fun stuff like that. Give it time; the economic reality of it all will mean that sooner or later, we'll see code that no human has ever seen. Then, who do you hold responsible?

[0:54:24] KB: That is a great question.

[0:54:26] CW: Self-driving cars might just have the answer. As a company that builds an agent, I don't necessarily want us to go their way, but the way this works for self-driving cars is that the manufacturer of the car is on the hook. I'm sorry, Tesla. But if you're building a self-driving car and it kills someone, or it gets into an accident, you need to prove that your tech wasn't at fault - there's some level of scrutiny that you're subject to. I think we'll see similar things happening with agentic software engineering.

[0:54:57] KB: That's fascinating. Essentially, we get to a place where Ona, or whatever other agentic provider you are working with, has responsibility, liability, something along those lines, for what is ending up in production somewhere else.

[0:55:13] CW: We might end up there. Personally, again, I obviously hope we don't. I think a key difference, and where, for example, this metaphor falls down, is that automotive is and has been subject to so much more regulation than software production. We do also see a trend that, again, personally, I don't particularly like, but specifically in Europe, we see an increasing trend towards regulating the production of software and equating it to the production of physical goods, with all the regulatory downsides that come with that. I would not be completely surprised if we saw it elsewhere as well.

[0:55:50] KB: Yeah. Well, and it does come back a little bit to this question of what level of abstraction we are operating at. Where is there a human involved, such that responsibility can be applied? Is the human making the decision about what the specification is, and then the agent creator is responsible for implementing that with fidelity? If it meets the specification, then the agent maker is off the hook, and it's the person who wrote the spec that's responsible. It's a fascinating problem.

[0:56:20] CW: Absolutely. I like what you raise here, where there's a shared responsibility model. It's not the first time we've landed on that idea. Maybe this is what we end up with. The model that we're engaging in today, where it's essentially one human that's responsible end to end, also has very clear limitations. Because maybe you're responsible for it, but how can you make good on that responsibility when agents are producing 20K lines of code?

[0:56:47] KB: Yeah.
Well, and that's, I think, one of the things that people are running into in terms of adoption: we need to shift our mindset. I'm reminded of the shift in how we manage hosting and servers and things like that. You used to have your server that you carefully curated - your pet; it had a name, all these things - and we moved to servers as commodities; they're cattle. You get some compute, you get some memory, you rent a GPU from here, all those different things. They're managed in some infrastructure-as-code setup. You don't think about the special details of one. Are we moving to code as cattle?

[0:57:27] CW: I think we are. Yeah. I like the analogy. Code as cattle also sounds good. There's an interesting element of that also, where, going full circle, as software engineers, many of us have tied our identity to our ability to craft code, instead of solving problems. Maybe we come back to that. But there's a pride in the craftsmanship, for good reason, for good reason. But I wonder at what point, and maybe we're already past that point, that becomes a luxury that, at least for a business, is no longer economically viable.

[0:58:05] KB: I mean, to switch metaphors to a different area, I don't think I own any handcrafted furniture. There are still people who make it. There are still people who buy it. But the vast majority out there is manufactured.

[0:58:19] CW: I'll provide the counterpoint: every piece of furniture in this room, I built.

[0:58:24] KB: Well, and I still write software sometimes, right?

[0:58:28] CW: Exactly.

[0:58:28] KB: For me.

[0:58:30] CW: My house has plenty of IKEA furniture. There is the reality of, if I need a bed for my son, am I going to go and build this? Well, maybe. But in reality, I'm going to go to IKEA and buy it. I think the same is true for software. There's a big difference when you're writing software on the weekend because you enjoy the process of writing. But as a business, are you going to equip your office by having your employees build furniture? Absolutely not. The same trend, I think, we'll see in software.

[0:59:01] KB: That is a massive mental shift for software engineering.

[0:59:05] CW: It's interesting, because this same shift, I think, we see in adjacent roles, too. Fundamentally, what's happening is a consolidation of skills onto ever fewer people. We're noticing that with that increase in the level of abstraction, we can now skip some of this really complicated, lossy process that is communication, and consolidate things in singular brains, and with that become more effective. The traditional product manager role is now consolidating into design engineer, product engineer, whatever they're called - member of technical staff. We see that consolidation happening elsewhere, and it's only natural as the level of abstraction rises. So, as engineers, the thing we really need to re-identify with is solving problems, not the hammer with which they're solved.

[0:59:59] KB: I love that. Well, we're getting close to the end of our time. Is there anything we haven't talked about today that you think would be important for us to cover?

[1:00:07] CW: I think we've covered it all. The one thing I'll add is that, at this point, the models that we use today are the worst models we'll ever use. If those were all the models we'd ever have - which clearly they're not; Sonnet 4.5 just dropped. Is it a light-year change over Sonnet 4? No. Is it a good change? Probably. First vibe checks look promising.
It does more and does things better, and the same was true for GPT-5. We're going to see an improvement in models. Even if that stopped today, the way we write software has fundamentally changed, and there is simply no going back. I probably don't have to tell this to the audience of this podcast. As engineers, really, the best thing anyone can do right now is to embrace this reality that we find ourselves in. Fighting it is futile. It's like trying to fight the Internet. It's not a fad. It's not going away.

[END]