EPISODE 1908 [INTRODUCTION] [0:00:00] ANNOUNCER: AI-assisted programming has moved far beyond autocomplete. Large language models are now capable of editing entire code bases, coordinating long-running tasks, and collaborating across multiple systems. As these capabilities mature, the core challenge in software development is shifting away from writing code and toward orchestrating work, managing context, and maintaining shared understanding across fleets of agents. Steve Yegge is a software engineer, writer, and industry veteran whose essays have shaped how many developers think about their work. Over the past year, Steve has been exploring the frontier of agentic software development, building tools like Beads and Gas Town to experiment with multi-agent coordination, shared memory, and AI-driven software workflows. In this episode, Steve joins Kevin Ball to discuss the evolution of AI coding from chat-based assistance to full agent orchestration, the technical and cognitive challenges of managing fleets of agents, how concepts like task graphs and Git-backed ledgers change the nature of work, and what these shifts mean for software teams, tooling, and the future of the industry. Kevin Ball, or KBall, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc. [INTERVIEW] [0:01:44] KB: Steve, welcome to the show. [0:01:46] SY: Hey, KBall, thanks for having me on. [0:01:48] KB: Yeah, I'm excited to go into this. I was mentioning before, I've been a fan of your writing for a very long time, so I'm really interested to see how this conversation goes. But let's have you introduce yourself to our listeners who may not be as familiar with your writing. How do you describe yourself, and how did you get to where you are today? [0:02:04] SY: Yeah, nobody's ever asked me that before, and you gave me exactly 12 seconds of preparation for this. So, thank you for that extensive - how do I describe myself? Yeah, an industry vet for sure. Right? I've done this a long, long, long time. I actually started programming when I was 17, and I turned 57 a couple of days ago. And so it has been 40 years now. Yay. So, I've seen a lot, right? I've seen a lot of transformations and that kind of thing. I picked up blogging at Amazon because I was trying to figure out how to convince an organization of 800 engineers of certain things, I guess, that I thought they were thinking about wrong. And I started ranting, and it picked up in popularity. And then, I don't know, that just became a thing for me, right? The blog rants, right? The drunken blog rants. I actually quit drinking over nine years ago. I'm going for a 10-year break from drinking. [0:02:53] KB: But you've maintained the rant. [0:02:55] SY: But I kept the rant going, actually. Well, I'll just say weed's legal in our state. Let's leave it at that. So anyway, yeah, that's me in a nutshell. [0:03:02] KB: All right. Well, and one of the reasons I'm excited to talk with you today is I think, in your rants, you've been one of the people laying out a lot of the bleeding edge of what's going on as a transformation in our industry right now. Let's start maybe by walking through that evolution. And I'd love to get some of the play-by-play from you as you were thinking through things.
So, I'm going to go back to - I think you had a blog post that called out a pattern we were starting to see that you called CHOP, chat-oriented programming, back in 2024, I think. [0:03:30] SY: Indeed. Mm-hmm. I remember that. [0:03:34] KB: Was that the start of LLM stuff for you, or was there something even predating that? [0:03:39] SY: I mean, look, when ChatGPT 3.5 came out, I mean, I was just shocked that it could write Emacs Lisp functions that were pretty good, right? I mean, just single functions, and that was about the extent of its abilities. But still, it was like, "Whoa. Elisp." That's pretty edge case out there, right? And so, I don't know. I think chat was probably the first time where I started feeling like I was taking crazy pills, and nobody was ever listening to me because everybody uses chat today. Just the Claude Code crowd is what, 10%, 15% of developers now? 20% or less? It's growing, right? But, I mean, still, most people use chat. And it was all of 2024 that I was trying to get people to use chat, right? And they're like, "No, completion acceptance rate." You remember that metric? [0:04:25] KB: I do. VS Code is still watching it from what I can tell. [0:04:27] SY: Whoa. Well, I mean, yeah, there's probably still developers that are using completions. Yeah, I started kind of feeling like I could see into the future, right? Because I don't know. Look, when you've done this for 40 years, and you've chased productivity for 40 years, trying to make yourself go faster, you get a sense for when you're going faster, right? And so, yeah, there's a lot of speed bumps, and the AI does a lot of things wrong, and so on and so forth, but it was just like - I don't know. It was like finding the early hover bike in the Zelda game, where you could like - you didn't have the battery yet, but it was still faster than walking. And everyone complains the hover bike's crashing all the time, but it's like, "Dude, it's faster than walking. You have to use it now." And it's just been getting better since then, right? So, Claude Code put chat in a loop, right? Chat put questions in a loop, and then Claude Code put chat in a loop, and Gas Town puts Claude Code in a loop. And so does Ralph Wiggum from Geoffrey Huntley. So we're just seeing basically AI - we're just multiplying it. More AI times more AI times more AI. And that is the solution to everything. And it's really making people mad, right? [0:05:31] KB: Yes. [0:05:31] SY: Hacker News threads and stuff like that, right? [0:05:34] KB: Well, and I think we'll get into that. But I'm curious. Along that route, you said GPT 3.5 was the first eye-opener when suddenly the scooter is going faster than walking. [0:05:43] SY: Yeah. [0:05:44] KB: Have there been other turning points for you along that journey? [0:05:46] SY: Yeah. The next tipping point was very much GPT-4o, right? Because that one - what we had been struggling with when we were building coding agents that only had chat built in, but we would do the copy-and-paste and stuff, was that once a file got up to around 800 lines, there wasn't a lot of fidelity in reproducing it when they would make changes, right? Kind of like Nano Banana can be now when it's editing its own stuff, it just starts to blur. And so the tipping point was that 4o was able to reproduce a thousand-line file with perfect fidelity and make a one-line or a one-character change. That was huge because most source files in the world are a thousand lines of code or less, or should be.
And so now we're talking about GPT being able to edit all of the files in the world and make simple changes, which immediately, at that point - I don't even know when that was. It was before last year, right? It was middle of 2024. [0:06:35] KB: Yeah, mid to late 2024, I think. Yeah. [0:06:37] SY: That's already when I was like, "Oh, you could farm this." I'm a gamer. I mean, come on. You immediately gamify everything. And then there was another one with Sonnet 3.7. It was like, "Oh, whoa." That was Claude Code. That was my biggest tweet ever. It had 300,000 views or something, where I was just like, "Ooh, Claude Code is neat." And then Opus 4.5 was the next big one, right? This is the one that got Gas Town launched. It couldn't have launched without Opus 4.5, right? Opus 4.5 is what got it off the ground. And the half-life on Anthropic models, if you've been counting, has been about four months between models at the beginning of 2025. And now it's down to about two months between models. We're probably going to see an Opus 5 drop real soon. Right? All the naysayers, I mean, we'll talk about it, but you asked about these tipping points. A lot of the people who are looking at this problem right now and discounting it have approximately a three-month window before and after today. They're looking back about three months at what the last generation of model was. And they don't know very much about how Isaac Newton invented differential calculus. And if you just zoom in, they're on this really steep slope, and they don't see it. They just see the derivative. And so what's happening is I have had the view since ChatGPT 3.5, and really for 40 years of this acceleration. And now I see it starting to accelerate really fast. And Opus 4.5 is the splash, the big boulder that has hit the pond, where people are realizing that it doesn't need to get any smarter now at all. [0:08:06] KB: Exactly. I have in some ways a similar journey. And I would say starting from about Sonnet 3.5, we were at a place where I at least started to say, "You know what? If they'd never released another model, and we just kept learning how to use these and building the tooling around them, the world of software engineering is forever changed." [0:08:22] SY: It would be. Now it wouldn't have made it into certain domains. 3.5 and 3.7 had their limitations. There was a certain size of mountain they could chew through. And you would have to break things into that size in order to use them. But it was doable. But now with 4.5, that mountain size has gotten much larger, and it's obviously just going to continue. I mean, look, we're looking at a new world where - I mean, the guy who invented Redis, antirez, or whatever. Did you see his post? He had this realization just a few weeks ago. He worked with Opus 4.5 and Claude Code, and was like, "Well, it doesn't make any sense for us to write code by hand anymore." And this is a really, really - this is the big horse pill that the industry has to swallow right now, right?
So let's maybe start with Beads, which, if I blink, that was just three months ago. [0:09:31] SY: Yeah. [0:09:31] KB: Yep. Three months ago. Has definitely gotten a lot of popularity. There's also a bunch of related concepts. But let's sort of talk through Beads. What are they? What is it as a primitive for this fleets of agents world we're living in? [0:09:45] SY: Yeah. So I don't know if you saw, but yesterday Anthropic launched an update. They retired to-dos, right? And they launched tasks, which they credited as being inspired by Beads, which I thought was very nice of them. And I also get why they didn't just use Beads, right? We can talk about that. But basically, Beads is a task tracker. So like a better to-do list for your agent, but it has three properties that make it really, really interesting. One of them is that it's a graph. It's a task graph, which is a lot like your work graph, and your implementation plans, and your Gantt charts, and kind of how you manage knowledge work in general. It kind of accidentally captured the ability to capture all work, right? Into little molecules of work. Into bite-sized pieces that will actually scale up as cognition scales up. It's really interesting how Beads are just - they're just dividing work into a graph, which we've always done, except you add two more ingredients. You add SQL. I mean, graphs and databases have never been that great, but they've been working on it for 40 years, and it's pretty good now. It's just a pile of nodes and edges, right? And so you add SQL. The databases love SQL. I mean, the databases. [0:10:50] KB: The LLMs. [0:10:50] SY: The LLMs. [0:10:50] KB: Yeah, you can get shockingly close to Beads just being like use SQLite to track your thinking. Go. [0:10:56] SY: You can. Right? But Beads introduces graph edges that we humans never would have probably put in or thought of. Claude helped design Beads. And because of that, it's got stuff the AIs feel is very important. And they use it all the time. Like the discovered-from edge, which tells you who was working on what when this Bead was opened. They love that for the forensics for understanding how the work unfolded because Beads is the surface of work as it's getting executed. All the closed Beads - the integral under that surface is the work that you've done. And all the open Beads are the remaining work. And Beads itself tracks that surface generally. Yeah. And the third component to it that makes it completely magical is Git. So it is a Git ledger of all of your work. So you've broken it into bite-sized, addressable pieces that can refer to each other. There's a graph structure to it. It's queryable with a database, and it's all on a Git ledger, which is just shockingly useful because you never lose them ever. The history is always there. You can always reconstruct if there was a problem, right? And it gets better than that because you can actually start looking at these ledgers and determining how well agents have done over time. And you can even see your own work on the ledger. It's like a portable resume for you. It's really wild. Beads was a really interesting kind of a discovery. [0:12:12] KB: Yeah, I think it's worth digging into a few of those pieces. But let's maybe start with conceptually for a developer who's used to coding and managing work. We're going to stay on Beads. Not talk about Gas Town here yet. But what is the cognitive shift that you make as a developer if you start using Beads with your agent?
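To ground the task-graph description above, here is a minimal sketch in Go of a Bead-like node with typed edges, including the discovered-from edge Steve mentions, plus a small helper that walks that edge to reconstruct how a piece of work was discovered. The field names, edge kinds, and the Provenance helper are illustrative assumptions for this conversation, not the actual Beads schema.

```go
// Illustrative sketch only: a Bead-like task node with typed graph edges.
// Field names, edge kinds, and helpers are assumptions, not the real Beads schema.
package main

import "fmt"

type EdgeKind string

const (
	Blocks         EdgeKind = "blocks"          // the target bead cannot start until this one closes
	DiscoveredFrom EdgeKind = "discovered-from" // the bead that was in flight when this one was filed
)

type Edge struct {
	Kind   EdgeKind
	Target string // ID of the related bead
}

type Bead struct {
	ID     string
	Title  string
	Status string // "open" or "closed"
	Edges  []Edge
}

// Provenance follows discovered-from edges back to the work that spawned this
// bead -- the forensic trail of how the work unfolded.
func Provenance(all map[string]Bead, id string) []string {
	var trail []string
	for {
		b, ok := all[id]
		if !ok {
			return trail
		}
		next := ""
		for _, e := range b.Edges {
			if e.Kind == DiscoveredFrom {
				next = e.Target
				break
			}
		}
		if next == "" {
			return trail
		}
		trail = append(trail, next)
		id = next
	}
}

func main() {
	beads := map[string]Bead{
		"bd-1": {ID: "bd-1", Title: "Design sync protocol", Status: "closed"},
		"bd-2": {ID: "bd-2", Title: "Implement sync", Status: "open",
			Edges: []Edge{{Kind: DiscoveredFrom, Target: "bd-1"}}},
		"bd-3": {ID: "bd-3", Title: "Fix flaky sync test", Status: "open",
			Edges: []Edge{{Kind: DiscoveredFrom, Target: "bd-2"}, {Kind: Blocks, Target: "bd-2"}}},
	}
	fmt.Println(Provenance(beads, "bd-3")) // [bd-2 bd-1]
}
```

The closed beads in a structure like this are the "integral" of finished work; the open ones are the remaining surface the agents can query.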
[0:12:31] SY: Well, if you're already using an agent, then it's probably already using to-do lists and markdown files. What else would you use? Maybe you have a wiki or a database. Okay? And the problem with all of those solutions is that they don't have Git. Or maybe you're using markdowns, and you are using Git. The problem with that is that it doesn't have a graph structure that's queryable in a database. And the LLM has to read and parse the markdown files to get that graph structure every single time it looks. And they get out of date, etc. If you're using an agent and you're really leaning into it, then as soon as you try Beads, literally, you just try it. And people reach out to me from all over the world, KBall. Man, they're like every day. And they reach out to me. I had a colleague meet with me just two days ago, and he was like, "I started using Beads. And yeah, it's a huge unlock, but I don't understand why." Right? And it was because it's like catnip for the LLMs. It's like candy for them. It's memory for them. And as soon as they get what it provides, they just want to use it. They get kind of mad if you try to take it away from them, right? Because you'll never have to lose any work again. You know how LLMs are. They're focused on the thing you gave them, and they'll notice, "Oh, by the way, your other room is on fire there, but it's not really my problem, so I'm just going to focus on this." Right? They disavow work. They say, "Someone completely unrelated to me broke the build." It was them in the previous session. Right? That doesn't have to happen anymore with Beads because they're like, "Oh, I see this is a problem. I'll file a Bead for it." Right? And you can see how this ties into agent orchestration because as you're piling up Beads, you're piling up a work backlog that you could - I mean, depending on how well specified the Bead is. Because you can put in a spec if you want. The Bead can have all these fields, and comments, and design, and whatever, right? So if it's really well specified work, you can give it to an agent and just have it do it and have another agent code review it and then check it in, and you're done, right? I mean, as long as it passes all the tests. Yeah. So you can see people are using Beads as a substrate for agent orchestration. It's a memory, a shared memory. One that federates through Git, which means it acts like a distributed database for your agents on AWS, or GCP, or whatever hyperscaler you're running on, right? Azure. They can all communicate with each other with Beads. And you don't need a central hosted service. It's all going through your Git repo. It's wild. [0:14:43] KB: No, that's absolutely wild. Going through Git then. I haven't looked at the implementation of Beads. Are you essentially storing Beads in like SQLite? So it's just in the file system right there? Or do you have a proprietary interface? Or how are you actually managing it into Git so that it's a task graph, it's in SQL, but it's also Git? So is that just SQLite files? [0:15:01] SY: Yeah. So, I did the stupidest possible thing, which was I didn't do any due diligence. I didn't realize how useful this was going to be. I wanted Git. The LLM - Claude - wanted SQL. And so, we decided we were just going to cram them together in the worst possible way. Right? There's a JSON file, one line per issue, and it has merge conflicts all the time, and it gets slurped into the database, and there's a daemon, and it gets stale. It's a two-tier architecture. It's horrible.
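As a rough picture of the two-tier design Steve just described, the sketch below assumes a Git-tracked JSONL ledger where each line is one issue, something like {"id":"bd-42","title":"Fix sync race","status":"open","deps":["bd-7"]}, and slurps it into a throwaway SQLite cache so the graph becomes queryable. The field names, table layout, and file path are assumptions for illustration, not the real Beads on-disk format.

```go
// Illustrative sketch of a JSONL-in-Git ledger being loaded into a local
// SQLite cache. The JSONL file is the source of truth (it merges and diffs
// like any text file in Git); the database is disposable and rebuildable.
package main

import (
	"bufio"
	"database/sql"
	"encoding/json"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3" // SQLite driver
)

type Issue struct {
	ID     string   `json:"id"`
	Title  string   `json:"title"`
	Status string   `json:"status"`
	Deps   []string `json:"deps"` // IDs of blocking issues
}

func main() {
	db, err := sql.Open("sqlite3", "beads-cache.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS issues (
		id TEXT PRIMARY KEY, title TEXT, status TEXT)`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS deps (
		issue_id TEXT, blocker_id TEXT)`); err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("issues.jsonl") // hypothetical path to the Git-tracked ledger
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Naive one-way sync: upsert every issue line into the cache.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var is Issue
		if err := json.Unmarshal(scanner.Bytes(), &is); err != nil {
			log.Fatal(err)
		}
		if _, err := db.Exec(
			`INSERT OR REPLACE INTO issues (id, title, status) VALUES (?, ?, ?)`,
			is.ID, is.Title, is.Status); err != nil {
			log.Fatal(err)
		}
		for _, dep := range is.Deps {
			if _, err := db.Exec(`INSERT INTO deps (issue_id, blocker_id) VALUES (?, ?)`,
				is.ID, dep); err != nil {
				log.Fatal(err)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```

The pain points Steve mentions follow directly from this shape: the JSONL lines conflict on merge, and the cache goes stale whenever the daemon falls behind the repo.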
And it's all going away in the next version release, which will come out maybe this weekend, right? Had I done my due diligence, I would have realized that a Git database is what I need. I need a database, and I need Git. I need versioned data sets, right? And it turns out somebody has solved this problem. The Dolt team. You laugh. Ha-ha-ha. I knew this. Well, I didn't know. But it turned out to be an old buddy of mine from Amazon, too. Tim Sehn, who started Dolt, right? And I've got friends that are working there. And I had no idea that they even had this thing. But yeah, Beads is going to switch to that, and it's just going to fix everything. [0:16:01] KB: I mean, once again, I've seen a lot of people using SQLite. But the challenge with that in Git is also merge conflicts, right? Merge conflicts everywhere. Yeah, having a Git native database makes a ton of sense. [0:16:10] SY: I mean, the three-way merge goes away, bd sync goes away, the daemon goes away, the whole thing. You still have the Git export, you still have the federation. Dolt federates exactly the way Beads did. It's weird. It was like I was following in their footsteps, right? But they did it right, and they put 10 years into it. And it's embeddable in Go. So it just happens to be embeddable in Beads, right? So it's just going to be a one-liner. You won't even notice. It's just going to get better. And it enables a bunch of stuff like field-level merge resolution instead of issue-level. And all kinds of new history and new dimensional sorts of looks at things that we weren't able to get before. Key value stores and things. People are starting to use Beads. Beads is a data plane, man. It's a data plane. It's nuts. And one contributor put in a key value store, and I was looking at it, and I was trying to wrap my head around it. I was like, "Yeah, it makes total sense." If you have agents using this as their memory - [0:17:05] KB: Right. They want to be able to share things and have a reliable way to look it up, and all that. Yeah. Yeah. Yeah. [0:17:07] SY: Right? A Bead is a task. It's relatively heavyweight, even though it's lightweight compared to, like, a GitHub issue or a Jira ticket. Those are very heavyweight. A Bead is much lighter weight than that, right? But a key value is super lightweight. So I was like, "Yeah, let's do it." And Dolt supports them really super well and whatever, right? I'm just so happy with the way Beads is going. The code base is garbage right now. It's vibe-coded, which means that you have to run code review passes on it constantly. And I got behind, and I'm like a month behind on code reviews. And so I'm sure the code is just garbage, right? It works. It passes the tests. People are using it. But within a few weeks, within, I don't know, a week, I'll get it all cleaned up. We'll be on Dolt, and Beads is going to be a thing of beauty. [0:17:47] KB: I want to now take a step back. We talked about Beads, the particular thing. And you alluded to a few different pieces in there about changes in the way that we're approaching code that I want to talk about before we get all the way into Gas Town, which I think takes this up a few orders of magnitude. One of the things you talked about was you said, "Hey, if a Bead is well enough specified, it can just get farmed out, taken care of, etc." How do you think about specification in this agent world as we're doing it? For what things are you or another human in the loop? How are you managing that? How do you think about those things?
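Whatever the backend ends up being - the SQLite cache of today or a Git-versioned SQL store like Dolt - the payoff of the SQL layer is queries like "ready work": open issues with no open blockers, the thing agents ask for constantly. A minimal sketch against the same assumed tables as the previous example:

```go
// Illustrative "ready work" query: the kind of question the SQL layer answers
// cheaply once tasks and dependency edges live in a database. Table and column
// names are the same illustrative assumptions as before, not the real schema.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // SQLite driver
)

func main() {
	db, err := sql.Open("sqlite3", "beads-cache.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Open issues with no open blockers: work an agent can pick up right now.
	rows, err := db.Query(`
		SELECT i.id, i.title
		FROM issues i
		WHERE i.status = 'open'
		  AND NOT EXISTS (
		        SELECT 1
		        FROM deps d
		        JOIN issues b ON b.id = d.blocker_id
		        WHERE d.issue_id = i.id AND b.status <> 'closed')`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id, title string
		if err := rows.Scan(&id, &title); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s  %s\n", id, title)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```

A lighter-weight key-value table can sit alongside these tables in the same store, which is roughly the shape of the contributor addition Steve describes.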
[0:18:20] SY: Yeah, I wish that I had more time to think about it, right? This question is even more fundamental to Ralph Loops. I've been talking to Geoff Huntley a lot about this. And for Ralph Loops, you really have to specify your acceptance criteria very thoroughly, or else you run the risk of getting the wrong thing. Right? And so the way Gas Town approaches it, the way I approach it, my workflow is basically we're only going to ever implement everything to a first approximation unless it's really important, like the Dolt stuff, right? We really, really push hard on that. But everything else is successive sort of iteration. We're just going to get it out there, fix bugs in it. You know what I mean? [0:18:59] KB: Yeah. Just wondering about how you think about specification. And in your workflow with Beads, for example, when do things bubble up to a human versus running autonomously? [0:19:09] SY: Yeah. So you have all these different workflows that you can support, and mine tend to be so iterative that I just rarely get time to get a lot of specification time in. But it's a really interesting question. Look, you just have to make time for it, right? Gas Town is such a powerful engine that you basically spend all of your time in one of two modes. You're minimaxing, right? Minimizing or maximizing context windows. I had this really interesting discussion last week with some folks at Anthropic. I met some very lovely teams at Anthropic who were interested in Gas Town because they see it as exposing a lot of bugs in their model, right? Because a lot of the Gas Town workarounds are things that the workers probably ought to be doing better if they understood that they were factory workers, but that's not something they've ever been trained on, right? I was talking to them, and they said that there's an interesting kind of split inside of Anthropic where some people love to minimize context use. It's like use the smallest task possible, decomposition, right? Throwaway, ephemeral. Just write one task at a time. Because you get the benefits of a small context window: your costs expand quadratically as the token size grows. And also, the performance tanks after a very small size. And so they're all about performance and cost, which is great. I'm going to tie back to your specification question in a moment. I promise you. Okay? And the other group is the maximalists, the context maximizers. And what they do is they load up the context window heavy with just lots of rich information and instructions. You know what I mean? Because LLMs perform really well and make really good decisions, especially strategic decisions, when they understand why they're doing something and not just what you want, right? And they said, "So which one are you, Steve?" Right? And I was like, "Well, interesting that you say that. You've just described Gas Town's polecats and crew." The polecats are for the ephemeral work that's already well specified, and it's throwaway. And you actually want them to be small context. You want to do one task at a time. You decompose it, right? Get them to work through it, farm through it. And it's factory farming code. But there's a lot of work that's usually design work where you're doing the hard thinking, and you need to have conversations with the LLM. And usually you want to build up a lot of context with them, right? Not to where they're getting amnesia, but often I'll be like, "Okay, you've just hit on a really difficult corner."
Anytime there's some difficult corner of the code that I'm working on, I'm like, "Okay, it's time for us to roll up our sleeves and not just band-aid it, but figure out the whole thing, where it all fits in." And that means I got to load them up with context. And so I have a set of documents that I'll pull of increasing mind-blowingness and show them. And so, yeah, the crew supports that kind of workflow, and the polecats support the other. And I think it's a recognition that they both exist. And I think as engineers, we flip-flop back and forth between them. But we're getting gradually pushed over to the heavy thinking kind of work where the LLMs are just going to do all the coding because we've done all of the difficult design, which is why I mentioned in one of my last blog posts that I'm taking naps all the time. Yeah. [0:22:01] KB: I want to get to that because I have noticed a similar type of exhaustion in this work. But before we go there, I want to follow up on this just a little bit more. The thing you described in terms of when you're getting into heavy problem-solving mode, and you're booting up all this context, one of the ways I've been talking about this with folks is if you conceive of these LLMs as - they're fundamentally a little VM. They're a computer that is language-driven, and code-driven, and all these different things, and code is data. What you're trying to do is essentially write the boot loader for the problem you're solving. How do I get exactly the right set of context and data to get the right relevant things? I'm curious, how do you, for yourself, manage, or how do you within Gas Town manage? What does that boot loader look like for different types of tasks? [0:22:45] SY: Oh, okay. Over time, before Gas Town when I was just automating my own workflows with Beads, which I think was where a lot of people are, or even without Beads, some things that I found really helpful were - see, the thing is you got to learn what they're good at and lean into it, right? And so one thing they're really good at is to-do lists. They love bureaucracy. They love acceptance criteria. They love checking things off. Yeah? And so on the boot up side, there's a bunch of stuff that I want them to do specifically because they failed to do it on the shutdown side. On boot up, I want you to go and look for branches, and stashes, and unmerged work, and blah blah blah, right? Unclosed Beads, whatever. Clean up your sandbox, clean up your environment on boot up, as well as on shutdown. And so both of those instructions became things that I encoded in prompts. And so for many people, they're at a phase in their engineering where they're managing their own private libraries of prompts that they pull out as needed, right? And Gas Town was just basically me going, "Well, what if I could just have some canned prompts that sort of came up when certain roles came online?" And then it was all predicated on this: what if Claude Code could run Claude Code, right? But yeah, the boot up and the landing are really important. So I have this prompt called land the plane. And my colleague on Monday was telling me about this. He hasn't used Gas Town. He just uses Beads, right? But he found land the plane really useful. And I lived by land the plane for, I don't know, six weeks. Feels like six months. [0:24:09] KB: Time is compressing these days. [0:24:10] SY: Super compressed right now. Yeah. Land the plane is - okay, because the agent will be, "Party. We are done." Right?
The agent is - literally, it's giving you emojis, checklists, to the moon. This project is ready to launch. I am done with this feature. Look at all the things - [0:24:27] KB: Anthropic models in particular love to do that. GPT is a little bit more staid. But yeah. [0:24:33] SY: Except GPT can't code. So who cares, right? [0:24:35] KB: Well, different discussion. I have found GPT 5.2 to be incredibly effective for particular styles of code and prompting, but it goes a hell of a lot slower. [0:24:44] SY: I heard Gemini 3 is really good at UI coding, maybe as good as Claude. So, yeah, sure. [0:24:51] KB: And UI coding is like a thing where GPT just falls flat on its face. It's so bad. It's so bad. [0:24:56] SY: Yeah. So, I don't know. Maybe they'll develop specialties, right? Claude loves acceptance criteria. Landing the plane takes advantage of that by - it's almost taking advantage of them like as if they were OCD and giving them some OCD thing to make them right. Because even if they're low on context, even if their context window is near exhausted, and you're like, "Let's land the plane." And they look up the instructions, they'll be like, "Yes, sir." And they'll start checking things off, and they will finish that thing, right? Even if they hit a compaction, which I view as a failure mode, right? So you try to land the plane as early as you can. But yeah, the land-the-plane prompt makes them much more reliable at not forgetting stuff. It's like they have common sense, but they get distracted. They need to be reminded. Yeah. Yeah, I mean these are all muscles that you build as a developer before you jump in the lion's den with something like Gas Town, right? You've got to be really, really good at bringing context into LLMs, watching how they deal with it, and triaging it, and dealing with it. And once you get into that cycle manually for a while, you're going to feel more confident to be able to wrangle like eight of them at a time. [0:25:58] KB: So, let's maybe use that then as a jump over into Gas Town. And let's start from the beginning. Let's describe for anyone who has not read your rant or the many responses it has spawned out on the internet, what is Gas Town? What's in the box? [0:26:13] SY: All right. I mean, there's a lot of different lenses that you can use to look at Gas Town, right? The simplest lens is to lump it into the category of orchestrators, which include things like Devin, which has been around for a long time. There have been attempts to run multiple coding agents as a team, basically, right? Or just in parallel, on parallel tasks with some tracking layer. Yeah. And the Ralph Wiggum Loop and the Loom Loops from Geoffrey Huntley. And you've got Claude Flow, right? The fancy one with the routing. And then you've got what else? There's a few others here and there, but there aren't many. But it's in that category. It's an orchestrator, right? It lets you run multiple coding agents. And it's pretty closely tied to Claude Code right now because it's on the boundary. It's on the edge. It pushes the agents so hard that they get confused regularly. And so only Opus 4.5 is really strong enough to be able to run Gas Town reliably. And even then, it breaks a lot. But anyway, Gas Town is predicated on a really simple idea. All right? Let me tell you how it goes. As your trust with the LLM grows - and this is my eight stages of programmer that I called out, which got a lot of attention on its own, by the way, because it was a real challenge to people, right?
But I'm sure it resonated that they realized that they had moved up from stage one, and they finally felt pretty good about it, and that they were done. And when they saw where they fit on the entire thing, they were like, "Oh, no." And their ego was a bit bruised. But by the same token, they couldn't really deny it either because they had already seen their transition of leaning more and more into the agent. What's happening is you're trusting it more. And by trust, I literally mean you can predict what it's going to do better. That's the only way you're going to trust it, right? Is being able to predict. And that just means practice. And we're talking hundreds to thousands of hours of practice to get up that ladder, right? And this is not just developers. It's developers today, but it's going to be all knowledge workers before long, right? As your trust goes up, your patience goes down. It's very interesting, okay? Because you're like, you got your agent, and they're working on the thing, and you know they're going to get it done because they've done it five times properly before. They're going to fix another test for you. And you're just like, "You know what? I'm going to start up another agent." And that's it, man. That's the gateway drug. That's the end of it, right? [0:28:22] KB: I hear you. I mean, right now, if I'd self-assess, I'm on the cusp between six and seven on your scale, right? Like I'm managing three, five agents. Sometimes it pushes up. And I have some questions around where I'm running into limitations that I will get to with you. But okay, so you're moving up the stack. Your trust is increasing. Your patience is decreasing. [0:28:41] SY: Yeah. You're running more and more agents. And now you start running into problems that are very different for an individual dev. They're like team problems. You have agents that are running, stepping on each other. You forget who's doing what. You have gates, agents waiting on each other. I mean, it starts to get kind of complicated. And at a certain size, you lose the ability to keep track of it in your head, even if you're using Beads, and it's just a zoo. And so Gas Town was me going, "Well, what if I just put them all into like worktree hierarchies and sort of gave them names?" And - oh, it was Jeffrey Emanuel and his mail discovery that really enabled - Beads was a huge unlock, but the other half of it was mail, right? The thing is LLMs like stuff they're trained on. And the longer it's been in their training set, the more they like it. And so mail, email, which has been around since the 70s, is like a pair of old jeans for them. They love the mail interface. And so you put them together with identities and the inboxes, and they can send each other mail, and they will. And so very quickly, I had this town of collaborating agents using mail. And I gave them names. You're the mayor, right? You're polecats. And I said, "You're going to be named after Snow White and the Seven Dwarfs." So I went through the seven dwarfs. And one day I saw the mayor, and it was really mad - so mad that it mailed Sneezy polecat. And it was like, "You are not the mayor. You're Sneezy polecat. So read your own inbox and do your own work." Right? And I was just like, "What is happening here?" Right? It was beautiful. Then one day a swarm took off and fixed all my bugs. I had like 30 Beads all like lined up to knock out. And I couldn't find them, and I panicked. And then I realized that they had all been closed and fixed, right? And I was like, "Woo." Right?
That's that swarm feel, right? Gas Town kind of emerged out of the muck, out of this primordial soup of managing agents by hand. But basically, it gets into this mode where you're just like - here's the thing: have a rule for yourself where you never watch them work. Never watch them work. That's counterintuitive advice. Most people keep their eye on the agent, and they're like, "I'm going to see if you're going to make a mistake. You made a mistake." That's what they do. And that's what I did for, I don't know, six months, right? I'd try to watch them all, "Oh, a diff looked weird." Right? As soon as you get out of that mode, and you realize that they're going to make mistakes, and you're going to find them, just like with regular engineers, okay? So, don't sweat it. Then instead, you're only looking at the ones that are finished. And the ones that are finished are a problem because either they didn't land the plane properly, and you need to walk them through that process, right? They think they're finished, and they say they're finished, but they're not finished-finished. And so, you got to walk them through that. Or they're really finished, and now it's like, "What would you like me to do?" And I'm like, "Well, let's talk about it." Right? Whoa. Well. And that's when you sit back and start having those wonderful design discussions with them where you're just like, "Okay, so what if we were to throw the database out and try something else?" And then you're like, "Okay, off you go. Go think about that." And you cycle to the next agent, and you go, "Okay, I'm going to give you a hard problem, too." And so that's actually how I use Gas Town now, is I spin them all up, all the crew with hard problems. And you know what? It's so wild because like 8 out of 10 get done, and 2 out of 10 get lost. And I'm like, "I know we thought about this before." Sometimes we can find the design, sometimes we lost it, and I just have to redo it from scratch. It's a little annoying. [0:31:50] KB: So, this gets into one of my key questions, which is one that once again I'm grappling with, not even quite being at the Gas Town level, but I can imagine gets even more intense, which is how do you mentally keep up? [0:32:01] SY: Yeah. I mean, for starters, it can be frustrating, and you can find yourself yelling at them. Like, "I told you for the billionth time, don't make PRs. We're the freaking maintainer. I mean, come on. It's right in your prompting." And they're like, "I made a PR." But then I realized, it's my fault, right? There's a solution for it, actually. You need a pre-tool-use hook from Claude. And you just say don't make PRs, and that's the end of it, right? It's solved. So, you got to realize, these things are getting better so fast that you mostly just have to take this very nice zen approach and just be like, "Look, they're getting stuff done, and that stuff is making forward progress. And we're moving the goalposts." Right? And so, yeah, it's a different world, man. How do you avoid getting tired? Dude, I take naps throughout the day. I'm exhausted. It's like I'm a factory manager now with a brand new team. They're all a little clueless. They're smart, but they're bumping into each other. And I don't know the business very well. And I'm running around trying to keep them all busy. It's exhausting. [0:32:55] KB: That, I guess, gets to my question of like - and this maybe gets to another kind of related thing, is how well - because I think you write software.
Writing software traditionally, if we go back even two years before all this, you're writing software, and it has a couple of roles. One is you have an executable artifact. That's maybe even the smallest one. But two, you're creating this mental model of a system that you have. You're expanding your mental model of the business problem you're trying to solve, and you're mapping between those things. Now, agents are writing code. But those two other problems still maybe exist. Do you have a mental model of the code that is being written? [0:33:31] SY: Yeah. Have you ever worked with a really, really, really good product manager? A very technical one who used to code a lot, and now they're a product manager. [0:33:41] KB: Once. [0:33:42] SY: Okay. Once. Yeah, you're right. And they're like gold. Actually, the ratio is very high at places like Google, like the technical ones. But still not 100% by any means. And the ones who are really, really good, you can have a conversation with them about almost any corner of the architecture, and they'll know whether it's using a hash table, or a list, or whatever. They'll know the O(n) performance of it because it affects a list that the user sees, right? Or the length of time that an export takes or something, right? So, they'll have conversations with you, like, "Well, the export's taking too long. What if we use the cache for that?" And da-da-da-da-da. Right? The good ones. And this is also true for an uber tech lead. Have you ever worked with an uber tech lead at a big company? [0:34:22] KB: Yeah. [0:34:23] SY: Right? They're working with tech leads. And so, they're coming up with a consolidated, synthesized view of the machine that's being built across all of these teams, right? And so, to the extent that you can do that, that is an engineering leadership role. And it's a hat that could be worn by product, or by technical program managers, or by senior engineering leaders, executive leaders, whoever wants to get down there and try to understand how this thing is working, right? And I tell you, that's the most important part. It's not actually what language it's written in, usually. It's not the syntax. It's not any of the details of the linker or any of that stuff. It's what does it do, right? What's the functional specification of this thing? And we have the ability to build software so fast now that keeping the functional spec in your head is a huge task. [0:35:07] KB: That is the core problem I'm asking about. Yeah. I'm like, "How do you do it?" [0:35:11] SY: I have to remember the entire surface of Beads and Gas Town, including all of the integration points that people have brought in. And I find myself often having conversations with the LLM, going, "I'm really embarrassed that I don't know this, but how does our plug-in system work, or how does whatever work?" Right? Last I checked it, it met my approval, but that doesn't mean that I remember how it works, right? And so I have to go back and reconstruct it. Yeah, it's this constant - it's a discipline thing. It's a hygiene thing. Just you have to do regular code reviews of code you've never seen. And you have to continue reviewing it and asking questions until, as an uber tech lead, you're like, "Okay, I think they're in the noise now. They're in the weeds. We don't need any more code reviews for now." Right? Because you've asked them from a bunch of different angles, right? Same thing goes on for the product. You've got to like understand every nuance of your product.
And if you don't and you're not using it, then why do you even have that code? So, you've also got to start pruning aggressively and start retiring features that aren't pulling their weight. Because otherwise you'll have a tech debt mountain, right? I mean, look, man. I mean, people are like, "Yeah, they're not looking at the code. And so they don't understand it." That's a very junior mentality. That's a person who's not very seasoned or experienced. Somebody who's never led a team. Somebody who's never been responsible for a very large operate - I mean, look, I was on nuclear submarines in the US Navy. And the software projects that we make are of comparable complexity to a nuclear submarine. And they're freaking six stories tall and several football fields long. And they're very complicated, right? And this notion that people have, they're just thinking about it completely wrong. [0:36:41] KB: I couldn't agree more in terms of - once again, the code details. That is in the noise at this point. And I think there is a world we've been talking with some people about. Over the last 15, 20 years, we moved from servers as god servers, to pets, to servers are cattle. You don't think about servers you're spinning up and down. We're doing the same thing with code, right? Code is cattle is the world we're going to here in many ways. But your system still matters. And the pace of things still matters. I'm curious, related to this actually, I have enough trouble keeping my mental model up to date with the work that I'm doing, but I am still leading a team. So I'm also keeping up with N people who have N agents working for them. This problem expands. Is there tooling that you're using for thinking about this? How does Gas Town expose this stuff in a way that is easier to process? How do you approach it? [0:37:39] SY: I was interviewed with, I don't know, 10 other famous-ish people in 2008. A long time ago. And one of them was Peter Norvig. Some Polish kid reached out to us all, somehow hooked us, and we all answered a bunch of questions. James Gosling, and Guido van Rossum, and all these people, right? And Peter Norvig gave the best answer to one of the questions, just all time. It was something like what is it that differentiates great programmers? And everybody else went blah-blah-blah-blah-blah. And Peter Norvig said, "Being able to keep the entire problem in your head." That was his answer, right? And that is the problem that we're all faced with. In fact, what's going to differentiate successful teams from not successful teams is the ones who are able to keep bigger problems in their collective head. That cross-functional communication and coordination, those costs, LLMs can help reduce them. But the rate that you're producing just exacer - it's the Jevons paradox of project management, right? I had a non-technical boss once, not very technical, who was very good at using lieutenants, using counsel, senior principal engineers who they trusted to guide their decision-making and lead their organization. And they had all the other leadership pieces. And so it was an arrangement that worked. And I think that some companies like Amazon adopt this directly. And you get this dual management type setup, right? Where you've got a manager, and you've got a technical sort of adviser or a technical leader, right? I think that we're all moving into this role, where you have access to a technical adviser.
And so now the question is - I blogged about two friends of mine that yelled at their colleague because he was two hours behind them, right? Did you hear about this? Yeah, because they're running 20 agents each. And so they've realized that they're moving so fast that if their work is hidden from view for even a little while, if they're not completely transparent and pushing and public and loud about everything that they do, then they might as well be working at the bottom of a mineshaft, and the world will move on without them very quickly, and their stuff will be impossible to rebase because stuff just moved that fast. And they wound up getting cross with a colleague because he was like, "Well, I implemented the blah-blah-blah." And they're like, "Why? Why did you do that? What information was that based on?" And he's like, "It's from two hours ago." And they're like, "Two hours ago? What's wrong with you? We made six decisions since then." Right? And it's like this is a serious problem, right? I mean, it's the problem. Beads and Gas Town have sort of taken the brakes off. Look, it's going to take another one to two model iterations, Opus 5, Opus 5.5, call it, summertime, before the orchestrators run smoothly and the work that they do really is high enough quality that the average developer can kind of use it and trust it. But we're almost there, right? [0:40:21] KB: This is one of the things and one of the reasons I was excited to get you talking on this. Because I've interviewed a whole bunch of people doing coding tools. I'm using all of them. And everyone's focused on the individual developer, right? How do we optimize the coding productivity of an individual developer? And LLMs are flipping crazy for this. We can go so far. And using it in the team context, your bottlenecks shift completely. And now it's how do we keep our wetware up to date? [0:40:48] SY: Yeah. And also like I was saying, that dual management thing hits everybody now. You are a manager, you are a leader because you have a team of software engineers. You got thrust into the role. And now your other skills, your soft skills, your humanities, your college education, your liberal arts are all really important now. Your people skills matter, right? Because you're going to have a 300,000-line thing you want to go and bring to another team because it cures cancer. And the other team's going to be like, "We can't read your code. Who are you?" And it's going to become this like negotiation to get anything done anywhere because you've produced too much. Companies are struggling with this, just merging each other's work together, right? The better you communicate - fitting the problem in your head involves fitting the people problem in your head, too, right? You know what I mean? Like the, "Who needs what?" And I think to an extent, it's a skill that some people will have and some people won't, but it's a skill that you can learn. It's a skill you can foster, and develop, and be taught. And you can work with mentors, and they can teach you this skill. This is going to be how all knowledge work works. They call it centaurs. Yeah. Have you heard this term? [0:41:54] KB: Mm-hmm. Yep. [0:41:56] SY: Do you know who invented it? [0:41:58] KB: I don't know the origin. I read online, so it must be true, that it came from Bobby Fischer, who used it to describe AI-assisted chess. Then there's the whole centaur-minotaur blog, which was really clever, right? Which is which one do you want?
Do you want the AI head with the human body or do you want the human head? And everybody wants centaur. And then there's debate over whether it's actually called a chimera because that's the term that academics were using, until Bobby Fischer took it all away. And then you hear boring, hybrid, whatever. But look, the fact is everybody is going to have a friend, a helper. Everybody's going to have an AI, right? Something like Claude. Have you tried Claude Cowork? And you know how Claude Code can do your freaking laundry for you, right? [0:42:39] KB: Yep. I mean, I didn't try Cowork because I already have Claude Code and Codex working within Obsidian, right? I already have what they're shipping it for. [0:42:48] SY: So, you get it. But if you turn on Claude Cowork, and you're like, "Oh, I get it. It's Claude Code for everybody else." Like on your desktop. [0:42:55] KB: And to be fair, my wife at some point had a problem where I was like, "Oh, this is an easy Claude Code problem," and I couldn't boot her into it. And then we tried to do it using claude.ai, the web interface, and it was terrible. And so the use case for Claude Cowork makes perfect sense. [0:43:11] SY: Yep. Yep. Same. Same. Fun times ahead. The naysayers, I think they're in for a really - they're going to go through the five stages of grief this year. But interestingly enough, at the end of it, you wind up hopefully with happiness. Some people will drop out of knowledge work. I think they're just not going to like this. They're not going to like this whole centaur thing. They're not going to want to work with an AI, you know? [0:43:32] KB: So, a thing that I've been grappling with is a tremendous amount of what we're doing with AI tools right now is trying to do the same kind of work at the same or maybe slightly lower quality at a much higher pace. And there are times when more is more, right? You got a bunch of features you got to rip out. It's fine. The thing I'm trying to invert is what are the use cases, or when are we able to use this AI assistant to do the same work but at a much higher quality. [0:44:00] SY: I got a buddy who's one of the best engineers in the whole world. I mean, the stuff he's built is a really long list at big companies like Amazon and Google. He insists that his quality is much higher with LLMs. And it's because of the way he does it. He just chooses. See, with LLMs, you get what you choose, what outcomes you choose, right? And he chooses quality. And so what he does is he reviews all of their work, and it becomes a pair programming exercise. And it's necessarily better because it's the best of both of them, right? Just because I'm optimizing for throughput doesn't necessarily mean that you can't dial up quality. And for certain launches like Dolt, I'm dialing up quality just insanely high, doing wave after wave after wave of code review and going to the Dolt team and having them review it, right? Quality is just a choice, man. That's all. [0:44:47] KB: That's, I think, a really important thing to discuss here. Like, how do you - let's spell it out. As someone who has been hacking with these things at both ends of that spectrum for probably more than most folks listening to this. When do you want to start dialing up quality? Actually, one, how do you decide this is appropriate, this is the thing where quality is important versus throughput? And two, what are the knobs and levers that you personally turn?
[0:45:13] SY: For me, for the stuff that I'm working on, I'm in a case where I am very fortunate to be able to have a very low bar. A very low bar. Nothing has to work at all, right? Beads, I can't break now because people are depending on it. But Beads is incredibly well-tested. It has significantly more test code than actual code. And I think this is just a vision of the future, where we just have everything tested to death. And integration tests, too. Right? Token-burning tests. Just verification and validation are the two - they're the two gates. And validation is: did you build the right thing, right? Which is another problem, right? You can have something go and build something that's great, but it's the wrong thing, right? And then verification is: did you build it well, you know? Along with keeping the problem in your head. This is one of those - we've got new sort of pressure points. We've got new - you know what I mean? New bottlenecks in development as the old ones have been obliterated by AI, right? One of them is merges, one of them is shared designs, and keeping those up to date. You know what I mean? We've talked about that. The merge one is terrible, and it's why Gas Town has a refinery to try to basically have a dedicated role that knows how hard merges are and just redoes them, right? But it's funny. I showed Gas Town to Anthropic. And I don't know if I said this earlier, but they felt like it was just working around a bunch of bugs in their model. I mentioned that? [0:46:40] KB: Mm-hmm. [0:46:40] SY: So, what this implies is that - and people have already pointed this out, right? Is that Gas Town will flatten. I don't need as many roles because half the roles were just workarounds, saying do your job. And as soon as it knows what its job is, you only need the other roles. I think you're going to have a really simple, maybe two-tier hierarchy where - yeah, maybe a little three-tier. Just because organizationally, you want to take advantage of that. You know, all militaries are hierarchical for a reason. I'm sure AIs will choose to be hierarchical on large enough projects, but you only talk to the top, right? I think that's where we're headed. And I think that it's a skill that everybody will learn this year because it's just so effective, right? But senior people are benefiting more than junior people because they know what good looks like. And we're in a stage right now where the AIs don't always know what good looks like, right? Again, verification and validation for a senior engineer are often as simple as just looking at it. So it's a little harder if you're in a new space, a new domain, a new programming language, a new area, and you're trying to build for a customer that's not you. I try to avoid doing that. I try to avoid building things that - you know what I mean? [0:47:53] KB: I was going to ask, have you tried building anything for normies using Gas Town? [0:47:57] SY: I have been so busy building Gas Town that I haven't been able to spin up a rig for, say, Wyvern, my video game, and just work on it, right? And see how it deals with, for example, UI programming. How do they handle that? Gas Town has had a lot of bugs. Gas Town has had much faster adoption than I expected, even though I expected people to ignore my dire warnings and use it anyway, right? And so I've had to stay on top of it because it's had stalled workers, runaway workers. I had to kill 320 Claude Code instances on my machine the other day. And each one of them takes like a gigabyte of memory.
It can really kill your machine fast. Yeah, we're still in that stage. But yeah, this is a new world, man. It's fun times. Look, can you build stuff for normies? I don't know. I wouldn't recommend it, right? Look, why does Gas Town exist? What am I even doing with it? If I know that this isn't the long-term shape, what is it for? And one of the main things it did truthfully, and you saw this, I think, online, is that it reframed the discussion completely. [0:48:56] KB: Oh yeah. It reminds me of the political framing of like there's an Overton window of like where people are talking, and you just moved it. [0:49:04] SY: I moved it way down, right? What was acceptable to talk about. And it was because, up until Gas Town, I was just a blogger who was saying AI is going to be big. People are going to be running fleets of agents. It's all going to happen, right? And Gas Town came out and was just so damn smug purposely. I had a whole cast of characters. It had mythology. There was a song. And Nano Banana really came through. My god, those pictures are unbelievable, right? And also, I had discovered the whole molecular work thing. And so it had kind of a theoretical foundation that was actually working to where I could see it working. It was doing what I wanted it to do. It worked, right? And so I launched it. And instantly, a lot of arguments had nowhere to hide anymore, right? It changed the conversation overnight from "No, no, no, what you're saying is wrong," to "Bro, you're pretty aggressive." Right? That was what happened. Now they're faced with the reality that it is building itself, which nobody's come right out and said, but that's what it's doing. It's building itself. And so it's at least good enough to build itself as a swarm. Okay. And this is deeply, deeply unsettling for people. And some of the content that I've seen responding to it is just hilarious, right? They feel like they're getting taken over by the hive mind. This is Ender's Game playing out in front of us. It's really wild. [0:50:29] KB: The "building itself" part reminded me of a thing, right? It reminds me of building compilers. And one of your first goals with a compiler for a language is that it should be able to compile itself, right? You want to create a bootstrapping compiler. Gas Town, it sounds like, is a bootstrapping agent orchestration framework. [0:50:44] SY: It is. And it's a great analogy. And when my friends knew about Gas Town, but it hadn't booted up yet, we would use the compiler analogy. I wasn't using it to build itself yet. I hadn't booted into it. I was still using just naked Claude Code to build it, right? It was so weird. You would think that you would just boot into it, man. I wish I could describe this to people. You would think that it would just kind of turn on. But it went through about six or seven layers of waking up. And it felt like I was digging a tunnel and burst out into this new universe. And then we started setting up camp. And all of a sudden, we were building infrastructure and cities and stuff, right? And every single iteration of this, almost every day, it felt like there was a breakthrough. It was like, "We're done. Gas Town's finally going to be self-sustaining. We figured out that there's no activity feed. Beads is the feed. When they close, that's an event. Identities don't need to be separate. An identity is a Bead. It's the data plane." All these realizations, and we were like, "Yeah, this is it." And then it would just flop. It was like the Wright brothers, right?
It wouldn't move. And then on December 28th, one day, I was like, "And then we should do blah-blah-blah." And the Mayor's going, "Okay, convoy X landed. Convoy Y landed. That feature is done. That feature is done." And I'm like, "What?" Because I hadn't touched anything, right? And I realized it was working. It was doing the thing. The thing, the compiler thing. It compiled itself, right? And I was so, so excited. It was two days till New Year's, and I was like, "Okay, we got two days. Let's make this thing launch." Right? And so, yeah, that's the story of Gas Town, man. I actually got it into self-hosting mode. [0:52:20] KB: I am curious if there are other compiler metaphors that you found useful for it. For me, once again, I often use the metaphor of an LLM being a VM of some form. And I think there are a few projects I see out there that seem to be using compilation metaphors. Oh, shoot. I'm blanking on the name of it. The one where you express your prompt intent, and it iterates on it until it can get to the right prompt for each different type. But I'm curious, for you, are there other metaphors? It doesn't have to be compilers even, but metaphors that you have found useful to draw on for building this new approach. [0:52:52] SY: I feel like I - seriously, with this system between Beads and Gas Town, I feel like I'm building systems that I've been building my whole life, my whole career. And echoes of them keep coming up, right? The Wyvern properties system, which is really sophisticated. It's sort of like JavaScript's, but it even has transient properties, right? If you drink a potion, it's going to change your health by a certain amount for a while. But when you log out, it shouldn't persist with you, say, in a game, right? So, you've got this idea of your permanent properties and your transient ones, right? Well, it turns out orchestration is a lot like that, too. You've got your permanent work, and then you've got all the throwaway stuff that's just, it was running a patrol, and all you care about is that it ran the patrol and what the outcome was, right? And it's got property inheritance and all that built in. It's a really sophisticated pattern that I used. There's an event system that's like my video game event system. It starts to feel like an operating system. And really, with the interfaces that people have been putting on it, it starts to feel like a game. And I saw some fanfiction on X yesterday where somebody was like, "My Gas Town went crazy. We had soldier roles, and the mayors were all fighting with each other. Who's the head honcho?" And at first I thought it was real, because I had just cleaned up 300 Claude Code instances, and it's not out of the question that this could happen. It's just that they had named roles for things. It was totally fanfiction, but it was kind of also real. I think we're going to get to the point where this acts like an RPG, like a strategy game, like an RTS. I'm serious. We're not far off, man. I mean, Age of Empires, you're building stuff. Code, you're building stuff. And you're looking at the outputs of it. And man, I mean, why not make it fun? It's already really fun. It's super addictive, right? Why not lean into it? I think it's going to be just absolutely remarkable. By the end of the year, people will be playing games and building software. And we're talking about - now, look, there will always be a frontier where engineers will excel, because you're building really difficult software. People will be very impressed by it.
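To make the permanent-versus-transient properties pattern Steve describes a bit more concrete, here is a minimal TypeScript sketch of a layered property bag: permanent values, expiring transient overlays, and fallback to a parent for inheritance. The names, shapes, and the orchestration example are assumptions made up for illustration; this is not the actual Wyvern, Beads, or Gas Town data model.

```typescript
// Hypothetical sketch of a layered property system: permanent values,
// transient overlays with expiry, and parent (inheritance) fallback.
// Names and shapes are illustrative only, not any real project's API.

type Value = string | number | boolean;

class PropertyBag {
  private permanent = new Map<string, Value>();
  private transient = new Map<string, { value: Value; expiresAt: number }>();

  constructor(private parent?: PropertyBag) {}

  set(key: string, value: Value): void {
    this.permanent.set(key, value); // persists (survives logout / gets committed)
  }

  setTransient(key: string, value: Value, ttlMs: number): void {
    this.transient.set(key, { value, expiresAt: Date.now() + ttlMs }); // throwaway state
  }

  get(key: string): Value | undefined {
    const t = this.transient.get(key);
    if (t) {
      if (t.expiresAt > Date.now()) return t.value; // active transient wins
      this.transient.delete(key);                   // expired: fall through
    }
    if (this.permanent.has(key)) return this.permanent.get(key);
    return this.parent?.get(key);                   // inherit from parent
  }
}

// Game flavor: a potion temporarily boosts health, but doesn't persist.
const player = new PropertyBag();
player.set("health", 100);
player.setTransient("health", 120, 60_000); // one-minute buff

// Orchestration flavor: permanent work defaults vs. a throwaway patrol result.
const defaults = new PropertyBag();
defaults.set("priority", "normal");
const patrol = new PropertyBag(defaults);               // inherits defaults
patrol.setTransient("lastRunStatus", "ok", 5 * 60_000); // only the outcome matters, briefly
console.log(player.get("health"), patrol.get("priority"), patrol.get("lastRunStatus"));
```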
As an engineer, or an engineering team, or a giant company, you were able to pull off something that clearly took experience and resources that the regular person couldn't bring to bear. But I still think we're going to see just a huge explosion of software coming from everybody. Yeah? Just like when YouTube, and social, and phones made it so everyone could upload a video, I think everyone's going to be making software. [0:55:08] KB: So, not to get out of the LLM world, but what do you see this doing to the industry? [0:55:13] SY: Man, for starters, my two friends that are tripping over their third friend because they're going so fast have me very worried about what happens, right? Because I've been talking to industry, and they're already - it's already weird. It's already weird. When people use Codex, or Claude Code, or Gemini CLI inside of their workplace, they're so much more productive than their peers that it starts to look weird at performance review time. And then how do you hire? The whole interview process is thrown into question. It's affecting everything. Also, all your bottlenecks start to move. I've seen business teams getting incredibly surprised because engineering teams are delivering stuff for them, and they're not ready for it yet. They're in no way, shape, or form ready to roll out this big thing that they asked for, right? I'm seeing business teams building software for themselves because they're tired of waiting for engineers. And they can now, right? And so, we're seeing, I don't know, the rise of the Jeff Bezos two-pizza team, where a bunch of experts who want to solve a problem just get together and solve it. And maybe they have one engineer on staff. I'm seeing a shift where all work turns into a gig economy. All workers are gig workers. Your project probably only needs a product manager for one week out of the entire project. And it probably only needs a UX designer here and there, right? And so why not rent them, right? I see an internal economy - kind of Airbnb-like. Well, maybe that's a bad example. Uber, whatever. Where you're getting a gig economy where you're renting workers. Google's had this forever. SREs have office hours, right? That kind of thing. But with vibe coding and everybody contributing to the same big artifact that they're building together, I'm seeing a much more mobile, flexible workforce where people are helping each other on the fly as needed. I think the answer to your question earlier is: old-fashioned planning goes out the fucking window. It's gone. Okay? I think that companies that are successful will build stuff in real time, and the software they're building will become the living artifact. It is the shared contract. It is the spec, right? It is the prototype, in a sense. They'll have staging environments, and you'll be able to spin up five of them and try different options, and then throw away four of them, and it'll be very fertile and productive. But we're not going to be sharing specs anymore. I think those are kind of old. I think we're sharing ideas that actually have implementations. Yeah. And this is going to just utterly change how companies do their business, right? I mean, silos will get busted up.
I think a lot of bureaucrats that are manipulating the system in their favor to try to, I don't know, keep work for themselves or keep work off of their plates or whatever, they're all going to get found out and kicked out. And I think the system's going to reward people who are really, really good at working with AIs and really, really good at working with people, and kind of making stuff happen in big, chaotic environments. Yeah. But that's just what I think. What do you think, KBall? [0:58:03] KB: I mean, I'm already seeing a lot of that happening. I think one of the things I'm trying to wrap my head around is that the bottlenecks have clearly moved. How do we keep attacking those bottlenecks, or moving them, or changing them around, right? You mentioned planning. Planning is a huge bottleneck at this point, and architectures, and decision-making. There's definitely still - we sort of joked about this Uber tech lead, right? We're all having to become Uber tech leads in our capacity to absorb information about the state of a system. That's a bottleneck, because that is a skill that most engineers have not developed. I don't even know if all engineers can develop it. [0:58:41] SY: Or want to, right? It's not some people's idea of fun. They'd rather solve a problem than verify somebody else's solution, for example. And I get it. I don't think it's for everybody. But for me, it's like a dog sticking his head out the window. Getting all them smells real fast in sequence. [0:58:57] KB: It's a wild ride, and it's amazing, right? I am managing a team and producing more code than I ever did as an individual engineer. [0:59:06] SY: Yeah, you get those happy-programmer feelings when stuff clicks together and works. You're like, "Yeah." And you're just getting them all day long. Just getting those dopamine hits. It's like certain video games have figured out formulas that just maximize dopamine, right? And they're really addictive. And people just get completely into them, and they spend hours and hours. We're getting really close, because programming's always been kind of addictive. You've got to be in flow, and everything's got to be going right, and you can't be wrestling with some stupid JavaScript library or whatever, right? [0:59:37] KB: There are so many people I know. I mean, I had this - literally, yesterday, I'm sitting in an airport, and I've got crappy Wi-Fi and whatever, and I need to go to the bathroom and get food. And I'm like, "I can't give up. I just kicked off some agents. I want to see what they do. And they're on my laptop, and I'm not." [0:59:50] SY: Exactly. Seriously, using coding agents is like playing blackjack. It's like a slot machine. And it's because of what I was getting at earlier: when you start to trust the LLM to the point where you will let them work, then you will spin another one up, and another one up. And it's literally spinning in the sense that text is scrolling by. And you'll eventually reach an equilibrium, always, where you always have at least one available. [1:00:16] KB: There's always something to check in on and see, "Oh, this is finished. What's going on next?" [1:00:19] SY: There's always one waiting for you, right? Waiting for your input. And so, even if the tools make you better at it, you'll still be in that equilibrium quickly. And it's just like how Assassin's Creed got towards the end, where you were sending spies off on missions. And there was some magic about it where you weren't doing the missions yourself.
And you wouldn't think it was as fun as playing the regular game, but it was, right? And so, yeah, it's almost like coding with agents with an orchestrator is maximizing dopamine, kind of like going to a casino or playing one of those really addictive video games. But to me, that's really heartening. Because think about it. If all knowledge work gets broken into Beads, into bite-sized pieces, and we find a way to match all of the knowledge work to all of the people in the world, and everybody can participate in this, everyone's going to be building software and having fun. You know what I mean? It's democratized. I don't know, man. I see kind of nothing but upside from all of this. Everybody's really scared about it, but I actually think people are going to just - they're going to be blown away by what they can create. I'm bullish on the future. [1:01:16] KB: That is actually a really nice close. We could close there. But is there anything we didn't talk about that you want to talk about before we wrap? [1:01:24] SY: I'm going to share a realization with you that I had, and I'm pretty convinced that it's pretty accurate, but it's the first time I've ever shared it. So, this is for your show. [1:01:33] KB: World-first. [1:01:34] SY: Right? I think that I figured out the magic formula to tell if you're going to live or die in the world of AI as a software product. But I don't know if that's interesting enough for your viewers because - [1:01:44] KB: Well, I'm interested. [1:01:46] SY: Yeah, I think everyone's trying to figure it out, right? All the boards. Because look at companies: AI is eating software. It's eating jobs. It's eating entire categories. Stack Overflow, poof. Chegg, the homework company, poof, right? Then call center companies. IDEs are starting to worry. There's a lot of software out there that's starting to get a little worried that AI is going to eat it. And in fact, all boards should be worried, because Dr. Andrej Karpathy is out there saying that AI is going to eat all software, and there will be nothing left. And he's very worried about this, right? How do you tell if you're going to make it or not? And I think that the answer is basically thermodynamics, right? If you can find a way to save tokens, then the AI will use your thing, and you will live. Okay? Calculators, databases, storage systems, ledgers, transactional workflow systems, routers, networks, infrastructure. Anything that does a bunch of computation and math that the LLM could do in its head. Did you see the article? Anthropic reverse-engineered how they do multiplication, and it involves chickens, and goats, and stuff? It's basically like they guess that it's 95-ish with one pattern match. And then they use a lookup table based on the digits to find out that it's 95 instead of one of the ones near it, right? [1:03:01] KB: Yeah. Once again, the simple tool of "you can write code" gets a lot of that to happen. [1:03:06] SY: Well, they do, right? They write code, and they use tools, because otherwise they're doing a bunch of matrix multiplications. They're basically lazy. They're going to take the shortest path. And so you have a couple of hurdles. One is getting your tool or product actually in their field of view, so that they even have the activation energy to know about you and use you. There's a product called Serena that you may not have heard of. It's an OSS product that uses LSP servers and your IDE to save a bunch of tokens. If you have your LLM wired up to it, it will use that instead of grep.
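To illustrate the "thermodynamics" framing in code: here is a tiny, purely hypothetical TypeScript sketch of a harness choosing whichever tool it estimates will burn the fewest tokens for a task. The tool names, the cost numbers, and the closeBead() example are all invented for illustration; this is not how Serena, Claude Code, or any real harness is implemented.

```typescript
// Hypothetical sketch: pick the tool with the lowest estimated token cost.
// Tools, costs, and names are invented for illustration only.

interface Tool {
  name: string;
  // Rough guess at how many tokens the model will burn if it uses this tool
  // for the given task (output it must read back, plus reasoning overhead).
  estimateTokens(task: string): number;
}

const grepSearch: Tool = {
  name: "grep-the-repo",
  // Dumping raw grep hits into context tends to be expensive.
  estimateTokens: () => 12_000,
};

const indexedLookup: Tool = {
  name: "lsp-symbol-lookup",
  // A pre-indexed symbol lookup returns a handful of precise locations.
  estimateTokens: () => 800,
};

const mentalMath: Tool = {
  name: "do-it-in-my-head",
  // No tool call, but lots of reasoning tokens and more chances to be wrong.
  estimateTokens: () => 3_000,
};

function pickCheapestTool(task: string, tools: Tool[]): Tool {
  // The "lowest energy state": choose whichever option burns the fewest tokens.
  return tools.reduce((best, t) =>
    t.estimateTokens(task) < best.estimateTokens(task) ? t : best
  );
}

const chosen = pickCheapestTool("find all callers of closeBead()", [
  grepSearch,
  indexedLookup,
  mentalMath,
]);
console.log(`Chose ${chosen.name}`); // => "Chose lsp-symbol-lookup"
```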
And it will find its way around the codebase much more quickly, because everything is pre-indexed and it doesn't have to use grep. Right? It's proven to save a lot of tokens. So, it's a more energy-efficient state for the LLM to be operating in. And I will argue that LLMs will always do that, because of laziness - call it what you want. They have a moral imperative to use the least energy possible to solve a problem. Do they not? Right? [1:03:54] KB: Couldn't prove it by those thinking traces. [1:03:56] SY: You can't prove it by the thinking traces, by definition. Well, you can for the small ones. You can see that they're wasteful and inefficient. And so they will choose the most efficient. Writing code is often the most efficient, but if there's a tool available that's more efficient than writing code, they'll use the tool. CPU cycles are, generally speaking, going to be cheaper than GPU cycles if you can solve it down in that layer. And honestly, I think neurons are even cheaper, and humans are actually quite good at certain tasks that are just going to be cheapest to give to us. It's The Matrix, the original plot. Remember, the Wachowski siblings had it - it wasn't as a battery we were being used, it was for computation. I think they actually called it accurately. Anyway, that's my hot take: the way that you survive the AI apocalypse is you build something like Beads, or Dolt, or MongoDB, or Temporal, or Kubernetes, or Kafka, or Cassandra, or whatever - infrastructure that AIs will prefer to use instead of building their own or doing it in their heads, right? You have to make it clearly obvious to them and beat the thermodynamics of them knowing about your tool, which means you have to market it to them. You've got to kind of do what Notion did and literally go and work with OpenAI, and Anthropic, and Google to train the models on your tool so that they're better at it. And you can actually pay to do this. That can overcome the activation energy for the LLM to realize that it will be at a lower energy state using your tool and saving tokens for whatever task it is. Yeah? Does that make sense? Do you believe that? [1:05:24] KB: Well, we'll see. I think it's an interesting take. I think there's also a question of not just activation energy, but also access, which is why everybody's racing to become a system of record in some form or another. Do you have data that they don't have out of the box? [1:05:40] SY: Yeah. I mean, again, I think you can still look at that in terms of energy expenditure - token spend, basically. Right? It always comes down to how many tokens they have to spend to solve your problem. But yeah, RAG. That whole problem. It's another dimension to it. Sure. Will you survive or not? Maybe mine's necessary but not sufficient. [1:05:58] KB: Yeah. [1:05:58] SY: Sure. Yeah, we will see. Right? I mean, look, if you believe Karpathy and the AI researchers, AI will be able to do it all. If that's really true, then in two years, there won't be a bunch of apps on your phone. There'll just be one, and it'll be Claude, and it'll be able to do everything. And it'll be super addictive and more interesting than any person that you hang out with. And so, why would you have any reason to go to a different app, unless it was something that your Claude companion couldn't offer you? And that's going to be things like computer games that are really well thought out, or data stores that it just doesn't have access to, or products where there are a lot of people.
And they collectively provide more entertainment than a Claude by itself. But it's going to be a weird new world, right? Where AI is going to start getting really sticky, in my opinion. People will become dependent on it. [1:06:41] KB: I think that's already happening. [1:06:43] SY: I mean, look, they can't read clocks. You know what I mean? It won't be long before they can't read. Because why would you have to? I don't think that's necessarily bad. I just think the world is changing really, really fast. [1:06:53] KB: It's funny because, through this conversation, I feel like half of what you've described is utopian and half is dystopian. [1:06:57] SY: Look, I'm going back and rereading a bunch of Arthur C. Clarke books, right? Because he actually accurately predicted a lot of this. He called it aliens, but it was AIs. And yeah, we have a crossroads ahead of us. And it could be a dark path or a good path. I really believe that. And there are people lining up to make it the dark path, right? Imagine a single global payment rail. "Oh, how exciting that would be. A single global payment rail." That's digital feudalism, right? That's where they can extract transactions from everything that happens on the planet. A single work rail, a single anything rail, a single social system. Anybody who's building towards that is building towards a surveillance state. So I'm building against it - or, actually, building an escape hatch. And I really think that humanity is reaching a crossroads, and AI is a forcing function. We'll see how it goes. But there are going to be massive, massive counter-reactions this year, right? The social reaction against AI is going to be like nothing you've seen. [1:07:52] KB: All right, let's call that a wrap. [1:07:54] SY: All right. [END]