EPISODE 1921 [0:00:12] GV: Hello, and welcome to SED News. We are your hosts. I'm Gregor Vand. [0:00:18] SF: And I'm Sean Falconer. [0:00:20] GV: And as I think a lot of you do know already, this is a slightly different format of Software Engineering Daily, where we pick off some of the big tech headlines that you might have seen in sort of more mainstream news. We're going to dive into a bigger topic in the middle, and then we just look at some of our favorites from Hacker News towards the end, which usually ends up in sort of weird and wonderful rabbit hole type projects that developers have been working on. So, as usual, we just like to kind of catch up on what we've been doing. What has been in your sphere, Sean, since we last did SED News last month? [0:00:56] SF: There's been a lot in the last month. So, I moved into a new house, which has been great. But while I only moved in a couple weeks ago, I actually have traveled each week of this new adventure in this new house. So, I was in DC, and then I was in Seattle for our data streaming world tour last week. And then my wife was away this week. So we've been here almost 3 weeks, and there's never been a completely full week where everyone as a family has been in the house yet. I'm looking forward to that happening. And then Confluent, the company I work for, was officially acquired by IBM about a week and a half ago. That's exciting news. [0:01:32] GV: That's all completed then. Yeah. [0:01:33] SF: Yeah. Everything's been completed. Lots and lots of conversations now going on with those folks over at the IBM side and figuring out how we're going to work together, which is exciting. Lots of stuff happening. And then looking ahead to April, my kids are going to be on spring break, which means that everybody's going to be on spring break for vacation. So, we have some travel coming up. And then I have some work travel to Cloud Next. And then I'm also going to India.
So, lots and lots of stuff happening over the next little while. But what's going on in your world? You're in a different location than usual. [0:02:04] GV: Yeah. Well, I'm in Scotland. It's where I'm from originally. I try and come over here for a month or two of the year. When I do, I'm up in the far north in the highlands of Scotland. So, it's always kind of fun. That's what technology makes possible. You can still do a podcast from all the way up here where there's - I actually looked it up. The town I come to has a population of 80 people. [0:02:28] SF: I thought you were from a small place. [0:02:29] GV: Yeah. So, I didn't actually grow up here, but it's sort of where I've decided is where I come back to now when I come back to Scotland. Yeah, it's always quite a change from Singapore, but that's I think by design. But yeah. No, I mean workwise it has been kind of exciting actually. Obviously, small plug here for Supabase, of course. But, yeah, we've just launched - Stripe have been doing this interesting project called, well, Stripe Projects, which is a CLI tool effectively. So you install it through their CLI. And, basically, there's a bunch of partners. It's not just Supabase. You'll find like all the usual names on there. But it's been really great. I've been kind of overseeing that one, and it got out yesterday. We're recording this on a Friday, and this became kind of public in preview yesterday. It's just awesome when you see something actually hit the public sphere that you've been working with engineering on for the last - we've really been crunching on this last couple months. And it's nice to see that go out. Yeah, check it out. Yeah, I'll be in SF later this month as well. And that's for Sessions. That's Stripe Sessions. For obvious reasons, as I've now just disclosed, we're working a lot with Stripe and stuff. Yeah, if anyone is at Stripe Sessions, do look me up and say hello. Always great to meet new faces as well. [0:03:37] SF: Yeah, that's exciting.
That'll be our first opportunity to meet in actual person. [0:03:42] GV: In real person. Yeah, in real life. Yeah. Yeah. Yeah. For anyone listening, Sean and I have never actually met in person. Yeah. This is going to be cool. And you're coming to Singapore at some point as well - [0:03:52] SF: Yeah. We have two opportunities. [0:03:54] GV: There we go. Yeah. Whether we'll do an in-person episode, that's a different thing. Because actually recording in person means like a whole studio setup, I think, or something. Whereas people are surprised to hear - [0:04:05] SF: Actually, I've done a few in-person. There's a lot more technical complexity with doing in-person versus just recording on your computer remotely. [0:04:13] GV: Exactly. Yeah. I've had a couple of guests on SE Daily just say, "Oh, can we go to a studio?" I'm like, "Well, we don't have a studio." SE Daily is a remote podcast. That's how we do it. Right. Moving on to the headlines. So, some of you that do listen to this SED News a lot, we have talked about ARM in the past, but more so the lack of ARM. But I think the headline here is ARM is back in a big way. And this all kind of focuses around CPUs, because in the headline stories of any of this compute and inference, the story's really been around GPUs. And we did talk a little bit about TPUs, which are Google's derivative of that. But we are really going back to CPUs. Yeah, Sean, what did you pick up on this one? [0:04:59] SF: I mean, it's like everything in tech, the old thing is now the new thing again. It's always cyclical. I think part of the reason we have GPUs, we have TPUs, and now CPUs are back, is just that you need a lot of compute to run all this stuff. If you're running models, you probably want GPUs, TPUs to be doing sort of the matrix multiplication there. They're really good at that. But CPUs are also good at certain things. They're good at logic, and branching, and task switching.
And I think it's given that people are now running these agents on their desktop, and not just running one, but running multiple agents. I talked to a startup that I work with recently, and I was like, "How many instances of Claude are you running simultaneously?" And they're like, "Well, I start to lose track of things when I get up to like six." That's a lot of compute that you're running locally. And I think, essentially, that's what's kind of driving this, is that people are running these agents locally. I don't know that that was something that we necessarily foresaw in the industry a year or two ago, that so many people would be kind of running these agent experiences on their local computer versus the cloud or something like that where maybe you're using some other type of compute. But if you're going to be running these locally for things like OpenClaw or some other version of Claw that now exists, or if you're doing it with the agentic engineering tools that are out there, you're just going to be consuming a lot of CPU and a lot of memory. There's now a lot of, I don't know, appetite, I guess, in the industry to find more of this compute. [0:06:22] GV: Yeah. And I think one of the interesting pieces here is that ARM is actually moving into manufacturing their own chips. Not fabbing exactly, because fabs, being effectively the sort of manufacturing plants that can do this, are incredibly rare - there's really only three players that have those. I believe it's TSMC, the Taiwanese company, that is the main player. Intel do have that capability. And Samsung is actually the other huge one that has that capability. Not entirely clear who's fabbing these for ARM, but the difference being that ARM traditionally was supplying the designs of chips. That's their whole thing. Virtually every chip that is created has some design piece in it that comes from ARM.
And so they basically take a royalty, if you want to call it that, off almost every single chip that is created in the world. But they hadn't really gone full on into, "Hey, this is like an ARM chip with an ARM logo on it," for example. And this is kind of where they're going. And yeah, as you called out, Sean, it's the rise of agentic tasks. And it really has just accelerated, as we touched on, I think, in the last SED News - in the last 2, 3 months, suddenly, everyone's doing agentic stuff. Not just engineers. Anyone seems to be quite interested now in getting an OpenClaw running and leaving it on overnight on their Mac mini or something. So there's clearly just a lot of appetite now for having these independent environments that have their own CPU to run a specific agent. And I think that's kind of what we're seeing here. [0:08:00] SF: Yeah. And I think that's only going to continue. These companies, whether it's Anthropic or others, are verticalizing a lot of their success around Claude Code, Claude Desktop to be geared towards finance, and geared towards lawyers, and whatever profession that you have. Every sort of white-collar job eventually is going to have people who are running these things in some sort of environment. You need compute. You probably need some sort of containers and security parameters around that. There's, I guess, a ton of people, startups, all the way to large companies that are really trying to meet the needs of the market right now. And it's something that, like everything now when it comes to AI, it's exponential growth. It's not this kind of like nicely growing linear thing. It's like as soon as something takes off, it's like extremely viral. And companies have to move really fast to essentially meet that demand. [0:08:50] GV: Yeah. I mean, NVIDIA are in this space as well. They offer CPU racks, but they're still known, I guess, on the GPU side.
But yeah, it's kind of, I would say, great to see CPUs coming back. They've powered so much to this point in time from a technology standpoint, and then everyone kind of thought, "Oh, they're almost yesterday's technology." But they're incredibly powerful for specific tasks. And you don't always need an incredibly expensive GPU running every single task, as I think a lot of people have decided to do just because they can afford it. But it isn't actually needed. [0:09:24] SF: Yeah. I did an SED episode - unfortunately, I can't remember the person's name, but he was a Turing Award winner in supercomputing. And one of the topics of conversation that we had was how the industry's focus on workloads catered towards essentially the math behind model crunching and inference has led to poor performance in supercomputers for certain types of tasks. There is always like a tradeoff, right? If you overly specialize in something - GPUs are really good at a certain type of math. TPUs are even more biased towards being really good at that particular math. You're also giving up something that might be important for other types of logic or compute that doesn't essentially fit the profile that you're going to run on one of those other pieces of hardware like a GPU. And when you're running agents, it's not like it's just model crunching. There's all kinds of other things that they're doing where you want to be able to ideally use the right compute depending on what the profile of the task is. [0:10:23] GV: Exactly. And it is just the volume here. There isn't really a sort of realistic world for like an average person at the moment where every agent has its own GPU. Even for most people in tech, the numbers don't add up if that's like what it takes. If you want the number of agents that you think would get your tasks done, you're probably looking at CPUs to run most of those logic tasks.
Yeah, very cool to see ARM back in the fold. Yeah, I read a book about ARM kind of just before - I think it was just before they IPOed or something like that. And I really, at that point, fully understood. Wow, they have just, behind the scenes, powered so much of what we take for granted with all these designs but never really got the recognition for it. Yeah, it's fun to see them actually stepping into branding their own CPUs as well. So, exciting. Moving on. This does sort of hit the main tech headlines, in a sense. This was a lot of reporting by TechCrunch on this one. And I think especially anyone in software engineering has maybe seen it on Hacker News as well. But I do think this kind of really crosses into main news in terms of our world. The LiteLLM hack. Now, this was a takeover of a dependency effectively that then extracted the credentials of many services. LiteLLM basically gave nice API access to a multitude of LLMs and helped with things like metering and so on and so forth. It was a pretty widely used tool by so many developers out there. That kind of makes it unfortunately incredibly ripe as a target for someone to think about what can we do if we can get the credentials to that. And I think just the focus here, LiteLLM themselves were an ex-Y Combinator company. And then yeah, the sort of the kicker here is the fact that another Y Combinator company, Delve, they were already in a bit of hot water for potentially fabricating SOC 2 reports. They're a compliance startup. Go through the checklists, and you get the SOC 2. It's still a funny industry, where it's actually accountants that do the final stamp on things. That's a whole other story. But they - [0:12:36] SF: Yeah, they're your auditors as well. [0:12:38] GV: Yeah, exactly. They're your auditors. And it still seems kind of strange that you've got effectively a financial company saying that the security of your technology is up to scratch. But that's how that system was put together.
And yeah, there's been allegations that Delve have been fabricating the sort of rubber stamping of these reports. Unfortunately, Delve had given one of their reports with a clean bill of health to LiteLLM before this happened. And I think that, as people are at pains to point out, getting SOC 2 does not mean they have checked every single process that is going on in the developer workflow within LiteLLM, for example. But it should be a pretty good indicator that they're following best practice. And, well, something kind of wasn't going by best practice here, and it's had a catastrophic effect on a lot of developers. But, yeah. That's, I guess, this sort of slight security-minded focus from me. But what did you make of this one, Sean? [0:13:34] SF: Yeah. I mean, I think it's just another example of where compliance isn't necessarily security. Compliance is really a series of kind of check marks where you're saying, "Yes, we are following the best practices." But pretty much every company that has some sort of data breach or some sort of hack is compliant. They pretty much all have SOC 2 or whatever, various ISOs, whatever sort of security check marks that they need. And I think in a lot of ways, compliance is really about insurance, while security is actually about trying to stop the attacks. There's a very big difference between, I would say, the companies that are really secure by default, and they take a hard stance on this, versus people who are, "Hey, we have to make sure that we have these badges, so that when we're going through procurement with a vendor, they're not going to immediately just say no." I also think the interesting thing here is where a lot of times, when we think about hacks historically or breaches, it's like what is it that people are trying to go after? Well, they might be trying to go after someone's password. They might be going after credit card information, banking information, personal information for identity theft.
And here, I think there's now this new industry where the value is going to be, "Hey, I want your OpenAI API key, or I want your Anthropic API key." And then not only can they run up bills on you, but they could be leveraging those models to be doing something malicious against a bunch of other people, too. So, they could spin these things up using your precious tokens. Run up a big bill for you while also using that thing to attack somebody else. [0:15:05] GV: Yeah. So, this was discovered by a guy called Callum McMahon of FutureSearch, which is a company that offers AI agents apparently for web research. But he documented how this all unfolded. Basically, it ended up with his machine being shut down. That's a pretty clear indicator that something has really gone wrong here in terms of how deep this sort of hack has gone. It's something we've covered on SE Daily with guests, obviously, from companies that help to mitigate this. One that comes to mind is Wiz. We had Rami McCarthy from Wiz. And we actually covered a case that he'd worked on that is exactly this. It's like a dependency has a sort of leak within the maintenance of it, and that just cascades through so many other packages. And with this one particularly, the number of developers right now using LiteLLM has just shown it to be such a problem. [0:15:58] SF: I think these kind of dependency supply chain attacks are hard to catch. And it's an easy thing for attackers to kind of exploit. And we're going to talk more about this during sort of the main topic, but in a world where we're using AI more and more to generate code and maybe not always looking at that code that closely, and it's picking up packages in order to do some type of work on our behalf, we might not have as clear a handle on what those dependencies are. We're kind of just looking initially at the outcomes of that. Does it pass the tests?
Does it kind of look correct if it's generating some sort of UI based on what it is that we're trying to accomplish? We commit that thing, and then we get it running in production. And then, boom, we have some sort of problem. [0:16:43] GV: Yeah. The next headline here. I guess, again, this might have crossed more slightly in terms of Hacker News from a headline perspective. But I think it was interesting that, yeah, OpenCode really made a lot of noise in the community here. And this really goes to the fact that agentic coding has just taken off. But most of the time, we were talking about either Claude Code, or we were talking about Codex from, effectively, ChatGPT. And that means you're buying into, again, one of the big names. And here was a product that's able to say, "Hey, we're actually fully open source." You've got access to local models. You've got access to free models. Or if you really want, you can hook it back up to Opus or something like that. But I think just the amount of opinions that were then given about it is really interesting. [0:17:32] SF: Yeah, I think we're going to, of course, see more and more of these open source alternatives. In a lot of ways, it reminds me of the IDE market maybe 15, 20 years ago. It was very similar, where open source IDEs started cropping up as alternatives to the mainstream ones. And the thing that I've always thought about, even since people started with GitHub Copilot, was I think we've always gotten to a place where developers don't really like paying for their tools. And then you end up with open source alternatives. And then with the IDE market in particular, the IDE very much became like a commodity. And there are certain IDEs that survived that, and people pay for them. But, predominantly, I would say most became open source or were given away. Even things like Visual Studio, which historically had been a big money maker for Microsoft.
Today, people are able to monetize these agentic engineering tools because they are incredibly valuable. If someone took away my Claude Code, I'd be very upset about that. There is a lot of value there. But you have more and more of these open source equivalents that might be as good. And even if they're not as good, maybe they're 80% as good, and people are happy to use something that's 80% as good if they don't have to pay X dollars a month to use some other tool. Then you get into a world where people are willing to pay for the tokens, but people aren't willing to essentially pay for subscriptions at scale to have those tools. If you want to monetize that market, how do you create a big enough value and a big enough moat around what it is that you're providing so that people don't just go to the open source equivalent? [0:19:07] GV: Yeah. And it was kind of interesting just to look at some of the conversation around it in Hacker News as well. And I guess what maybe even I hadn't sort of fully appreciated is the fact that Claude Code, okay, it runs in your terminal, but it is actually an Electron app. I didn't even realize this. And that's how you kind of achieve all these amazing visual niceties, effectively. Yes, it's in a terminal, but there's a ton of stuff going on from a rendering perspective. Terminals are, yeah, being hacked a bit in this way. I love the visuals, but then I was kind of starting to think how is this even possible inside a terminal. Codex apparently is done in Rust. And people say that that is a tangible speed difference. [0:19:50] SF: One of their apps. [0:19:51] GV: Yeah. And I think one of the things that has been questioned about OpenCode is the fact that you're basically using up a gig of RAM for a terminal, and that it's a pretty chunky TypeScript codebase that's running this thing. This is the thing when you're designing these kinds of tools for developers. Well, guess what?
You're going to get some pretty, I think, spicy opinions on how you put this tool together in the first place. I think that's going to be an interesting adoption data point that they have to think about. [0:20:21] SF: There's a lot of these tools now available. You have Claude, you have the mainstream sort of commercial tools from OpenAI, from Anthropic, and from others. You have the open source equivalents. You have Cline, you have OpenCode. There's a whole plethora of these. How many can the industry really support? I feel sometimes one of these tools crops up, kind of catches fire for a period of time, gets a lot of GitHub stars and love, and people start using it, and then it kind of tails off as the new hotness hits the market as well. I don't know that that's sustainable in the long run. At some point, I think some of this stuff starts to level out, and there'll emerge a couple of main players that dominate the market. And maybe there's a couple alternatives. But I don't think you can have like a hundred alternatives. [0:21:03] GV: Yeah. And just to round out the headlines, this sort of, I guess, broke cover maybe a couple weeks ago. But I think it's just useful to touch on a slightly wider story that we're seeing play out, which is how OpenAI and Anthropic are kind of - how are they playing their markets now? And there was obviously a lot of noise around how the Pentagon - the Department of War, or Justice, or whatever they're called these days - wanted Anthropic to allow Claude for what they call all lawful purposes. And that does include things like surveillance and autonomous weapons. And Anthropic refused, and they didn't get the contract. And then, apparently, OpenAI "swooped in and signed a deal." And there was a lot of, I think, user backlash around this.
And it did make Anthropic look like the real human winner there: if you want to use one of these technologies, why would you use the one that really wanted to get in with the government, the US government? And if we kind of then just play that through to, well, where have we seen OpenAI versus Anthropic - where are they positioning themselves? It does to me at least look like OpenAI has started becoming this kind of super app type construct, which we've touched on before, Sean, in sort of other deeper topics. And Anthropic, Claude, they're really going full enterprise, in a sort of, let's say, positive way. They're really going and saying we can be the enterprise tools that you need. There's a whole theme there. If you loop this back to the whole Pentagon thing, well, a lot of companies really don't want to be associated with a company that they think is aiding the government with, say, surveillance. Or, as we speak, the Iran conflict is on right now. And we don't know how the technology is being used there. Just kind of staying away from that, I think, is what companies are looking at. And it's just a very interesting development, I think, in sort of the landscape of the two companies. [0:22:59] SF: Yeah, absolutely. I mean, I think that it ends up kind of highlighting something where those two companies already somewhat have opposing views of what is the right thing to focus on when you're building, when you're sort of responsible for building these models. And even Dario and some of the other founders at Anthropic were at OpenAI at one point, and they kind of split off because - at least the story goes that they didn't agree essentially with the philosophy that OpenAI was taking towards building models or releasing models. They wanted to make sure that, as a company, before they released a model, it was really well tested. And really, security and trust was sort of the number one value. So they splintered off, built Anthropic.
And I think this whole thing that's happening politically has probably been some of the best marketing that has ever happened to Anthropic. You saw people leaving or canceling their OpenAI subscriptions. I think even under the best of circumstances with politics, people, companies, individuals might not be that comfortable with a company supporting something like surveillance of individuals, or automated warfare, or anything like that. But on top of that, we're also in a place in the United States where, politically, things are very polarizing right now. And there's a large chunk of the country that doesn't want to be associated with the government at all. This is a very, very polarizing issue. And Anthropic's kind of chosen to take one stance on this. OpenAI has taken a different one. And as a result, I think it just highlights the differences between the two companies even more. [0:24:32] GV: Yeah. And it's not like we haven't seen things like this appear in the past, where a company potentially anchors itself a bit closer to a government or not. I'm not going to bring up any specific examples. But what I will say is, in the past, okay, would one have refused to use - I'm just taking pure examples that might not be true here. But would I have refused to use Google if I knew that they were aligned with a certain government? I'm not sure. Because, okay, it's a lot of your search history and that kind of thing. But now that we're talking about what LLMs are used for, well, quite often, people are divulging a lot of their personal information beyond just, "Oh, this is what I purchase, or this is where I'm located right now," which is personal information, of course. But people using LLMs, they're kind of writing about their deepest, darkest thoughts. And in companies especially, people are just literally hooking up, whether through official integrations or full-on just pasting in Slack conversations and documents. And that's it.
Where does that information end up then ultimately? And I think that's why this is such a concern for companies especially. [0:25:36] SF: Yeah, absolutely. There's something, I think, more personal about chat than necessarily search. I think people have kind of taught themselves that, "Okay. Well, maybe I shouldn't put my social security number into Google search." I don't know that people think of that necessarily with chat as much. Or they paste documents in that are incredibly sensitive, not really thinking that through because of the value. There's just so much value with using these tools that people kind of throw out maybe their reasoning skills sometimes with like, "Should I actually be sharing this? What are the consequences of sharing this information?" [0:26:11] GV: Yeah, I've definitely noticed that people that previously would have said, "I will never divulge certain information to a tech company." And then the next day, "Oh, I've had this amazing conversation with Claude and told them all these things." And I don't comment on it. I just think that's really - I'm just observing. It's very interesting that this is someone, for example, that I know would not have done this in the past, but actually they've felt some trust there. That's a whole topic for a different day, but I think it just sort of underscores who you give your information to, especially between the - you can call it like the two main players here, Anthropic and OpenAI. As soon as one then anchors itself to an entity that people disagree with, that adds a whole extra layer. We'll watch how this plays out, I guess. And yeah, we'll sort of see does this affect OpenAI enough that they have to walk anything back, or are they kind of going more like Palantir style? They actually think there's too much value to be had by being very close to the government and helping them achieve things. I guess we'll see how that happens. [0:27:10] SF: Yeah.
And I think from like a consumer standpoint, a lot of times when there's so much value behind whatever the product is, consumers end up turning a blind eye to things that they wouldn't normally agree with. You can take any retailer, major retailer in the world that's probably manufacturing goods in developing countries where maybe the standards of work are not necessarily great. People kind of sort of know that, but they also like their inexpensive t-shirts or whatever. And they're kind of willing to turn that part of their brain off because they're focused on their own personal outcomes. And I think you could potentially see something similar even if you don't, in conversation, agree with what OpenAI or any other company's doing. There's so much value behind their tools. You kind of turn it off when it comes to your own personal objectives. [0:27:59] GV: Yeah. So, it's that time within SED News. We're going to move to the sort of main topic deep dive. And today we are looking at effectively writing code versus shipping code, and especially the emphasis on shipping code. How does this look in the age, if you want to call it that, of LLMs, now that probably at least 50% of developers in a given company are going to be using LLMs to write code? What are the kind of maybe cascading effects on how that code actually then ends up, to be frank, on main branch? What does main branch look like from a speed perspective and actually getting things merged in? And we kind of anchored this around CircleCI, a company that many people know in the CI/CD space. They did a 2026 State of Software Delivery report just back in February. We're recording this just at the end of March. They said that they analyzed - well, this main topic is going to be quite data heavy, so get ready for some numbers. Because when we talk about writing code versus shipping code, it's basically all numbers, like lines of code.
And how many merges were made? And so on and so forth. They said that they analyzed 28 million CI/CD workflows across 22,000 orgs in 149 countries. So that's a pretty good spread. The interesting anchor point here is that the daily workflow throughput was up 59%. But that's kind of misleading, because the top 5% of teams nearly doubled their throughput, whilst the median team increased only 4%. That basically suggests that super high performing teams could super accelerate their throughput - i.e., things that got merged into main, just to kind of bring it back there. The top 5% of teams could double their throughput. But the median - just call it the average; I know average and median are different, but that's how people think about it - for that to only increase 4% on throughput for most, just call it the majority, that's pretty interesting. Because most developers are saying, "Oh, I'm so productive. I can do so much more per day now if I use an LLM." But, actually, from a team perspective and a shipping perspective, the majority are actually only increasing by 4%. That looks way off what we might have expected, I think, from this. Yeah. [0:30:23] SF: You know, there could be a couple things going on. One is that not everyone has reached the level of the most high performing teams. Adopting some of these tools is still relatively new to many parts of the industry. You're going to have people who are kind of trailing behind. They're kind of, as a company, maybe just dabbling. Or these are like fairly new efforts. So, they're kind of getting used to how do we get value out of this? But I think the other thing, too, you could say is, well, maybe these top performing teams are shipping a lot. But are they also shipping a lot of bugs, and they're just okay with that? Whereas other people are trying to control that.
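The average-versus-median caveat GV raises is worth making concrete. Here is a quick illustrative sketch - all of these team numbers are invented, not figures from the CircleCI report - of how a couple of outlier teams can pull the mean far above the median:

```python
# Illustrative only: invented throughput changes for 20 hypothetical teams,
# not data from the CircleCI report.
import statistics

# Percent change in daily workflow throughput; two outlier teams roughly double.
changes = [4, 3, 5, 4, 2, 6, 4, 3, 5, 4, 4, 3, 5, 4, 2, 6, 4, 3, 95, 98]

print(f"mean:   {statistics.mean(changes):.1f}%")    # 13.2% - dragged up by the two top teams
print(f"median: {statistics.median(changes):.1f}%")  # 4.0% - what the typical team saw
```

This is how an aggregate headline number and a 4% median can coexist in the same dataset: the mean answers "how much more work flowed through overall?", which the top 5% dominate, while the median answers "what did the typical team experience?"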
There's a bunch of stuff that it would be nice to be able to dig into - why is it that some people are at 4% increased throughput, whereas some people doubled it? And there could be many, many different reasons for that. The other thing, too, is that just because you speed up the writing of code doesn't mean you speed up the entire software development life cycle. You have this life cycle - just the very simplistic one - of discovery and requirements, coding, testing, putting it into production, and then running that thing in a loop. Well, if you speed up that middle part of writing the code, it doesn't necessarily mean all the other parts have sped up as well. Those can become choke points in the company. And on top of that, if you speed up the volume of code, those choke points might become even slower. Let's say it's your security team that has to review these things. The fact that they were somewhat slow in the past was maybe okay, because writing the code was really, really slow in comparison. But now they suddenly have a thousand times more things to do. What happens as a company? Either you cut corners and start releasing products that don't go through that full security review, which leads to negative consequences. Or you slow down the entire release cycle, so that you only get 4% increased throughput, because the poor security team is completely slammed and overwhelmed. I mean, it's the same thing with reviewing PRs. If we're generating more code, that means there's more PRs to review, which means that has to be paid for from somewhere. Either you have your engineers spend all their time essentially reviewing PRs, which slows down, of course, things getting to production.
Or you leverage an LLM to also review the PRs, which potentially has its own consequences, where fewer and fewer people within the company deeply understand what is happening in the codebase and what the dependencies are. And then you end up with some sort of issue in production, and the people debugging it are also not that familiar with it, because AI wrote it. There are all these consequences of what's happening in the industry right now. Whether it's, "Hey, we're not really seeing the full throughput value of the LLM, because we're generating code a thousand times faster, but these other things are getting completely bombarded and slowing things down." Or we're generating code a thousand times faster and pushing it out to production, and then we're also generating bugs a thousand times faster. [0:33:30] GV: Yeah. The stat that really lets you visualize this is the feature branch one. In this report from CircleCI, they saw that feature branch throughput was up 15%. Okay, great - generally, 15% more feature branch work is flowing through. But main branch throughput itself was down 7%. And the actual creation of branches - feature branches were up 50%, whilst the main branch, again, was sitting at only 1%. It basically means that a ton of code is being generated, and a very, very small fraction of that code is ever making it onto the main branch. And to go exactly to what you were just saying there, Sean, that middle piece - the human in the loop - is still obviously the biggest question mark. You could call it a bottleneck, but I think most companies are saying it's still the most critical thing that they believe needs to be there. It's all very well generating a ton more code, or at least a lot faster, and having all these supposed features and so on queued up and waiting to be potentially merged in.
But the actual review piece - well, that's where anyone is really going to be saying, "Well, hey, we're not comfortable with these just being whisked through by an LLM that sort of knows how we do things." And I think that's the piece that's going to take a lot longer to really make it through to, say, production-production. There's an example from The New Stack back in March. We're still in March. I mean, this is coming out in April, but we are - [0:35:15] SF: Time is a flat circle, Gregor. [0:35:16] GV: Yeah. Well, it is in AI terms, yes. Or in the AI age. But VS Code, they've actually moved to weekly releases after 10 years of monthly ones. And they say AI is actually making this possible. I mean, obviously, I think VS Code specifically probably have reasons to at least want to push the idea that this is all possible. But they say it only works because they have strong automated checks and a team that's invested in the review pipeline. Yeah, I think what it actually looks like to try and get your team around this paradigm is probably what a lot of people are wondering about. [0:35:58] SF: Yeah. I mean, I think the reality, at least at the moment, is that, organizationally, you probably have to change. Because, suddenly, the resource that's scarce is not necessarily the production of code. The resource that's scarce is essentially verifying the code is correct. Addy Osmani has written about this and said generation is not the bottleneck anymore. Verification is. If it's all about PR reviews - or, in the world of infrastructure, the core piece that a lot of times really matters is networking security, how you do authentication. And if you mess that up, you might destroy the core infrastructure of thousands of different customers. You can't mess that piece up, or there are going to be dramatically negative consequences to your business.
That's where you really need to take care, and maybe you have to deploy more resources there. And you could cut back resources in other places where, suddenly, you don't need five people to generate as much code. But you do need five people to verify the code is correct. It's almost like we're shifting the problem. Historically, the problem has been that it takes a long time to write the code. That's been the hard, long cycle. And now, I don't know that we've really sped up everything. Like I was saying, we've kind of shifted where the problem occurs. I think the really negative part of this is that, in the media, we focus so much on these headlines of such-and-such a company replacing a team of engineers with one engineer and six OpenClaws, or whatever it is. Or people going from idea to app with one prompt and building an application in 20 minutes. And the thing we're confusing is that when the demo comes together in an afternoon, it ends up recalibrating what stakeholders think is possible. Executives who watch some AI-generated prototype naturally ask, "Hey, if we built this in a day, why does production take 6 months?" And it's this whole 90/10 rule that has always existed in software, where the first 90% of the work takes 10% of the time, and the last 10% takes the other 90% of the time, because there's all this other stuff that has to happen around productionization. We're really confusing prompt-to-prototype with prompt-to-production. And I think that is going to have massive consequences for the industry. And I know engineering teams are under tremendous pressure right now to be shipping at the speed of, essentially, prototyping. And you either end up not being able to do that, which could be negative for your career. Or you end up in a situation where you bypass the normal checks and balances.
And then that's going to, I think, lead to more and more of these outages and failures. I mean, even if you just statistically assume that the ratio of errors to lines of human-generated code stays the same for AI-generated code, then if you're generating more code, you're going to have more errors and more production outages. I don't think it's a particularly bold statement to say that we can easily expect more outages from companies over the next year. If companies are actually able to put this into production faster, and there is all this pressure to do that, we're going to end up generating more code that generates more bugs at some point. [0:39:22] GV: Yeah. And as you touched on, it's going to be a sort of culture competition at this point. Because the company that already wasn't putting a ton of thought into the pre-writing-code and post-writing-code pieces - well, they never valued those anyway. "Awesome, we can just accelerate the bit that we thought about most, which was generate code, generate stuff, and ship it." I think they're starting to get a bad surprise in many areas when things don't suddenly accelerate 10x, because the problems have accelerated 10x. And it's the companies that already have these practices - literally from writing, whatever people call them, PRDs or RFCs, before something gets shipped - and then augment that process with LLMs. Well, actually, that's very interesting, because a ton more thought and discussion can happen around something before anyone even thinks about writing code against it. And then the code being written against it potentially gets accelerated. But I can't say that I think the best features are getting accelerated by LLMs.
Actually, I would almost argue it's maybe that first piece, where the LLM has done a ton more thinking behind the whys and the hows that everyone can then read over and comment on. And then a classic two-pizza team works on the feature itself. And that's still exactly what should be happening. Not to make me sound anti-LLM on code, because I'm very much pro. It's just interesting to see where bringing an LLM into the loop actually affects the final outcome and the quality of the outcome, including security, and robustness, and all these kinds of things. [0:41:07] SF: Yeah, I agree. I mean, I certainly don't want to come out sounding like I'm anti-AI-generated code. That's absolutely not the case. I think these tools are tremendously valuable. And as I said earlier, I'd be very upset if Claude Code went away. [0:41:20] GV: Likewise. Likewise. [0:41:22] SF: Very dependent on these things at this point. But I think the problem with sensationalizing the fact that you can go from prompt to a compelling demo - and then confusing what that really means - hides some of the real value here. I think one of the things this gives us is very interesting, and you kind of touched on it there: in many ways, software has become a thinking tool, because the cost of essentially generating that idea is so low now. So, as a product manager like myself, I could spin up a working demo of an idea instead of writing a one-pager and then hoping my stakeholders can visualize it. The demo isn't the product, but it is a better way to have a conversation about the product that we want to build. And in many ways, I think we're entering this era where AI is making software ephemeral. And that's a real shift in how we can do work, because - anybody, not just people who are technically adept.
Suddenly, you have people who, with a little bit of training, have the ability to convey their ideas through software. And I think that's very, very interesting, because if software can only be created by a small set of people in the world, you're of course going to have certain biases or a certain worldview that come with that. If, suddenly, you open that up - it doesn't mean that you're building production software, but if anybody can prototype something, then it's a way, I think, for a completely different generation, or essentially a completely different set of people, to be able to convey their ideas in software. And I think that could lead to some really, really interesting outcomes that we haven't seen before. And that's always the case when you bring new worldviews, new ideas, new cultures to something that didn't have them before. I'm excited about that, and I think there's tremendous value here. I just think that a lot of the real value gets lost in the conversation. And by overemphasizing the idea that prompt leads to faster productionization, it's going to have negative consequences that companies don't necessarily foresee right now. And it's hurting engineering in general by misleading people about where the real value is right now. [0:43:36] GV: Yeah. I guess the TLDR here is effectively about your validation layer. If you think the point of LLM coding is basically to increase throughput generally - and by throughput here, we would define that, at this point, as merging to main in a state that meets the bar of quality that you as a company think is acceptable - then the validation layer effectively needs to keep pace with that. And at least from where I sit - I'm not talking specifically about where I work, but you can look at a bunch of codebases, open source, etc. - it does not look so far like that is quite keeping pace, if that's your metric.
Because, rightfully, probably at this point, the human in the loop is still very much there. And still, a lot of post-discussion happens, and a lot of the volume of code needs to then be reviewed by a human. And that's a very labor-intensive task. Because I get the impression that a lot of people aren't comfortable saying, "Well, I'll just use an LLM to review the code that was generated by an LLM." Okay, sure. I know that people use models to rank outputs of models, but this is precisely the piece. I don't think anyone's going to still have a job at the end of the day if they told their engineering manager, "Well, the code was written by an LLM. All I did was have it reviewed by another LLM, and then I pressed merge." In professional software production, at big companies like the ones we work at, for example, that is just not going to leave you with a job at the end of the day. If something goes wrong, it will just be you on the end of the line hearing, "Well, that was not the way to achieve that." [0:45:12] SF: Yeah. I mean, we both work for companies where, if we mess up something in our core infrastructure, we mess up the core infrastructure of potentially thousands of customers. [0:45:23] GV: Oh, yeah. Six figures of people could get messed up easily by one problem. Yeah. [0:45:28] SF: Exactly. There are massively negative consequences to something like this. I think part of it, also, is that companies have to think about where - if you want to accelerate things, are there certain parts of the stack that you can accelerate? Are there places where a bug is more forgiving than in other places? Maybe it's not networking. But could it be part of the UI layer, where you can move faster? Because if you end up messing up the UI in some way, you can fix it immediately and push out a change. It's maybe less catastrophic, depending on what the company provides. I think you have to think of this as layers of an onion.
Which layers of that onion do you need to really, really scrutinize and make sure are up to whatever your par is? And then are there other places where you can be a little looser, where the consequences are not going to be as catastrophic? [0:46:22] GV: I think that's a great place to leave it. Yeah, hopefully some interesting thoughts there on writing code versus actually shipping code. You've probably got a few anecdotes. Listeners, think about how you work in your own companies now, and what that looks like in your own flows. We're always happy to hear about those as well. Moving on to our final piece of the show, usually our favorite piece of the show. This is Hacker News, but nothing too serious - just interesting things that have popped up on Hacker News. I did scan to see what Sean's is, and I want him to go first. I definitely saw this as well, and I was like, "I guess we're going to cover this." Because, spoiler, it's to do with Doom. So you know it's going to be Doom on something. What is it this month? [0:47:09] SF: Yeah, it's so funny to me. Probably every week, and certainly every month, there's always going to be a Doom-on-something project on Hacker News that catches our eye. But it's also funny to me how Doom has just become that thing that everybody wants to jam into anything. We're going to run Doom on the watch. We're going to put it in your Wi-Fi-enabled earbud. And now this is Doom over DNS. Because why not? Really, really fun project. I think Venn1 was the person who posted it. There's a GitHub repo you can take a look at. But, essentially, the project compresses the entirety of shareware Doom, splits it into nearly 2,000 DNS TXT records across a single Cloudflare zone, and then plays it back at runtime using nothing but a PowerShell script and public DNS queries. It's just amazingly absurd, but incredibly awesome.
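[Editor's note: a minimal sketch of the scheme Sean describes - a payload is base64-encoded, chopped into chunks, each chunk published as a TXT record under a numbered name, then stitched back together in order at "playback" time. The names, chunk size, and payload here are all made up for illustration; the real project's record layout and PowerShell player differ, and a real client would issue actual DNS queries rather than read from a local dict:]

```python
import base64

CHUNK = 8  # base64 characters per TXT record (illustrative, not the real size)
payload = b"Doom shareware payload stand-in"  # stand-in for the compressed game

# "Publish": base64-encode the payload and spread it over numbered TXT records.
encoded = base64.b64encode(payload).decode()
zone = {
    f"doom-{i}.example.com": encoded[i * CHUNK:(i + 1) * CHUNK]
    for i in range((len(encoded) + CHUNK - 1) // CHUNK)
}

def resolve_txt(name: str) -> str:
    """Stand-in for a real TXT lookup against a public resolver."""
    return zone[name]

# "Playback": query the numbered records in order, stitch, and decode.
parts, i = [], 0
while f"doom-{i}.example.com" in zone:
    parts.append(resolve_txt(f"doom-{i}.example.com"))
    i += 1

recovered = base64.b64decode("".join(parts))
assert recovered == payload  # the original bytes come back out of "DNS"
```

[The real project adds compression and pulls the records over the wire, but the chunk-and-reassemble idea is the same.]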
[0:48:01] GV: I do think, despite Doom in TypeScript types - which was an incredible feat, and we did have an episode on that with one of our other hosts - I've got to say, this to me is creatively just one step above, because I don't think anyone thought that distributing software via DNS was a thing or could be a thing. [0:48:22] SF: No. [0:48:22] GV: And as usual, Doom - I guess, why is it always Doom? It's just a kind of developer meme at this point. But it's this very tangible outcome. As soon as someone says, "Well, it's Doom" - it's not just saying, "Oh, I created a calculator app from DNS." It's Doom, a game. That's insane. Yeah, that's very - [0:48:44] SF: Yeah. I'm sure the creators of Doom back in the day had no idea. They had no idea that it was going to become almost some sort of benchmark standard at this point, where it's like, can you get Doom to run on X device, or whatever it is. [0:48:58] GV: Yeah, the two Johns. I read that book, Masters of Doom. That's a great book if you ever want to hear about how Doom actually came to be - though nothing to do with all the funny ports we've been talking about. But yeah, on my side, a couple of interesting ones. One was called "Why so many control rooms were seafoam green". I often do bring up some of these more designy ones that hit Hacker News, because it seems like Hacker News has this amazing community of people that can dig into why things were designed a certain way, whether it's colors, or so on, so forth. This was posted by user Amorymeltzer. Thank you for that. And the TLDR is - when we talk about seafoam green, we're talking about a very light green color, like a very light moss. In Scotland right now, I see it everywhere. But think of a very pale green. And it's on the walls in these sort of bunkers of control rooms. And why is that the case?
Well, the TLDR is there was a chap by the name of Faber Birren, who studied at the Art Institute of Chicago and the University of Chicago. When he was in New York, he became a color consultant. And he helped people understand that, well, if you paint your walls this color in a certain restaurant, or a butcher's, for example, that might drive more sales of steak, because the meat looks redder against a certain color. He was then hired by the government during World War II to create palettes for different buildings, effectively, to help people understand, "Oh, this area is a danger zone." Fire red was used for all emergency stop buttons, which I think we would consider pretty obvious these days. But that was not obvious before some thought was put into it. Anyway, light green was simply used to reduce visual fatigue. Apparently, it's seen as a color that lets you sit in a bunker with all these switches and dials all day. It is apparently the color that reduces fatigue on the eyes, given that, certainly back then, you had to be staring at all these machines all day. So, just a very interesting, funny bit of trivia there. [0:51:11] SF: I've heard sometimes certain fast food chains choose a particular color in order to encourage people not to stick around, because they want to turn over the seats. I don't know how much truth there is to it, but there's definitely something where certain colors, I think, do affect people in different ways. [0:51:27] GV: The one that I never understood - and it's probably why I have not flown this airline for maybe, I can't even remember, since I was a teenager, literally - is Ryanair. I'm sure some people know about Ryanair. It's a very low-cost airline in Europe. But they used to put a very bright yellow plastic thing on the top of their seats. And it was just headache-inducing. And the problem is you can't leave the plane.
You're on the plane, and you can't get off for the next 2 hours or whatever. So, I have no idea why this was - well, they have a very controversial CEO. But yeah, you were just talking about wanting to make people leave your establishment. That makes sort of business sense to me. But why do it to people trapped in a plane, where they probably already want to get out? That does not make sense to me. But what else did you pick up, Sean? [0:52:09] SF: Yeah. Well, the other one that caught my eye was a guy who basically took components from crashed Tesla Model 3s in order to assemble the Tesla Model 3 computer on his desk. And this was motivated by Tesla's bug bounty program. But he had to salvage a bunch of this hardware from these various parts. The article goes into detail about how he did this. The car computer has two main parts. There's a media control unit, the MCU, and an Autopilot computer, or AP. Then you need a power supply, and you need a touchscreen. But I think the thing he ran into the most challenges with was just connecting everything, because the actual connectors in the car are this mess of cables - a wiring harness. [0:52:59] GV: I'm looking at the photos. Yeah. Crazy amount of cables. Yeah. [0:53:01] SF: Yeah. So then he had to figure out how he could actually connect the MCU to the other devices in some way where he could provide the right cabling. Because, essentially, it's hard to find that part - it's either been destroyed, or it's just this mass of cable that's not super useful. [0:53:21] GV: Yeah, that does feel quite deep-end. The bug bounty, okay, has provided a reason for that. But still, you've got to be really into this kind of stuff to pick through how on earth this is all going to fit together. Yeah. Just a quick one to round out - I think this is interesting. The Mac Pro has been discontinued by Apple. [0:53:40] SF: No. [0:53:40] GV: Yeah, I know.
I can still remember going to the Apple Store, in maybe 2012, in New York. And the Mac Pro was this beautiful aluminum box. If you were in any kind of serious industry, especially video or something, and you had a Mac Pro, that meant you were a professional. But apparently, the Mac Studio is kind of where their head is at now. I actually looked this up; I wasn't fully aware of it. The Mac Studio kind of looks like a Mac Mini, but with beefy internals. It looked like their product lineup was a bit muddled by this point - they had the Mac Studio, and they still had the Mac Pro. They've decided to officially kill the Mac Pro and just focus on the Mac Studio line. So, probably quite a sensible move. But yeah, a little bit sad for some of us who still remember - if you ever wandered into someone's office and they had one, it was kind of cool to just use it and know that this was the most powerful Mac on the planet at that point. [0:54:43] SF: Yeah. [0:54:44] GV: I guess, briefly looking ahead, anything you're thinking might pop up when we're talking next? [0:54:50] SF: I kind of already stated my prediction, but I think we're probably looking at more outages at least. [0:54:56] GV: With all this extra code and code validation problems. Yeah. Yeah, for sure. [0:55:01] SF: Definitely something to keep an eye on. [0:55:03] GV: Yeah. Yeah, absolutely. On my side - again, not to plug here - it has been interesting, the Stripe Projects thing. People are asking, does that make them like a marketplace in a CLI? I can't give a definite yes or no on that one. But I think what we are seeing, and I think we're going to see more of, is other players coming into this CLI-driven market. I think it was sort of surprising to some people that Stripe had decided to branch over here. Perception was quite positive. I think we're going to see more of that.
I think we're going to see more companies that never would have thought to go back to their CLI maybe thinking that that's where they now need to target some product releases, for example. CLIs are kind of having a moment again, especially because of MCP versus CLI. That's actually a whole other topic we should cover, maybe next month. But the CLI is coming back in a big way. [0:55:56] SF: Oh, yeah, CLI, it's crazy. Everything, I think, is now about the CLI. I think there's probably a bigger topic conversation around CLIs, versus MCP, versus APIs, and all this sort of stuff. But yeah, the CLI is definitely having a moment right now, which is interesting. [0:56:13] GV: Yeah. I'm sure something to that effect will pop up, I think, by our discussion next month. So, thank you again, everyone, for tuning in to SED News. We hope you have a good April. And we'll catch you next month. [0:56:27] SF: Yeah. Thanks, everyone, for listening. We'll see you next month. [END]