[0:00:00] JMC: Hi, Birgitta. Welcome to Software Engineering Daily.

[0:00:02] BB: Hi, Jordy. Thanks for having me.

[0:00:05] JMC: Tell us a bit about your background. You just started a new role at Thoughtworks. But give us a bit of a rundown of your previous experience, both at Thoughtworks and in general. And yes, a description of what your role is now, because it's really interesting. It's going to drive this conversation.

[0:00:24] BB: Yes, sure. So yes, I've been a developer for over 20 years now. For all of the time that I've been full-time employed, I've been in tech consulting. Right now, I work for Thoughtworks. Thoughtworks is a global software consultancy, so we write software for our clients, but also with our clients. Before that, I worked for Accenture, actually, for seven years before I was at Thoughtworks. My current role now, I've been in that role for about six weeks now, has the fancy title of global lead for AI-assisted software delivery. What that basically means is that I get to spend my time talking to a lot of people in Thoughtworks about what they're currently doing, in particular, of course, around all of the generative AI trend and hype that is happening right now. How they are using tools powered by generative AI, but also other types of AI, to support the software delivery process. Coding, of course, is the most obvious thing that is being talked about a lot as well. But then, also, beyond that, how can this actually make us more effective, not just faster, but also more effective at software delivery.

[0:01:38] JMC: There's something special about code that makes it an excellent candidate for generative AI. Can you explain why? I mean, it's kind of obvious to me already that anything structured, that is open about its own structure, its own definitions, and so forth, makes that object a perfect candidate to be statistically analyzed, and therefore, generated. But there must be others. Am I missing anything? What makes code, source code in particular, so suitable for gen AI, by the way?

[0:02:11] BB: I mean, first of all, I think, as we're among techies, we also need to be a bit more specific when we say gen AI. So I was talking about AI first and generative AI. But of course, what we're specifically mostly talking about at the moment is large language models, right, as one form of generative AI. There are a few more as well, right? And large language models are really good at pattern recognition, and then kind of synthesizing, recreating, mimicking those patterns again, right? So they're really good at, to put it another way, translating one form of language, or one form of tokens, into another. When we're doing code generation in this context, we're basically translating, transforming natural language into code, right? We're transforming a comment into code, or a list of requirements into code, and they're just really good at that. Then, it's also about transforming things: let's say we need to migrate code from one language into another, from one system into another, from one standard into another. Wherever we find things where we need to translate one thing into another thing, and it's based on patterns, that's where we can try and see if a large language model can help us do that.

[0:03:19] JMC: Then, if we think about AI in sort of like Venn diagrams, the biggest diagram, the biggest circle would be AI, then within AI, we've got gen AI.
Then within gen AI, a subset of it is large language models. But there are other options within gen AI, you're saying?

[0:03:45] BB: Yes, that is my understanding. I'm actually not a data engineer, or a data scientist, or a machine learning expert. I'm in this role as the domain expert for software delivery using these tools. But of course, I'm trying to understand as much as possible, and as much as I need for the role, of how this works. But yes, from what I understand, this is kind of how you can break it down. I think one other form of generative AI that I remember right now would be GANs, generative adversarial networks, something like that. But please don't nail me down on all this terminology.

[0:04:20] JMC: But yes, the focus of this conversation is actually LLMs applied to software delivery in general.

[0:04:27] BB: Yes, that's the big change that has happened in the last year or so, that is now triggering all of this amount of hype, and innovation, and energy.

[0:04:37] JMC: Martin Fowler, I'm sure 99% of the audience knows who I'm talking about, is a fellow colleague of Birgitta. He's got splendid books, research, and a blog. In it, you've published already, I think, is it three memos already, four memos? There was one –

[0:04:57] BB: It's three. The fourth is on its way today.

[0:05:00] JMC: Oh, nice. Okay. Check all the articles tagged as generative AI in Martin Fowler's blog, because there's just a few. I mean, this is very new, but at least six or seven. One is Birgitta's, which is a collection of memos. It's describing her journey into this new role that she described, and I really liked them. By the way, the latest one is two days old. We're recording this on the third of August. I think the latest one was the third, and another one is coming. The first thing you've done in your new role is actually create a mental model of what LLMs are, what the definition of LLMs as applied to software delivery is. Is that right?

[0:05:43] BB: Yes. I looked at the tool landscape and kind of tried to understand how these things fit together. I'm actually also a member of the group in Thoughtworks that curates the Thoughtworks Technology Radar, which is a pretty popular publication. So maybe some listeners also know about the Technology Radar. As a person in that group, there are so many technologies that get thrown at us by other people in Thoughtworks. We have to put them into – I have to put them into a mental model to understand what tasks they fulfill and how they compare to others, right? This is usually what I do there as well. For these tools, I've also tried to kind of see, okay, there's GitHub Copilot, there's Tabnine, there's Codeium, there's ChatGPT, there's something called GPT Engineer, something called Aider. There are so many things, right, so I need to –

[0:06:28] JMC: CodeWhisperer.

[0:06:30] BB: CodeWhisperer.

[0:06:30] JMC: From AWS. There's plenty, yes.

[0:06:31] BB: Yes. Yes. There are so many things. I have a whole other mural full of tools that I could potentially look at, that are popping up all over the place. The way that I think about them is, I guess, first: coding assistance by LLMs can be multiple things, not just code generation. It can also be us finding information faster in our context, or it can be explaining code to us, or "reasoning" about code with the model. So it can be other tasks as well.
I guess the most straightforward way to do that, that probably a lot of people have tried and experimented with, is to go to a chat interface like ChatGPT. Give it a prompt and say, "I need to build a functionality that looks like this, please generate some code for me in, let's say, TypeScript, or Java, or whatever," right? Or ask a question like, how do I install dependencies with Gradle, or something. I don't know. That's the most straightforward way, right? But for that, you also need to know a little bit about prompting, and maybe you need a lot of time describing what your context is, and what your code looks like, and all of that. A lot of the other tools, like Copilot, CodeWhisperer, Codeium, Tabnine, and so on, build kind of a layer between you as the coder and the large language model in the back end. That's, not just in the coding space but also in other spaces with LLMs, sometimes called a prompt composition layer, or a prompt orchestration layer, in between what the user is prompting and the actual model. Then these tools apply additional logic to how they actually send the prompt to the model. Let's take the case of tools like Copilot, CodeWhisperer, and Tabnine. Those are tools that are integrated into your IDE. As you type, they are giving you suggestions. These IDE extensions are that prompt composition, prompt orchestration layer. They take, okay, what's the information before your cursor, after your cursor? They also look at open files. So which files of the same type do you have open? And then they apply some kind of heuristic to enrich the prompt with additional context from your IDE. That then makes it more powerful, because it is directly in my context, from the user experience perspective. But also, there's a tool that is kind of trying to enrich the prompt for me, right? That would be another type of tool. Then, those can be in the IDE, or in the command line interface. But these prompt composition layers can also happen for other types of tasks, or in other types of contexts, right? We have, for example, some teams that build themselves a little team assistant, where you also have this prompt composition layer, and the team is actually maintaining it. This application already has, kind of hardcoded, a description of our architecture, of our tech stack, a description of our business context. Every time I use this tool, the prompt gets enriched with those things. Let's say, as a product owner, or an iteration manager, I want to write a user story, and I use this tool to kind of go into a back and forth with a large language model that's asking me questions about my user story to help me write a better story, right? Then I don't have to go, "These are the personas. This is what the application looks like," the basics about that, because it's already in the tool. Also, this prompt composition layer has some good prompt engineering practices that I don't have to learn about, so some advanced patterns about how I do this back and forth. It kind of spreads knowledge across the team by having all of this readymade context there, like a description of the architecture, a description of the business context. But it also helps spread some prompt engineering skills across the team, because you don't have to know all those things yourself, but it's kind of encoded in this little application.

[0:10:42] JMC: Oh, there was so much to unpack there. Thanks so much.
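A minimal sketch of the kind of prompt composition layer described above, in TypeScript. All names here (EditorContext, TeamContext, composePrompt) are illustrative assumptions for this transcript, not taken from Copilot, Tabnine, or any other specific product.

```typescript
// Hypothetical sketch of a prompt composition / orchestration layer.
// The interfaces and function names are invented for illustration only.

interface EditorContext {
  textBeforeCursor: string;    // code immediately before the cursor
  textAfterCursor: string;     // code immediately after the cursor
  openFileSnippets: string[];  // snippets pulled from other open files of the same type
}

interface TeamContext {
  architectureDescription: string; // concise, team-maintained architecture summary
  businessContext: string;         // concise description of the domain and personas
}

// Enrich the user's request with editor and team context before it is sent to the
// large language model, so the user does not have to repeat that context every time.
function composePrompt(userRequest: string, editor: EditorContext, team: TeamContext): string {
  return [
    `Architecture: ${team.architectureDescription}`,
    `Business context: ${team.businessContext}`,
    `Relevant code from open files:\n${editor.openFileSnippets.join("\n---\n")}`,
    `Code before cursor:\n${editor.textBeforeCursor}`,
    `Code after cursor:\n${editor.textAfterCursor}`,
    `Task: ${userRequest}`,
  ].join("\n\n");
}
```

Real IDE extensions use more elaborate heuristics for deciding which snippets to include, but the principle is the one Birgitta describes: the layer between the user and the model decides what context goes into the prompt.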
Starting with the beginning – I was laughing when you mentioned the Thoughtworks Tech Radar, which is a tool that I so much love. I mean, there are plenty of curators, and you are one, you mentioned it. But I've bugged so many curators to include the technologies that I used to work for or that I used to like. So yes, you must be swarmed with them. To complete the picture a bit, you mentioned plenty of vendors, plenty of open-source projects. Confusingly enough, you didn't mention – I mean, no, you weren't confusing. But Sourcegraph and Google have decided to make this space a bit more crowded, and slightly more confusing, by releasing two products, two LLMs for code assistance, called exactly the same: Codey. I think Google's is called Codey with an E, so it ends with D-E-Y. And Sourcegraph's is called Cody with a Y at the end. I could get them wrong, by the way, because they are so similar. Congrats on those. Probably, they're perfectly good products, by the way. Nothing wrong with them. But yes, about the specifics of the mental model that you built for yourself, and for anyone to enjoy in the blog post on Martin's blog. It's great. I agree with you. One thing came to mind in this last bit, when you were mentioning the team, I can't remember the title or the persona that you were referring to. But those people that build the sort of prompt interface in between, I can see that being really effective to expand, and cross-pollinate, or help with policies, with best practices, maybe with coding best practices, because they are already input into the prompting interface that is being created. Did I understand you correctly? Would that allow for those coding best practices, and the right boilerplate, and applications to be spread out to anyone new coming to the team, for example, and using that code assistance that has already included these things?

[0:12:57] BB: Yes, that's actually spot on. Because our colleagues in China, who came up with this approach, among others, have been using it the most so far. They always talk about this from the perspective of spreading knowledge, and knowledge sharing, and talking about software delivery as knowledge-heavy work, and that this is a key factor. What they are trying to do with this is spread the domain and technology knowledge across the team. I mean, the nice thing is also, when you think about it, you need a good, concise architecture description, you need a good, concise description of your business context. This is something that every team should have anyway, but that we often don't have. Now, having a tool like this gives these things an additional purpose: you actually create documentation not just for other people to read, but you have an interest in keeping it up to date, because it becomes executable as part of your daily toolchain, right? So that part of knowledge exchange is interesting. Then, the other thing is upskilling, to a point, when you have more junior people. If you consider that example again, of writing a story, there might be a person who's pretty new to the job and doesn't have a lot of experience with doing this yet. The tool might not give them the perfect story after this little conversation I talked about, with the full requirements and everything.
But it will give them a leg up on how you actually write a story, because it actually spits out scenarios, given/when/then scenarios, and different combinations of givens with different then-results, right? So it actually gives you examples exactly in your context and helps you learn how to write a story. That would be another type of knowledge exchange, where it's just about the skill of writing a story, or about the skill of – as a developer, how do I break down my tasks for this particular story?

[0:15:01] JMC: Let's come back to junior developers further on in this conversation, because you have concerns about how they should interact with these things. You mentioned it in your blog post. By the way, the blog post that we are now mentioning, or talking about, is one that is linked from one of Birgitta's memos. It's from a colleague of hers in China, in Thoughtworks China. Basically, what this person does is describe the system, describe the tech stack. And it does so in the shape of, sort of, constraints. It serves those as the domain for the LLM, and says, you need to stick to this. And then, this person requests a list of things, a list of actions that the LLM will perform to solve the task with the constraints and requirements that have been put in place. Then what this person does is refine, fine-tune, reorder, go through the list of actions that the LLM has provided to solve the matter, because most of it is, quite frankly, good. But your colleague will reorder them, and refine them, and then return them to the LLM and say, proceed in this way. Then the LLM will produce the results that need to be reviewed again. It's a fantastic approach. Yeah, go ahead.

[0:16:18] BB: I think there are two things here. One is something similar to what is also called chain-of-thought prompting: you don't ask a model to give you a result for a big problem that you have all at once, but you kind of go step by step. In his example, he would first go, give me a list of tasks, how should I implement this step by step? Then you can sanity-check that implementation plan. Only after a few steps would you maybe say, now generate some test code for me, or something like that. That's one point, this step by step. The other thing is that it's always used as an assistant, right? Not as, "Oh, yeah. Now, the whole problem is solved and I just commit it to the code base." It's always an assistant. What I really like is this thing of, it's almost a form of ideation, maybe pointing me to things that I might not have considered yet. So if we think about architecture, for example, I've been very skeptical about using this for architecture decision-making or analyzing architectures, but I haven't actually tried a lot myself yet. But if you think about this not as, it will give me exactly my architecture analysis, but as, it will point me to weak spots that I might have, then it could actually be really useful, right? So imagine you describe your architecture, and then you say, "Where are the weak spots?" There are usually so many tradeoffs to consider, and contexts, where a weak spot in one context might be a strength in another, right? We all know that architecture is full of tradeoffs, right? So then, maybe the LLM will give me five potential weak spots, and three of them are obviously nonsense. I see it immediately.
But then there are two where, for one of them, I'm like, "Oh, that's actually right." And there's one where I would be like, "Oh, I hadn't thought about that yet. Is that actually valid? That might actually be a weak spot," right? So I really like this way of thinking about it, not as, it will give me the perfect solution, but, can you point me in directions that I hadn't thought about yet?

[0:18:22] JMC: So you haven't tested it much for architecture, for designing architecture, but you have tried it in other areas of the software delivery lifecycle, and quite extensively, as you mention in the blog post. In fact, in terms of ideation specifically, it has helped you with testing. It has not helped you with generating tests in some instances, but it has helped you, sort of, think of ways of designing your tests, or with discovery. Can you expand on that? Also, your general impression about LLMs for code generation, which you describe in the blog post too, and other areas of software delivery that you have actually tested and have strong opinions about.

[0:19:06] BB: Maybe for the coding assistants, I mean, I've really enjoyed using these coding assistants that are directly in the IDE, that just assist me in the way that I would usually write the software anyway, let's say, but that maybe make me a bit faster, or show me ways to solve things that I hadn't thought about. I've used them both for implementing functions, but also for implementing tests. I still remember the first time I tried this. This was an existing code base where I wanted to add a feature. The first thing is, immediately, it gives me a suggestion for my test that just looks like the other tests in my test suite already, right? So it just reproduces the pattern from the rest of the test suite. Maybe otherwise, I would have copied and pasted that, so maybe it's not such a big deal, right? But it kind of illustrates how this works, right? Then, I describe my test, right? In this case, this was JavaScript, so you would have an "it should" description: it should group the list by cities, or something like that, right? I was actually quite impressed by the test setup that it gave me. Again, it was reproducing the test data setups, the utility methods, that I already had in the tests. Then, also, sometimes the assertions, the expectations that it would generate for me. By the way, sometimes this works, sometimes this doesn't. I've had really impressive cases where this has worked, and cases where I was like, "Yeah, this wasn't really helpful." That's one of the things, I think, that you cannot learn from a training or something like that. You actually have to use it to get a bit of a feeling for it, and also learn to move on when it doesn't work, right? In terms of, do you use it for tests or implementation? This is a question that a lot of people have. There was recently a very nice post by Michael Feathers as well, where he was talking about manually writing the test, and then using the coding assistant to help you with the implementation. Because he felt like this is the quality control, right? If I manually write a test, then that is my level of quality control. Because otherwise, if I generate the test, and I generate the implementation, then all I'm doing is reviewing the work, as if I'm doing a code review for somebody else, right?
But then I actually have to spend the time on reviewing it. So yes, sometimes, with a function, maybe I would prefer to just think from scratch about all the scenarios that I want to test, right? If a tool gives me six scenarios, then I would have to think about where the gaps are, which, cognitively, I find a little bit more exhausting than actually just doing it myself in a structured way. I think it depends on how complex the function is that I'm writing, how experienced I am in the tech stack, how much it matters, is this a POC, or is this production code? It depends on a lot of factors, I think.

[0:22:14] JMC: For me, you would qualify as a senior developer. So someone that is, yes, is very –

[0:22:19] BB: Thank you. Thank you.

[0:22:20] JMC: Sorry for that. But yes, it's clear. You would certainly look for assistance in this case. But let's go back to the junior developer. I think that this person will find it helpful to incorporate coding best practices of the new workplace that this person has joined, and if this has been properly fed, and structured, and incorporated into the code suggestion tool, it will certainly help. But yes, this person is lacking this previous experience, and this sort of sharp eye to detect quality, or hallucinations, and stuff like that. Because most vendors out there are claiming that these LLMs for code generation – code assistance, I should say – will reduce senior developer oversight on the one hand, and accelerate junior developer onboarding, let's call it that, on the other. I think you mostly agree with that, but you had a few concerns related to what I just said about this lack of experience, and sharp eye for detecting flaws, for example.

[0:23:37] BB: Yes. I definitely think it's a bit more complicated than just saying, "Oh, there's now this thing that can tell somebody exactly how to write a test in JavaScript or something like that," so that's obvious upskilling that a senior doesn't have to do, right? You are still in charge as the developer, you are responsible for the code that you're committing in the end, and you have to judge what comes out of this machine. Let's not forget that these models are trained on a lot of code that is out there on the Internet. I think we all know that not all of that code out there on the Internet is perfect, far from it. The other thing is also, I was talking before about how these coding assistants pull context from other files in your code base, right? But is your code base perfect, right? It doesn't distinguish between things in other files that are good or that are bad, or things where you actually want to start refactoring them and use a different pattern, right? It will just amplify everything around you, around your context. It doesn't discriminate between "good" and "bad" things. So maybe some things are really useful for me as a junior, because it repeats the patterns that others have been using in the code base, but it also repeats the patterns that maybe you don't want to repeat, or that a senior developer decided, in this case, I'm going to do it this way, because I can make an exception here, but usually, you don't want to do it that way, right? I think it's a bit of a mixed bag. Again, it depends on the other factors. If it's a really prevalent tech stack, and it's a very straightforward type of code, then the chance of hallucination is also lower.
So then, maybe it's not so risky. I think it can definitely help you quickly learn a few of the things, and maybe you don't have to go and ask somebody else, right? But there was actually a study done by McKinsey, where one of their findings was that junior developers sometimes even take longer when they're using a coding assistance tool than when they're not using one. So yeah, it depends.

[0:25:54] JMC: About the fact that it's trained on openly available data – because I've got a follow-up question about compliance, licenses. I know it's a boring topic, but I want to know if you have come across concerns, especially, I guess, in this case from managers, about IP, source code, being leaked. But let's leave that to the end. Yeah, it is true that if LLMs, especially the larger ones, are trained on the Internet, to put it in a very imprecise way, then the prevalence of JavaScript code will be much bigger in the dataset than, let's say, Lua or whatever. And you've got a colleague that actually posted a blog post recently, that you linked from your memos, I think, who was, quite frankly, surprised about the quality of the Rust code suggestions that a given tool gave him. He's also senior and so on. I mean, I think the Rust code base, the global Rust code base, is growing quite dramatically. So I wouldn't describe Rust as a minor language, such as, I don't know, Lisp or [inaudible 0:27:08] or whatever. I think there's plenty of code. But yes, is there a dramatic difference between requesting code suggestions, code completions, or interacting with code assistants, with AI assistants, requesting answers for JavaScript versus for minor languages? Is the difference that stark?

[0:27:27] BB: I mean, I haven't done a study and tried lots of different languages, so I also just have anecdotal evidence, and Eric's post about Rust is one of them. He almost described it like he needed to nudge it a little bit, right? But once you have some code yourself, again, the pattern matching also kicks in, and it actually gets a bit better as well. I mean, one experience I had was, I was using a chat interface. GitHub Copilot also has a chat, and I was using that. I was describing the design of my code base to it, and asking it to create a mermaid.js diagram for me. I don't know mermaid.js very well, it's a diagrams-as-code framework. I had actually not used it myself before, so I also did not know the syntax. Then, I asked it to generate a class diagram, and it did that, and I just couldn't get it to render, and I kept trying to fix it, and it just wouldn't work. After five minutes or so, I went to the mermaid.js website, and it turned out the syntax was quite different from what it had suggested to me. That's an example where mermaid.js will also not have that much stuff out on the Internet. But then, interestingly, I just copied and pasted the example from the documentation website into the chat. That gave the model the pattern that I was looking for, and it actually applied it correctly to my description of the code base from before. With that nudge, again, with, here's the pattern that I want you to apply, that then actually worked.

[0:29:00] JMC: It's clear that LLMs are a great tool for understanding, or being trained on, structured code. But they are also great when the prompt, the request from the user, is also properly structured. We need to learn that proper prompting, or we need to be helped with it.
Because plain, syntax-free English, or German, or Spanish is difficult for them. The prompts need to be heavily structured. I came across a study from a few researchers that studied the typical day, what a good day's work would look like, for a developer at Microsoft. I think they surveyed 5,000, I can't remember. They didn't segment, as far as I remember, between senior developers and junior developers, which I think would have been interesting, because I think it would have shown patterns. But I might be wrong there. In any case, they gave us what a typical developer's life looks like, one day in the life of one of these people. It turns out that they don't code that much. They do work more than eight hours a day, which, I mean, is probably not surprising for anyone. I think on average, it was nine and something, nine and a half. But they divided all the activities a typical developer does into three buckets. So, development-heavy activities, which coding falls within. Only 15% of the time is actual coding, 84 minutes, to be precise, of one typical day. Bug fixing, testing, blah, blah, blah. Then collaboration-heavy is a second bucket: meetings, emails, those things that are more iffy for the developer in general. And finally, other activities. Meetings take up 15%. Anyway, those were interesting data points to me, and so forth. But of the, I guess, development-heavy activities of coding, bug fixing, testing, specification, reviewing code and documentation, I was wondering, do you feel that any of these is particularly suitable, or any of these is particularly unsuitable, for being assisted by an LLM?

[0:30:19] BB: I think so. The examples you gave for development-heavy are coding, bug fixing, testing, writing specifications, and documentation. I think for all of them, maybe in different ways, the tools can help to a point. We were talking about coding a lot already. Testing, again, you can generate tests, you can generate test data, right? I mean, I guess there are tools already out there for generating synthetic test data. But this, again, gives us a little thing, so we don't even have to set up a big tool. We can use this chat, or this interface, this coding assistant that we already have in our IDE, for all kinds of different tasks. I think that's the power a little bit, even if we now try to do some of the things with it that we already have tools for. It's kind of nice that it's just all there, and we can also apply it to things, maybe a new framework that we don't have a tool for yet, or something, right? Then writing specifications, we were just talking about the example of this, kind of having a back and forth with an LLM, and finding gaps in our specification, maybe helping somebody learn how to write a story. Documentation, I find an interesting one, because a lot of people mention it, because it seems obvious. Like, oh, this is about summarizing things. LLMs are good at summarizing things, and writing text, and – yes, I don't know. I'm still a bit skeptical, because I don't think that the problem with writing documentation is the time that it takes us to write the text. I mean, I could talk about documentation for hours as well. But I imagine a world where then everybody just keeps generating documentation, but nobody reads it.
It's still about figuring out what is the right level of documentation, the right type of documentation, thinking about the personas you have for your documentation, and what type of thing you need there. Also, I mean, there is some promising potential, I think. We were doing some work with mainframe code bases that had to be translated, like COBOL code that has to be rewritten in Java, right? And there's usually quite a lot of work involved in the reverse engineering, and what does it actually do? We've seen some potential there, in a large language model also helping with that reverse engineering, understanding that code, and maybe even giving everybody a leg up to transform it into Java code. That would almost be a little bit similar to documentation, in the sense that you need to "understand" the code. With documentation for your classes, maybe, let's say you have a code base that has been written by other people, and you want to understand it. Let's say you then ask the model, "Explain to me what this class does." I've actually tried that with one of the classes in my code base, and it was doing a decent job of describing what it is, but if I had just read the code, I would have come to the same conclusion, because, I mean, not to toot my own horn, but the class name, and the variables, and the method names were actually quite descriptive. If you imagine now a class that is not as human-readable, it also becomes less readable to an LLM, because it's about language. We've also seen that with coding assistants. The more descriptive we make our variable names and function names, the better the suggestions are for what comes next. There's a bit of this, we can't have this shortcut, maybe, of "explain this code to me that we ourselves cannot read," because the LLM will also not be able to read it or reason about it.

[0:35:03] JMC: I guess in a way, yes. So LLMs are definitely not computer science majors. They don't understand the structure of code, the logic behind it. But they are probably deeper readers, with a deeper understanding of the language at a human level than we have, and they are able to find these patterns better than we do, but they still struggle. That's that.

[0:35:30] BB: "Understand", maybe, like – that's why I use quote-unquote so much. It's something that I overuse anyway, I guess. But in this case, it's also, they don't really understand it. It's pattern matching that happens, very advanced pattern matching that seems like understanding to us, right? We always have to keep that in mind. But in the end, it doesn't matter if they understand it or if they just pattern match. In the end, what matters is if it's useful for us, right? That's what a colleague of mine always says to me when I'm skeptical about models "reasoning" about things. Is it useful? That's always his question.

[0:36:03] JMC: Yes. Yes. True. I think that's the ultimate question. I think they have actually proved to be, and will – let's not forget, we're in the early stages of this thing. I know that researchers in this area have been fighting, and working, and providing developments on this for ages now. But for the consumers, I guess we are in the most incipient stages, and it will just get better. In 10 years' time from when this podcast is recorded, it will be completely outdated –

[0:36:32] BB: Yes. Definitely, yes.

[0:36:33] JMC: – to listen to the silly questions that I've been asking.
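To make the readability point above concrete, here is a contrived TypeScript example, invented for this transcript and not from Birgitta's code base. Both functions do the same thing, but the second gives a human reader, and an LLM asked to explain or extend it, far more language to pattern-match on.

```typescript
// Hard to "explain" for humans and LLMs alike: the names carry no meaning.
function f(xs: { a: string; b: number }[]): Record<string, number> {
  const r: Record<string, number> = {};
  for (const x of xs) {
    r[x.a] = (r[x.a] ?? 0) + x.b;
  }
  return r;
}

// The same logic with descriptive names: the code now describes the domain,
// which is exactly the kind of language a large language model works with.
function totalOrderValueByCity(orders: { city: string; orderValue: number }[]): Record<string, number> {
  const totalByCity: Record<string, number> = {};
  for (const order of orders) {
    totalByCity[order.city] = (totalByCity[order.city] ?? 0) + order.orderValue;
  }
  return totalByCity;
}
```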
[0:36:38] BB: Maybe, can I just get back to what you were saying about the developer day, and development-heavy activities, collaboration, and other activities? The way some of my colleagues at Thoughtworks look at it, colleagues who think a lot about engineering effectiveness, and waste in the process, and overhead, all of these things like handovers, and relearning, rework, all of those things. I don't know if you're aware of the seven wastes of software development described by the [inaudible 0:37:07]. They're making this analogy to the seven wastes of lean manufacturing. Just as a side note, but there's a lot of waste. If you think about the categories, like finding information: not finding information takes a lot of time and overhead, right? Slow feedback on things, cognitive friction, friction in the developer experience. Those are all sources of waste. We're really interested in, how can these tools help us reduce that waste as well, the waste that sits between all of these activities that we spell out. And finding information definitely is one – finding information in context, just as I'm doing a specific task, right? That can be a powerful one. Feedback, reducing feedback loops – we were just talking about it – I don't have to ask one of the experienced developers on the team. Or maybe I have this ideation session with the LLM, and I might find things earlier. That's a really interesting area, I think, to see where we can reduce waste.

[0:38:12] JMC: By the way, going back to two things, and this will be – we are heading to the end of the conversation. It will be the developer experience. Because you mentioned at the beginning of our conversation the case in which this code assistance would be most useful, which would be within deep context, literally within the IDE while I'm coding. Again, let's remember that that's only, according to this study, 15% of my time, which is funny, because it feels like much more. But you did mention that, in your research, you found that people will go outside the IDE, and search, and use ChatGPT, or the website, and so forth. So I'm interested, do you have any opinions about where this LLM code assistance is best used, and why? Is it literally within the IDE where they excel, not switching context, or is that not necessarily a strong requirement?

[0:39:15] BB: I think it's not necessarily a strong requirement, but it can help, right? Depending on the task that it's helping you with, it gives you more of a boost with some than with others, maybe.

[0:39:25] JMC: What about the commands you run? Does it differ from product to product a lot? Or are they mostly standardizing the ways the user requests help, for code suggestions, for code completions, but also for chat? Are those, in your experience, well integrated within the IDE? Does it feel like a natural way of behaving?

[0:39:47] BB: So you mean how I'm prompting it, yes?

[0:39:51] JMC: Correct. Yes.

BB: I mean, when you're writing the code, it's basically as you type. So you can write a comment, or you write the signature of your function, and then it suggests things for how to implement the function, right? That's relatively intuitive. But also, depending on the tool, on the IDE extension, sometimes it works, sometimes it doesn't, sometimes you don't get a suggestion and you don't understand why, and sometimes it gives you just a one-liner, sometimes it gives you multiple lines. It's a bit hit-and-miss sometimes.
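As a concrete illustration of that as-you-type flow, here is an invented Jest-style TypeScript example, echoing the "group the list by cities" test Birgitta mentioned earlier. Only the test description is what the developer types; the body is the kind of suggestion an assistant might propose, mirroring whatever helpers already exist in the suite. None of this is her actual code.

```typescript
import { describe, expect, it } from "@jest/globals"; // assuming a Jest-style test runner

// A minimal function under test, defined inline so the example is self-contained.
type Customer = { name: string; city: string };

function groupByCity(customers: Customer[]): Record<string, Customer[]> {
  const grouped: Record<string, Customer[]> = {};
  for (const customer of customers) {
    (grouped[customer.city] ??= []).push(customer);
  }
  return grouped;
}

describe("groupByCity", () => {
  // The developer typed only the "it should ..." description below; everything
  // inside the callback is the kind of multi-line completion an assistant might suggest.
  it("should group the list by cities", () => {
    const customers: Customer[] = [
      { name: "Ada", city: "Berlin" },
      { name: "Grace", city: "Hamburg" },
      { name: "Edsger", city: "Berlin" },
    ];

    const grouped = groupByCity(customers);

    expect(Object.keys(grouped)).toEqual(["Berlin", "Hamburg"]);
    expect(grouped["Berlin"]).toHaveLength(2);
  });
});
```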
I really enjoy having a chat in my IDE for the quick questions. I was recently doing something with Python, and I don't work that frequently with Python. I had a very simple, stupid question. I think it was something to do with, where do I put an additional dependency, which file is that, or something. It's just a lot faster to do that with the chat that I have directly in the IDE than to go into the browser, load Google or DuckDuckGo, or whatever search engine, scroll through the results, scroll through a long page somewhere. These are also things where I expect the answer to be pretty good. By the way, in a way – I mean, Stack Overflow has had a big drop in user activity. It's actually quite significant. The last time I looked at the numbers, it was already a while ago, but it had a drop, like 40%, I think, in the frequent posters on the site, right? This is also a little bit heartbreaking, because these types of communities, like Stack Overflow, have always been a pillar of how you learn, and of people sharing with each other.

[0:41:35] JMC: And it was a trust element, right? Because you could see the upvotes of a suggested solution. And you could actually see tweaks to it below. So there were two things that were in my mind feeding the next question, which is sort of the trust element. This LLM is proposing this solution, this piece of code, the snippet. But how do I know it's correct? I'm trusting the [inaudible 0:42:00]. While in Stack Overflow, I could trust not only the human proposing it, but also the upvotes, the crowd.

[0:42:10] BB: Yes. I talked a little bit about that in one of the memos on Martin Fowler's site, about where I'm trying to generate a function, and I compared it actually to the upvoted ones on Stack Overflow. But I also sometimes wonder, I mean, if we all start using this now, all right, but the technology world keeps turning. There might be new languages, new frameworks. There are newer versions of frameworks that have quite different APIs. In a lot of areas, I wonder if at some point, the answers from chats, from LLM chats, will get a lot less useful again. So we will have to start going back and asking questions again, and sharing. I sometimes wonder if we live in this golden age where these models are really useful for us right now, but there will have to be kind of a back and forth. There are also some studies – I think I just read that about image generators. I'm not sure if it's the same for text, but apparently the models that can generate images kind of start breaking down if you train them with images that have been generated by them. So if we start cranking out more and more code that has been generated by these tools, will we never be able to have new patterns emerge in these suggestions? Because, statistically, what we're doing right now will just dominate forever. I don't know. We will see.

[0:43:34] JMC: I think I read the same about code, by the way, not only images: the quality of the suggestions consistently going down, not dramatically, but consistently going down. By the way, I should note that Stack Overflow has released what I understand is an AI code assistant. I can only presume, because I've not used it. I think this happened literally this week, maybe last week, so at the end of July 2023.
I can only presume that the suggestions it makes will also incorporate provenance, or some sort of attribution to the thread, to the page, to the conversation in Stack Overflow in which it was made.

[0:44:18] BB: Is it a coding assistant? Is it a coding assistant, or a chat interface that gives you access to the Stack Overflow knowledge base, or also to your own hosted one –?

[0:44:26] JMC: I'll be honest, I don't really know.

[0:44:29] BB: Something like that. Look it up, listeners, please. We don't want to say the wrong things about the product.

[0:44:33] JMC: Exactly. Exactly. Check it out, because still, regardless of – as Birgitta just mentioned, the contributions, the activity on the site, are going down, supposedly because of the increased time, I guess, within the IDE, by users receiving this input in their IDE instead of going to the site, and so forth. Let's not make a causation there when we don't know. But yes, I think it's good for Stack Overflow, which is a very trustworthy source of good coding practices and so forth, that they are at least working their way into AI and starting to provide new ways of consuming the information they host and providing solutions there. Which reminds me of the thing that I wanted to ask you, and this is my last question, actually. Have you come across – this is a slight twist to the conversation. Models are being trained while being used. Is this a correct assumption? Every time I ask a code assistant for a code suggestion, or I ask a question in the chat, that information gets fed to the model, the way in which we interact with it, but also the code, right? You mentioned that most of these tools will take a copy of the lines preceding the request, maybe the whole workspace, maybe other files. So I'd like you to delve into what it is that, as far as you know, it takes a copy of, to use to suggest a snippet, or a line of code. And I guess, whether anyone in your research so far, which is, again, very short, because you just started in the role, has told you about concerns about this code remaining in the model, in a model that they don't own, and potentially being leaked outside.

[0:46:44] BB: Yeah, definitely. I mean, we have a lot of clients as well who have serious concerns about data confidentiality in general, or in this case, code confidentiality. So this is definitely a frequently asked question by every client that we're talking to about this topic. So yes. The prompt, basically, right? The snippets that get sent from the IDE to the back end, to give you a suggestion. They can potentially be persisted by the tool provider, right, and they might be reused for training later. I think this live training is a bit like muddy waters for me. I'm not 100% sure, but I think this live training is actually not an easy thing, right? But it might be stored for later, to train the next version of the model, right? When you choose the tool, it's important to look at the provider's terms and conditions, what they say about this. A lot of them say that they discard the requests immediately after they give you a suggestion, and that they won't be training their models with your prompts. You should look for those terms and conditions, and then you have to decide if you trust the provider of this tool. Because there are different levels of trust, maybe, that you have for GitHub, or for Google, or for Tabnine, versus a small startup that just started doing this five months ago, right?
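For readers wondering what "the snippets that get sent from the IDE to the back end" might look like, here is a purely illustrative TypeScript sketch. Real products differ, and none of these field names are taken from Copilot, CodeWhisperer, Tabnine, or any other actual API; it only shows the category of data whose retention the terms and conditions govern.

```typescript
// Purely illustrative shape of a completion request a coding assistant could send
// to its back end. Every field name here is an assumption, not a real product's API.
interface CompletionRequest {
  language: string;          // e.g. "typescript"
  prefix: string;            // code before the cursor
  suffix: string;            // code after the cursor
  neighboringSnippets: {     // excerpts from other open files used to enrich the prompt
    path: string;
    snippet: string;
  }[];
}

// The kind of payload whose persistence and reuse for training the provider's
// terms and conditions, discussed above, would govern.
const exampleRequest: CompletionRequest = {
  language: "typescript",
  prefix: "function totalOrderValueByCity(orders: Order[]) {\n",
  suffix: "\n}",
  neighboringSnippets: [
    { path: "src/order.ts", snippet: "export interface Order { city: string; orderValue: number }" },
  ],
};
```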
Then, there's of course, also, the impact of this. Thinking about, let's say, the probability is reduced, because you've looked at the terms and conditions, and the provider. But what is the impact of your code showing up in somebody else's suggestions? That can then be your particular, let's say, threat model as a company. Also, how specific your domain is, right? Because the code will also, like, drown, even if, again, even if the prompts were used for training, right? It might drown in the sea of all of the other pieces of code that process a list or that build an ecommerce site. Usually, with the coding assistants, you also get quite small snippets of suggestions. At which level are they actually original, or identifiable, or something? That's the question, right? I think the most – there were some stories in the press a few months ago about secrets actually showing up in suggestions. So, obvious secrets. That's, of course, the one that stands out. It's a very risky category. That's one –

[0:49:23] JMC: There is a big lawsuit against, I believe, I hope I'm right, but I believe it's Copilot, so GitHub, and I guess Microsoft, about Copilot suggesting little chunks of code that are copyrighted in a way that is not permissive enough for them to be used. Now the courts will decide if that's fair use or not, but yes, it's legitimately difficult, like you mentioned, to understand the difference between those.

[0:49:55] BB: It does, with the secrets. I mean, then, there too, think again about the impact and the risk. In the tools, you can also usually control, to a certain extent, which files you want to be readable by the tool, so you can make sure that it excludes the files where you might have secrets. Then again, it's a bad practice to have secrets in your code base anyway, right? So then, it would be about how mature your practices already are. Are you using a secrets manager and stuff like that? I think the secrets part is the most critical one, maybe, to consider. Then, also, looking at the tool provider, what are they saying about features? Do they have vulnerability filters, or filters for patterns like that, in their back end before they actually send you the suggestions? Because that is also something that some of the tools –

[0:50:42] JMC: In fact, I should mention, I think Copilot actually implemented such a thing too. The LLM might not be aware of licensing, and suggest the literal code of unpermissive code. Not code, a snippet of unpermissively licensed code. Then the filter will come in and say, "No, this is not going out as a suggestion, because it will break the license." Yes, I mean, there's a whole world in licensing and so forth. You and I are not lawyers, unfortunately. But it's really interesting, and it will eventually define a lot of the space. But we should focus on the software engineer and software delivery.

[0:51:26] BB: There's the legal side, but there's also, do we feel okay with that, right? How do we feel about that, similar to the Stack Overflow thing? But yes, that's for a whole other conversation.

[0:51:37] JMC: My last question is – because we talked about the dev side of DevOps, the dev side of the software delivery lifecycle, but software delivery is broader. We've talked about architecture, the shortcomings of LLMs for architecture. We talked about code generation, testing, integration, builds. But my last question to you is, where do you see this – do you see LLMs being applied in deployments, in infrastructure as code, et cetera, et cetera?
Is there a – I know that there are several experts in that field in Thoughtworks, and you might not be one of them. But what are your opinions on the application of LLMs to, specifically, the deployment part of software delivery?

[0:52:20] BB: Yes, I was actually just chatting with Kief Morris about this, who wrote the Infrastructure as Code book. I'm going to sit down with him next week to get a bit of his insights. But yes, I mean, yes, you can generate infrastructure code with it, of course, right? I've actually heard conflicting observations about this. I've heard some people say it works especially well with infrastructure code, or Terraform code, and stuff like that, because it's declarative. Others say, "Oh, for me, it didn't work with infrastructure code at all." But yes, it's the same as code generation. You can apply it in that space as well. I mean, I think in deployment, a lot of what we want to do actually needs a solid basis of deterministic automation. Actually, in Thoughtworks, we have some first principles, or values, that we like to apply when we're doing software delivery. One of them is repeatability. Especially in continuous delivery, in pipelines, we want things to be repeatable, so that, without cognitive load, and with low risk, and with a feeling of safety, we can just repeat doing things, and we know they're very probably going to work, right? LLMs' strength is not repeatability or determinism. They will give you different things all the time. I think they are better used, maybe, for things like, I don't know, exploratory testing, chaos engineering, stuff like that. There are some people who are using this type of AI now also to power agents for simulations. I think Microsoft has built a little agent with ChatGPT that can solve tasks in Minecraft, for example. Maybe for stuff like this, they're better suited. But for deployment and continuous delivery, I think we need a nice, solid basis of repeatable, deterministic automation.

[0:54:14] JMC: I honestly think, and this is a conversation that I had with Kief years ago about GitOps, which he's not a big fan of. I am. GitOps is basically – it is a system of deployment especially suited for Kubernetes, maybe for other environments, but I think that it was mostly invented for Kubernetes. It might be applied elsewhere, but it basically has you declare, like you said, the desired state of the system in a data format, in Git or in any other version control system. But let's equate Git with version control, because it's the most used. And therefore, following the example of your colleague in the post that we discussed at the beginning of the conversation, if that is the constraint for any of the proposed solutions that the LLM will give to the prompt that I submit, then it might actually become slightly more deterministic. Like you said, fully deterministic is impossible, because they are designed to provide slightly different answers every time. But if the constraints haven't moved, and the requirements of the system are well defined in declarative terms, then GitOps, again, defines the system in such a way, and then allows Kubernetes to reconcile that constantly with the production environment, with the actual state. So the actual state will always mimic the desired state. If there's drift, it will advise of such a thing. It gets more complicated, but that's basically it at the conceptual level.
I think it's a good candidate, but let's see.

[0:55:49] BB: The question is if it gives us added value on top of what we already have in that space, and if it's then worth the risk of having it less deterministic.

[0:55:57] JMC: It might not be. Exactly. Exactly.

[0:55:59] BB: These days, I'm not saying "impossible" to anything, because the hype out there is pulling us in all kinds of different directions, so let's see.

[0:56:10] JMC: Honestly, I'm very jealous of your title. It sounds fancy, but I say this as a joke. In reality, you must be thrilled too. I'm really jealous. I hope you have a really long career in that role. And yes, then we can get together some other time, with this technology being much more mature than it is now, and with other areas to explore. But so far, it has been a fantastic conversation. I can only tell everyone to read the memos that you've already written on Martin's blog post – blog, apologies – and to read the others tagged as generative AI; there are a few others that we've mentioned in this conversation. But yes, I look forward to reading the next one, and so forth. Thank you so much, Birgitta, for being with us.

[0:57:00] BB: Thanks, Jordy, for the conversation.

[END]