EPISODE 1621 [INTRODUCTION] [0:00:01] ANNOUNCER: GitHub Copilot is an AI tool developed by GitHub and OpenAI to assist software developers by auto-completing code. Copilot kicked off a revolution in software engineering, and AI assistants are now considered essential tools by many developers. Joseph Katsioloudes is a cybersecurity specialist who works at the GitHub Security Lab. He joins the show today to talk about Copilot, the future of software development in an AI world, using AI to improve security, and more. Be sure to check out the show notes for a link to Joseph's bio and to the Secure Code Game, which is an in-repo learning experience that Joseph created to teach how to secure vulnerable code. This episode of Software Engineering Daily is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [0:00:59] SF: Joseph, welcome to the show. [0:01:00] JK: Thanks for having me. It's my pleasure to be here. [0:01:02] SF: Yeah, thanks so much for being here. We got to meet in person a couple of months ago at Infobip Shift, so it's great to see you again. Let's start off with some basics. Who are you and what do you do? [0:01:11] JK: For sure. I consider myself a security specialist who likes to make security easy for developers. My journey started in my early teenage years. I've had a strong passion for cybersecurity since then. This passion translated into studies, and then I started my career in cybersecurity as a consultant, advising chief information security officers directly for Fortune 500 companies. I was constantly catching myself missing the software, missing being at the forefront and shaping the future. I decided to make a shift and be part of a team that is focusing on the security of open-source software. I think what attracts me to security is the fact that it's so multi-dimensional. There are parts of security that are about cryptography, forensics, intelligence, reverse engineering. My absolute love is for software security. [0:02:16] SF: Yeah, that's fantastic. I mean, I think that a lot of times, people bucket security as just one monolithic thing. But there are so many different parts of it, and you could be working on a lot of them. It's like everything in the tech world. The more you actually learn about it, the more you realize, "Oh, I don't know anything about this thing. It's way bigger and more complex than I realized." In terms of your studies, what did you actually study at university that initially led you to a career in cybersecurity? [0:02:44] JK: In the beginning, I started with computer engineering, so that I had a strong foundation in software. My classmates are working as software engineers. I am the one who decided to make the jump into a master's in cybersecurity engineering, so that I could focus on a multitude of cybersecurity domains. During my studies, I had the chance to do research. My research focused on cryptography and, in general, on how I could hack innovations in technology in ways that hadn't been seen before. [0:03:22] SF: When you think about your career in security and also what you've been seeing in the space, are we getting better at actually making more secure systems? Or are we always playing catch-up to the potentially bad actors attacking these systems? [0:03:37] JK: It's a great question. I think a lot about that as the years go by. I've concluded that software is evolving exponentially, because that's natural.
Technology is the foundation of everything. We interact with technology from the moment we wake up until the moment we go to bed. It's normal for it to progress exponentially. Security is indeed playing the catch-up game, growing linearly. I think in the past, it's fair to say that security was growing less rapidly, because it was an area that wasn't so popular. Of course, with high-profile breaches, and with technology slowly touching every aspect of our lives, people are steadily becoming more privacy-aware and more privacy-conscious. Security becomes more important, and it's natural that we get better at it as more effort, time, and money are invested in it. More skills are being built as a result, coming from more education. On the other hand, security will naturally, by definition, follow technology. It will be behind it, securing it. This means that there is always going to be a gap there, but it's in our hands to take security more seriously. Instead of having security be an afterthought that is bolted on later, it's up to us to understand that it's a mindset. Security should start from the beginning and be integrated into every aspect of technology and the software lifecycle. [0:05:25] SF: Yeah. I mean, I think that makes a ton of sense. It's the difference between buying a car that doesn't have seatbelts and then later, after an accident, thinking, oh, maybe seatbelts are a good idea, I guess I should purchase that upgrade, versus the car being secure by default, coming with the seatbelts built in, so that you don't have to risk an accident and then react to it. You have that there to start with. You're starting at a good baseline. Of course, other things could happen where you might have to react, but you at least have that baseline level of security, and you're thinking about it from the get-go. [0:05:58] JK: Absolutely. [0:06:00] SF: I want to talk a little bit about the future of software development. I feel like, over the past couple of years, there's been a lot of hype around developer experience, developer-first companies, community-driven growth, API-first, basically all the terms. Now, I think in part thanks to products like GitHub Copilot, AI-powered developer experiences. It's a dangerous game to predict the future, but the good thing is, if you're wrong, no one really cares. But if you're right, you look like a genius. This could potentially be your genius moment. The big question is, what do you think the future looks like for software development in an AI-powered world? [0:06:36] JK: It would be very interesting to come back to this podcast a few years later and see how things changed. If we evolved more in other areas, for me, that's fantastic. By the way, I don't mind if I'm wrong, or if it's a genius moment. Everything is progress. We are part of it. Indeed, Sean, we are living in exciting times, where we are progressing in so many areas. For me, the past year, since last November, sometimes feels like we had more than a year in a year. It's been so much. I think that, indeed, we are moving into a future of four dimensions. There are four transformative forces there. I see developer-first, AI-powered, community-driven, and secure. Developer-first, as you mentioned, is more or less the thinking that, yes, software is the foundation of everything and it's built by developers.
How do we give these developers the right tools so they feel more satisfied, are in a position to collaborate more effectively with others, and of course, are productive? When it comes to AI, as you mentioned with Copilot, we are seeing extreme progress. For instance, last year, we had 35% of code committed to GitHub being written by Copilot, while now this number has jumped to 60% for popular languages, like Java. Copilot has hit 1 million paid users in 249 countries and regions around the world. Then, for communities, community collaboration is something that I feel and live every day; it develops us and brings us, as a lab, to dimensions and research focuses that we hadn't even thought of before. For example, at GitHub, we host the top 1,000 open-source communities, and communities are a strong part of the GitHub Security Lab. I remember, in the past three and a half years, more than 25 critical security vulnerabilities being contributed to us by our community members. We have the GitHub Advisory Database, which includes 2,000 community contributions every single year. For security, we touched on it a bit before, so I want to touch on a different thing, which is that we mentioned four elements here, but it's important to understand the interplay between these four elements. When you have good developer experience and good AI, then naturally, you have better security. To better understand that, let's just pick, as an example, the feature that GitHub announced just last month, early November, at GitHub Universe, the flagship developer conference we have. The feature is called code scanning autofix, and it's all about giving AI-powered suggestions after a pull request has been detected to have a security issue. This is both AI-powered and developer-first. But think about the community's part here. Maybe the problem that has been picked up is coming from the community. Maybe a member of the community contributed this vulnerability to us, and we were able to pick it up. To sum up, I have mentioned four forces, but they are transformative, they have interplay between each other, and each one helps the others evolve even more, towards greater software. [0:10:20] SF: Yeah. It's what you started saying there at the beginning, about how it's only been a year, but it feels like it's been 10 years. It's hard to believe that ChatGPT just had its one-year anniversary, because it's had so much impact on the world, as have things like GitHub Copilot. On the use case that you're talking about, where you do a pull request and then it can alert you about a security issue, can you walk me through a use case, or scenario, for that? How would that maybe help me detect a potential security risk? Can you give me a specific example? [0:10:55] JK: Yeah, for sure. As a developer, you can code locally, or you can code online. Wherever you are coding, at some point, you are going to push some code. When you push this code, then in your pipeline, in your CI/CD, which stands for continuous integration, continuous delivery, the best practice is to have security tooling that kicks off automatically when you push code. Security testing can be static and, of course, dynamic. If we take the example of static security testing, we have the instance where you have pushed the code. This tool is going to read the code, and it's going to understand if there are security vulnerabilities there. Before this feature, you would have some alerts that, of course, would be very informative, explaining the line of the problem and why it's a problem, with references on where to look to understand it and fix it. To fix it, you would go back into your local environment, code something to fix it, push again, and see if you get the same alert or not. Right now, instead of going back and forth, where you are the one trying to understand the problem first and then writing the fix, you are going to have the same alert, but then you are going to have an AI-generated suggestion that explains what the problem is, why it's a problem, and its impact, and then you get a PR commit that contains a suggested fix. Of course, you can modify it, or you can just merge it. This is very fast, because we skip the step of going back and pushing again. It's guaranteed that the PR commit passes the security test that was failing. This is a productivity gain, and you have the AI-generated suggestion right there in the PR, with, of course, others able to comment on it and improve it.
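As a concrete illustration of the kind of finding such a scan raises, and the shape of fix an autofix suggestion might propose, here is a minimal, hypothetical Python sketch. The function names and the ping use case are invented for illustration; the pattern, untrusted input reaching a shell, is the classic one.

```python
import subprocess

# BEFORE (what a static scanner would flag): untrusted input is
# interpolated into a shell command, so a host value like
# "example.com; rm -rf ~" injects a second command.
def ping_vulnerable(host: str) -> int:
    return subprocess.run(f"ping -c 1 {host}", shell=True).returncode

# AFTER (the shape of the suggested fix): pass the arguments as a
# list and avoid the shell entirely, so the input is only ever data.
def ping_fixed(host: str) -> int:
    return subprocess.run(["ping", "-c", "1", host]).returncode
```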
[0:13:06] SF: With the introduction of a lot of AI tooling around enhancing developer experience and enhancing our productivity in a number of different ways, do you think it's helping people enter and stay in a flow state as a developer? Is this a step function in allowing me to stay in it for a longer period of time, or even enter the flow state faster? [0:13:31] JK: Yes, absolutely. We have data that prove that. I can speak from the enterprise perspective. We have a client called Mercado Libre. It's a very big e-commerce website. By market cap, until very recently, they were the biggest company in the region. This client has 13,000 software developers. I'll repeat the number, 13,000 software developers, 9,000 of whom have been using Copilot for the past months. They have noticed that they are able to produce software 50% faster than before. To phrase it differently, they are spending 50% less time on the things they were spending time on before. They are merging 100K PRs every single day. 100,000 pull requests every day. This ties very well with the number that 85% of developers feel that they stay in the flow. They don't need to go back and forth online, or in other places. I see this also from the people I'm meeting in conferences, in airports, when they see the stickers I have on my laptop. They tell me how much mental energy they save for the things that are more important to them, by not doing stuff that is daunting, or repetitive. How is your experience with AI tools? Do you stay in the flow more? [0:14:59] SF: Yeah. I mean, I think to me, it's akin to some of the enhancements that you had historically with IDEs, where if you're coding in Java, being able to automatically spit out all your getters and setters means I don't have to type all that stuff out. This is a step function, taking that to the next level. Where you mentioned earlier that 65% of code commits on GitHub are being created through Copilot, is this just changing the focus, or attention, of a developer, or is it changing what it might mean to be a developer?
If 65% of your code is being auto-generated by an AI, does that somehow reduce the value of my skills, or does it allow me to actually leverage my skills in a new, better way, because I can apply them to harder problems that maybe require human attention? [0:15:53] JK: A quick correction for accuracy: the number is 60% for popular languages, like Java. I believe that it's an integration between the human and the machine here, in the sense that I don't think developers are going to become less capable. They are just going to shift naturally, focusing on the problems that matter most to them. They are going to be more future-looking, and have more time and mental energy for creativity. I am seeing this in different areas and scenarios. We actually conducted research where we asked 500 US-based senior developers, with between 5 and 10 years of experience, working for companies of 1,000-plus employees: how are you spending your time right now? Meaning, what do you do most as developers? 32% of that time was spent writing code, another 31% was spent finding and fixing vulnerabilities and security issues, and the remaining 30% or so went to communication with users and communication with the team. They don't really enjoy that middle 31%, which is about finding and fixing vulnerabilities. When we inverted the question and asked, how would you use the productivity gains you might have from an AI tool, the response was 45% code reviews and 45% security reviews, which reverses the way of thinking about and doing these tasks. I see it also in myself, as a little anecdote, in what I mostly do right now, given that Copilot has taken over some aspects of my workflow that I didn't really enjoy doing before and that were taking a lot of time. Right now, I'm way more productive in those. I really enjoy that. Every time I see the response, I smile. [0:17:59] SF: Yeah. In a lot of ways, it's helping increase the enjoyment of your job, because you're taking away some of the things that are maybe less fun about development, like if you don't have to spend a lot of time writing all your unit tests and your integration tests, because you can automate that. A lot of developers don't like writing documentation. If you can automate that as well, then you're taking that off someone's plate to let them focus on more complex tasks. I think there's a lot of potential around code optimization as well, because you can use these tools as your brainstorming buddy for how can I make this thing run faster. One of the things that machine learning has traditionally been really good at is optimization, because you can try tons and tons of things much faster than a human could try them. [0:18:41] JK: I agree with all of that. I mean, I'm actually learning from these optimizations. Sometimes I write the code myself and then I ask the question, how would you improve the speed of that? I'm learning about a new library, or a new methodology, that I wasn't taught, that I didn't see online, and I'm learning on the task right from an AI assistant. I see it as something that is not going to replace my job, but I believe that in the future, a developer that is using it will definitely outperform one who isn't. Right now, I'm sure a lot of security specialists using AI are outperforming me when it comes to writing security exploits, security penetration testing, or pattern recognition use cases. [0:19:34] SF: Yeah.
I think you bring up a good point: it can be a way to actually up-level your skills, because you can learn from the AI, and it's hard to be an expert in everything. AI systems generally have a perfect memory and can cover a lot more breadth of material than you can. It might actually introduce you to new concepts, or new ways of solving problems, that you weren't familiar with before. There's also a nice thing, I think, from a junior developer standpoint, too, where you might feel more comfortable asking an AI system questions that maybe you would feel less comfortable asking a senior engineer on a team, because you don't want to look like you don't know what you're doing, or look stupid, or something like that. There's a certain amount of psychological safety in interacting with the AI, because you can say, "I don't know what this thing means. Can you please explain it to me?" But I might not want to ask that in a job setting, because I feel like, "Oh, this person is going to not respect me as much, or respect my skills as much, if I need to ask this type of question." [0:20:30] JK: Absolutely. I would like to contribute another example that I've seen of that. Say someone knows that their code is vulnerable to something, because they have an alert about it, in the same way we spoke about before. Of course, you can go online, you can search for this problem, and you are going to arrive somewhere. But that lacks personalization. No matter the quality, it's not going to be about your exact code, the exact variant of the vulnerability your code suffers from. Of course, you can ask someone. Of course, you can also choose not to ask, for the reasons you just mentioned. If you ask Copilot, or an AI assistant, to explain it to you, the explanation you get is going to be tailored to the code, to the context you give, which is amazing, because you have someone who is there for you 24/7, explaining and taking follow-ups, and basically, as you mentioned, you don't worry too much about how they are going to think of you if you're not so skillful, or whether you should already know that, and so on. [0:21:40] SF: Do you think that there are unique challenges in the enterprise world, where AI can, essentially, help large-scale teams in a way that we haven't been able to help them previously? You mentioned the company in Latin America that has, I think, 13,000 developers all working, and 100,000 PRs a day, and so forth. The types of challenges they're going to run into as an engineering organization are probably a lot different than, necessarily, the five-person startup that's hacking away on a project. If you run into a problem there, you can just tap someone on the shoulder, or send them a Slack message. Harder to do that when you're at 13,000 people. [0:22:18] JK: It's a great question, Sean. I believe that as an organization gets bigger and bigger, things might become slower. With AI, you can bridge this gap when it comes to the speed of developing something, or in general, the amount of research that can be needed to internally agree on something. In general, I like to think about big enterprises as big ships and smaller startups as very fast-moving ships. For instance, let's think about vulnerability remediation. I know that I give you a lot of security examples; that comes from my background. As a company gets bigger, systems become more complex. You naturally have more third parties working with you.
The supply chain becomes more complicated to manage. In general, the lines of code grow. I believe, with fine-tuning of AI models that are offered to customers, there's the chance to have a more personalized approach for the specific organization. It can range from styling, to suggestions, to which libraries are used, to what you avoid. In general, big organizations are expected to have more data to provide in order to fine-tune that model. You also have a challenge there, which is, how long does it take to onboard someone when there's a super long history of systems, code complexity is high, and technical debt is very high? There's an opportunity for AI to minimize that gap. Of course, some startups that are starting right now will build up these problems too, building their debt slowly, slowly. Like you mentioned in the beginning with documentation, when it comes to producing a source of knowledge that people can ask and get back responses from, this can be much cleaner, because you can have your AI assistant as your friend, your senior developer next to you, helping with these unique challenges. Of course, every organization is different. Every industry is different. Code is never the same, but there are some patterns out there that are similar across organizations, and that is why even Copilot in its generic version is a fantastic tool, and the numbers we touched on before are about the version that pretty much everybody in the world is using right now. [0:25:05] SF: Yeah. I think onboarding, and getting someone up to speed and actually productive in a large enterprise as an engineer, is a really good use case, because when I worked at Google, you would basically not do anything for the first six months, because you're just learning how things work there, and you're introducing yourself to a code base that has been around for 20 years. There's a lot to take in. It just takes a long time to get up to speed and feel comfortable, even if you're really experienced. In any of these large organizations, where you have 10,000-plus engineers who've been working on something for a long time, if you can shortcut the time to get up to productivity by even 20%, that's massive savings for the company and also a much more satisfying experience for the new engineers on the team, because it doesn't feel good to just be sitting there, twiddling your thumbs, consuming documentation for half a year. [0:26:01] JK: Now that you have mentioned onboarding, I have another example to contribute, which is not AI-related. It's another thing that we are using internally, and our engineering has shifted completely to using it. It's called Codespaces, and it offers the chance to instantly start coding inside a browser. Imagine having a virtual machine behind an editor in your browser. Our client, Duolingo, has their biggest repo opening in just one minute, which means that if they have someone to onboard and they have to configure environments, or they want to change specific versions, or they have a problem, like something not responding, they can just restart the VM that is running behind the browser and be super fast and super productive, cutting onboarding times not by 20%, but by hundreds of minutes right there. [0:26:59] SF: That's awesome. I want to talk a bit about security. There are classical attacks that lots of people know about: cross-site scripting, SQL injection.
Now there are also growing attack vectors around the open-source supply chain, which you touched on a little earlier. In fact, there's a 650% year-over-year increase in attacks targeting the open-source supply chain. First, what are some of the common security concerns for companies developing software that every engineer should be familiar with, at the same level as SQL injection and cross-site scripting attacks? [0:27:36] JK: I believe that secrets are a big one. This is because 80% of the data breaches in the past year are attributed to secrets and credentials leaking. It's a big number, and in 2022, it was actually higher than before. This means that engineers should understand that secrets should be, first of all, generated with a very good source of randomness. But most mistakes happen in how they are used. Let's offer a few examples for the technical audience listening to us today. Of course, you can have hard-coded credentials. If you are listening to this podcast right now, I give you the excuse to pause. Go into your code and do something about that, because it's very dangerous. Hard-coded credentials end up online. They can be recognized straight away, and they can be used. Another mistake that is not so obvious, let's use this example here. Imagine you have a private key sitting at the top level that you protect very well. But this private key is going to sign a JWT token sitting in the middle, an intermediary secret. They are both equally important. I see some engineers putting all the security on the top one, because it's a private key, but the second one, the intermediary token, is equally impactful if it leaks. You should have the same security on both of those, making sure that you rotate them. For instance, you can have rotations of 30 days, 60 days, maximum 90 days, so that if something is compromised, the exposed credentials are isolated. Another piece of good hygiene around credentials is to follow the principle of least privilege, which is to give the minimum amount of access and privileges needed, so that if something does go wrong, it's again isolated. Another big percentage is software vulnerabilities. But before we touch on those, let's speak about what we do about secrets. We have secret scanning. It's a tool that picks up secrets locally. This means that if you try to commit a token, or if you try to commit a password, it's going to be picked up locally. Your secret is not going to go onto the public internet. You get the alert there, and you are expected to go and remove it. Again, last month at Universe, we announced secret scanning for generic passwords. Before that, we had special patterns, around 180 of them. Imagine that: okay, it's going to be hex, it's going to look like this. We were very good at picking those up. Last year alone, we picked up 30,000 of those secrets, and they didn't make their way to the public internet. But with AI, we can recognize secrets that have unusual formats. They can be user passwords. This way, they are not going to make their way to production, to the internet. Then I hope, next year, when I see the data breach report, the number is not 80%, but lower.
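To ground the point about intermediary secrets, here is a minimal, hypothetical sketch in Python using the PyJWT library. The environment variable name and the claims are invented for illustration; the idea is that the signing key comes from the environment rather than the source code, and the issued token is short-lived, so a leak of either one is contained.

```python
import datetime
import os

import jwt  # PyJWT

# The signing key is read from the environment (or a vault), never
# hard-coded. The variable name is a hypothetical example.
SIGNING_KEY = os.environ["JWT_SIGNING_KEY"]

def issue_token(user_id: str) -> str:
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": user_id,
        "iat": now,
        # Short expiry: the token is an intermediary secret, and a
        # short lifetime bounds the damage if it leaks, the same way
        # rotation bounds the damage of a leaked signing key.
        "exp": now + datetime.timedelta(minutes=15),
    }
    return jwt.encode(claims, SIGNING_KEY, algorithm="HS256")
```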
If we move to other vulnerabilities, I want to mention here that we still see some software vulnerability classes. You mentioned SQL injection. It's still a thing. It's not eradicated, and that's both surprising and something I don't feel nice about, because people are expected to know about it by now. It's been 20 years. Instead of trying to educate everybody about every single security vulnerability in a rapidly evolving security world, I believe we should try to have a security mindset, understanding that when we write code, security vulnerabilities can naturally occur as the code progresses and as the code gets bigger, and it's all about assuming the worst. No matter how good we think we are, or how much training we have, the best practice is to have advanced security tooling that is able to pick up these problems before they make their way to production. Some numbers there: we know that people who are using code scanning in their CI/CD have historically managed to prevent 50% of security vulnerabilities from making their way into production. This number will hopefully grow, as we treat security not as shifting left, but as starting from the left, and as we use AI to help write secure code from the beginning. Because Copilot is expected to give you secure suggestions, we are taking steps in that direction with the security vulnerability filter, preventing vulnerable suggestions from being given back to you. [0:33:07] SF: Sticking with the secrets side, recently I gave a talk where I asked people to put up their hands: how many people are still storing plaintext passwords in their database? The answer was not zero. There were some people. I didn't ask about secrets. Maybe next time, I will. In terms of secrets, what is your recommendation for how to handle them? Should someone use, essentially, a secrets manager, and then IAM permissions where only one Lambda function, or one area of the code base, can talk to the secrets, so you're reducing the attack surface? Is that the best practice that people should be following? [0:33:43] JK: Yeah. They should have a vault that protects their secrets, like HashiCorp Vault, for example. The best practice is to have one person write a secret in and give the chance to everybody in the organization to use it. Actually, that's a bit bad, and I'm going to explain why. One person should put the secret in, and only the people that should have access to that secret should be able to use it, read it, and so on. I corrected myself on "everybody in the organization" because, back to least-privilege access, if we think about repo-level secrets and org-level secrets, then if you give org-level access to everybody, maybe something is wrong there, because I don't think there is a secret that everybody in an organization should have access to. Even at the repo level, you can split secrets into environments. You can have a secret for the development environment and a secret for the production environment. It's all about granularity. Even when you store secrets, you should store them in a way that is as secure as possible.
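As a rough illustration of the vault-plus-least-privilege pattern, here is a minimal sketch using hvac, the Python client for HashiCorp Vault. The mount point, secret path, and environment variables are hypothetical; the idea is that the service authenticates with its own narrowly scoped token and reads only the one secret its environment needs.

```python
import os

import hvac  # Python client for HashiCorp Vault

# The service authenticates with its own short-lived, narrowly
# scoped token injected by the platform, not a shared org-wide one.
client = hvac.Client(
    url=os.environ["VAULT_ADDR"],     # e.g. https://vault.internal:8200
    token=os.environ["VAULT_TOKEN"],  # hypothetical injection point
)

# Read only the secret this service needs, scoped per environment
# ("dev" vs. "prod"), mirroring repo-level and environment-level
# secrets. The path and mount point are invented for illustration.
env = os.environ.get("APP_ENV", "dev")
secret = client.secrets.kv.v2.read_secret_version(
    path=f"billing-service/{env}/database",
    mount_point="secret",
)
db_password = secret["data"]["data"]["password"]
```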
[0:34:58] SF: Then, how does the security around the CI/CD work? What is it that the code scanning is doing when it's integrated into the CI/CD to prevent those types of vulnerabilities? I'm assuming it's going to check to see, do I have a hard-coded secret, or a password, in my codebase? Is there something it does around checking for vulnerabilities in the libraries that I'm using, known compromised open-source packages, or something like that? [0:35:23] JK: That's a different check. It's not part of the static application security testing. As for the way the SAST tool works, for instance, if we pick our SAST tool, CodeQL, which stands for Code Query Language: CodeQL builds an advanced relational database, which has connections between the code elements there, so that it understands where the user data flows and how that data travels. These connections help it understand where the sources of the vulnerability are, and where the sinks of the vulnerability are. Where does something start to be dangerous, and where is something actually executed and indeed dangerous? When you have that static analysis, you are looking at the code; it's an inside approach.
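To make sources and sinks concrete, here is a minimal, hypothetical Flask-style sketch of the kind of flow a taint analysis traces. The route, database, and schema are invented; the request parameter is the source, and the query execution is the sink.

```python
import sqlite3

from flask import Flask, request

app = Flask(__name__)

@app.route("/search")
def search():
    # SOURCE: untrusted user data enters the program here.
    name = request.args.get("name", "")

    conn = sqlite3.connect("app.db")  # hypothetical database
    # SINK: the untrusted value reaches query execution. Built with
    # an f-string, as below, this is the dangerous source-to-sink
    # flow a tool like CodeQL reports:
    # rows = conn.execute(
    #     f"SELECT * FROM users WHERE name = '{name}'").fetchall()

    # Safe version: parameterization breaks the tainted flow.
    rows = conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)).fetchall()
    return {"results": [list(row) for row in rows]}
```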
When you want to see which of the packages in your libraries are compromised, and the transitive supply chain problems that you might have, we have Dependabot there, which is fueled by the GitHub Advisory Database. The GitHub Advisory Database is a human-curated database, where we accept community contributions and make sure that we cross-check everything. We expect Dependabot to submit a pull request in order to fix the version of the dependency or library you have, so that your supply chain is not vulnerable anymore. When it comes to transitive dependencies, Dependabot, since the beginning of 2023, is able to help with JavaScript. We chose JavaScript because it's the ecosystem where Dependabot could have the most impact; both by volume and by acceptances of the PRs, 80% of Dependabot users are in JavaScript. With that transitive focus, we saw an improvement of 42% in eradicating transitive problems in the two quarters that followed the announcement. [0:37:41] SF: Okay. Then a little while ago, there was this exploit that I read about, where attackers were taking advantage of ChatGPT hallucinations. Essentially, the attacker would prompt ChatGPT for code to solve a particular problem. Sometimes ChatGPT would come back with a reference to a third-party package that doesn't actually exist. Then they would create that package. That way, if an unsuspecting person later generated similar code, the package that shouldn't exist now exists, and essentially, the attacker could have the package do whatever they wanted. It does something other than what the person thinks it does. On the side of GitHub Copilot, how does GitHub Copilot stop such things from happening, in terms of a hallucination leading to an attack vector like this? [0:38:29] JK: I think it's normal for every great technology to have dual use. If we think about email, we can use it daily for business communications. At the same time, we can get phished through an email. [0:38:40] SF: And spammed. [0:38:41] JK: Or spammed. Yup. What is in our hands to do, and in our control, is to make sure that before we put something in front of our users, we are doing the most we can to not provide them with dangerous, or potentially harmful, results. We are following the exact same practices that we follow internally for other products in order to secure our AI. Of course, there are the responsible AI principles from Microsoft, and here is where our relationship becomes very important. We are following those standards of operating with AI. Of course, as time progresses, we are also learning, and we are slowly eradicating those cases. [0:39:34] SF: Okay. Then in terms of how my code, or potentially my customer data, is treated by GitHub Copilot, what are the security and privacy controls built directly into Copilot that make sure that I'm not accidentally leaking my core intellectual property to GitHub? [0:39:52] JK: That's a great question. When you have a business, or an enterprise, license, then GitHub retains nothing. We retain nothing. This means that your context, your code, is not retained and is not used to help the model learn more. Your code is not suggested to other clients, or to other people in general. Even the prompts where you ask Copilot for things are used only to return an answer, a suggestion, and then they are dropped. When it comes to individual licenses, every user has the ability to opt in, or opt out, at any stage, from sharing analytics, in the same way this happens with every other mobile app, and any other software in general, since the first day I remember using software. [0:40:48] SF: Yeah. You can make the choice of contributing back as a way to help improve the product, but it's not something that you're forced to do. [0:40:55] JK: When you are an individual user, yes. But when you are a business, or an enterprise, you don't even have the chance to contribute back. It's not even an option. [0:41:04] SF: Yeah, that makes sense. Then what is GitHub Copilot X, and how is it different from the non-X version of GitHub Copilot? [0:41:12] JK: It's a good question. A lot of people thought that it's a product and were asking us, "Oh, when is X coming out?" X is our vision. Back in March, we shared our vision to bring AI into every step of the software development lifecycle. This is exactly what we did at the beginning of November at GitHub Universe, with more use cases, bringing AI into more and more stages, towards more productivity, more satisfaction, a more enhanced developer experience. And of course, security, as we also started to bring AI to parts of the security products that we have. [0:41:51] SF: Okay. Then as we start to wrap up, what's next for your work at GitHub, and is there anything else you'd like to share? [0:41:59] JK: I'm very excited about a game that I created in the past six months, and it has 3,000 users. It's going great. It accepts community contributions. Through these community contributions, we are going to release a second season, I hope at the beginning of 2024. It's called the Secure Code Game, and it's hosted inside GitHub Skills. The shortcut for people to find it is gh.io/securecodegame. I'm excited about it, because it's impactful when it comes to developer education for software security. A problem I was seeing, and was super inspired to solve, was that for many developers, security was boring. Oh, another security training where we are going to click next, next, next. We brainstormed internally, we spoke with users, and we came up with an experience that looks a lot like what they are doing daily. We give them code. The code is functional, but vulnerable, and they have to first spot the problem and then fix the problem. When they fix the problem, our exploits don't work anymore. When they fix the problem in a way that keeps the unit tests passing, it means they can progress to the next level. In the first season, which we released at the beginning of April 2023, we had a lot of levels in Python and a level in C. In the next season, we are going to have JavaScript, GitHub Actions, Java, and different vulnerabilities every time.
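To sketch the shape of such a level (an invented example, not one of the game's actual levels): the function below is functional but vulnerable, the first test pins down the intended behavior and must keep passing, and the exploit test must stop succeeding once the player fixes the flaw.

```python
import os

# Functional but vulnerable: the player must spot that a
# user-supplied filename like "../../etc/passwd" can escape
# the base directory (path traversal).
def read_note(base_dir: str, filename: str) -> str:
    path = os.path.join(base_dir, filename)
    with open(path) as f:
        return f.read()

# The fix must keep this unit test (intended behavior) passing...
def test_reads_note(tmp_path):  # pytest's tmp_path fixture
    (tmp_path / "todo.txt").write_text("ship season 2")
    assert read_note(str(tmp_path), "todo.txt") == "ship season 2"

# ...while making this exploit fail. A correct fix resolves the
# path with os.path.realpath() and rejects anything outside
# base_dir, rather than just filtering the string "..".
def test_exploit_blocked(tmp_path):
    try:
        read_note(str(tmp_path), "../../../../../etc/passwd")
        assert False, "traversal was not blocked"
    except (ValueError, OSError):
        pass  # a fixed version should land here
```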
The final reason I love this game is because many people think that it's enough to spot a problem. It's not enough to spot a problem. More problems can get introduced into the code if you try to fix a problem but don't fix it correctly. I don't want to use big words, but I'm starting to see some results there, and they are very encouraging. I would like to say to anybody, if you have a few hours free during the festive season, or if you want to grow your skill set by spending just 3-4 hours there, go to gh.io/securecodegame, or just Google it. It's on GitHub Skills, and you can go there and find it. [0:44:27] SF: Yeah, that's awesome. We'll have to include that in the show notes. I think using games as a way to educate people, or even some of the things I've done, where I've been able to combine pop culture with computer science principles to make it more accessible and more fun, is a great way to help bridge the gap of learning with things that traditionally feel a little bit more, I don't know, stuffy, or boring, or associated with, "Oh, I don't really want to do that." But you can make it fun if you put a little bit of effort into it. [0:44:56] JK: Exactly. That's a talk I'm building for next year, actually: how you can use gamification for things that maybe developers don't really enjoy, and try out some concepts there. Fun is a big one, like you mentioned. [0:45:10] SF: Well, awesome. I'm looking forward to maybe seeing a talk about this in the future. Joseph, thanks so much for being here, and hopefully we get to see each other again in person at one of these events. [0:45:20] JK: Thanks for having me. I really enjoyed this pod with you. [0:45:23] SF: All right. Thank you and cheers. [END]