[00:00:00] L: Varun, welcome to Software Engineering Daily.

[00:00:03] V: Thanks for having me, Lee.

[00:00:05] L: Great. So you call Codeium an AI-assisted autocomplete for programmers. I took that from your website. Tell me a little bit about what that means.

[00:00:15] V: Yes. So one interesting thing that has happened over the last year or two is that generative AI has gotten super popular. But more than being popular, it's become extremely valuable in many areas of professional work. One simple area that people don't think about very frequently is writing code. Back in the day, people would use things like IntelliSense, tools that would autocomplete maybe singular words. But people have figured out how to harness these deep learning models to effectively complete multiple lines of code. GitHub Copilot is one of the first products that did this, and it's taken the world by storm. Codeium offers this functionality entirely for free and in as many IDEs as possible.

[00:01:01] L: I think that gets to one of my first questions. It sounds like what you have is something very similar to Copilot. But what are the differences? Obviously, yours is free. Copilot is not free. Besides that, what makes you unique from Copilot?

[00:01:19] V: Yes. I think maybe one thing that could be helpful is a little bit of background about how we got here, just to level set on what we're trying to achieve. We started the company Exafunction over two years ago. Before that, I worked at a company called Nuro, where we did large-scale deep learning infrastructure. That's the team I led. Nuro is an autonomous goods delivery company. As you can imagine, autonomous driving companies require a lot of deep learning, right? If a car needs to drive itself, it needs to do a lot of detections, predictions, so on and so forth. I started the company with the premise that deep learning was going to revolutionize a lot of different frontiers of technology, and we built technology to make inference, running deep learning models, super-efficient at scale. We ended up managing upwards of 10,000 GPUs across the public cloud for many large autonomous vehicle and robotics companies.

We realized around a year ago that generative AI was going to be where a majority of these deep learning workloads were going to go. Instead of people hand-tuning a bunch of these models for a bunch of tasks, there were going to be general models with knowledge that could solve a bunch of tasks for you. A classic example that some listeners or you might have heard of is models like BERT that do natural language processing tasks, right? It's interesting that these general models can, in some sense, solve these tasks in a zero-shot manner, basically without any examples, which raises the question of why you would want to train specific models to do A, B, and C, if that makes sense. So we realized a year ago that we would be able to leverage our technology to run these generative AI applications extremely cost-effectively and at scale. So we built Codeium. We set an ambitious vision to start with. We believed Copilot was just the tip of the iceberg of what these code-assistant technologies could do. Autocompleting code is exciting. It uplevels developers a ton, but there's a lot of other things developers do, right? They search code. They write PRs. They write PR reviews.
They execute code on the command line. Because of that, we set an ambitious goal: why don't we first make a product that's entirely free, serve as many users as possible, and start out with autocomplete? But our vision was to do significantly more, and we've already started doing that. A couple of examples of things we've done so far are natural language-based search and codebase-aware chat. Those are just a couple of things, and we've done much more on the enterprise side too.

[00:03:49] L: Let's hold off on that. I think that's great. I definitely want to get into where you see this whole technology going. But let's go back to your origins for a second. So you moved from specific models that were dealing with driverless transportation systems to generative models for software developers. I understand the switch to generative AI. But why this switch from driverless cars to software developers? Why that switch?

[00:04:27] V: No, that makes sense. So when we started, we were originally purely an infrastructure company. We provided technology to run these models more efficiently and at scale for a lot of different companies. We actually had customers that were generative AI companies, running models like GPT-J and other open-source generative models. I think we recognized that a lot of these workloads were going to become generative in the future. Potentially, even driverless models were going to become generative workloads at some point. If you look at what Tesla's doing, they're actually using generative models to path plan, like deciding what lanes people should actually be driving in. We were asked so much to basically build these generative apps. We felt, though, that the coding space was where most of the value was going to be in the short term, just because developers had already gotten so much value from it. The reality was we needed to go deep into the stack and not just provide the infrastructure to run the models. To build a great app, we needed to go and train our own models too, and we actually ended up doing that as well.

[00:05:33] L: Got it. Got it. Yes. The way I see it, software development seems to be low-hanging fruit, right? It's actually a fairly simple implementation, and I shouldn't say simple; it's a well-understood mechanism for doing generative AI, compared to a problem like which lane should I be in in a driverless car, which is a lot more complex. But yet the return on investment is huge, right? Because just by doing simple recommendations, you can dramatically speed up the development process of a software developer. So I'm assuming that feeds into it, as you can take what you learn, of course, and apply it to other technologies. But that's where you are first. So let's talk about where generative AI in general and Codeium in particular help software developers. Now, one of the first things you mentioned is autocompletion, where you're in the middle of writing some routine, and it guesses what you want and helps you figure out rather complex modules for how to do what you're trying to accomplish. That's all great. That's all wonderful. I think anyone who's used Copilot, and I'm a big user of RubyMine for my Ruby development, as well as other tools, I understand how that works. But what else beyond that?
Let's talk about where you see the software development AI-assisted space itself moving.

[00:07:08] V: Yes. So I think that's a very interesting question, right? There are a couple of other features I was mentioning that Codeium provides. Maybe I can talk a little bit about our growth as a company, just so you can get a sense of what's happened. We started the year with around 1,000 users and now have hundreds of thousands of users using the product daily. So we've grown a ton, and that's because of a couple of reasons. One, we've provided the functionality in way more IDEs. People don't only write code in Visual Studio. People write code in Eclipse. People write code in Xcode. Our goal is to make sure the technology is democratized and as many people have access to it as possible. The second thing we've done that even products like Copilot don't have is provide things like natural language-based search. So you can actually write a human-readable English query, and it'll find where all of those events happen throughout your codebase. We pair that with a chat application that lives entirely in the IDE and knows your entire codebase as well. All of these in combination allow us to give kind of the sovereign app experience. This is maybe a term from the old-school days, where you're able to do everything you want entirely within your IDE. We want to uplevel basically every IDE that exists out there. We don't believe developers really change quickly from one IDE to another. You will never be able to convince an Emacs developer to go and use VS Code. So we want to give the best experience to developers where they are right now. That also includes the enterprise, where we've done some quite innovative things as well.

[00:08:52] L: So you mentioned you improved search. Basically, that's the chat, conversational style of "help me find so-and-so in my code." Can you give me some examples of the types of queries that are doable today, either with your product or doable soon with your product? What types of queries can a developer ask and get meaningful results?

[00:09:17] V: Yes. So they should be able to do basic queries like, where do we fill out this form for this part of my website? It'll be able to find it and also provide a summary. You can also do things that are really common, like, let me document my code. We've implemented some interesting things where we go through your entire codebase, look at where docstrings exist versus where they don't, and make it a simple button click on top of the function to just generate the docstring. I guess all developers obviously love writing code, but some don't really like writing documentation, so we try to make that super quick.

[00:09:49] L: Or Git commit messages, that's another one, right?

[00:09:52] V: Generating commit messages from maybe a summary of the changes you've made so far. One of the things we try to do for developers is remove the drudgery of software development, because we're not only focused on how we reduce the number of key presses. We realize that these are not the fun parts of writing software.
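The docstring feature Varun describes has a mechanical half, finding the functions that lack documentation, that is easy to picture. Here is a minimal Python sketch of that half only, using the standard library and an assumed "src" directory; the model call that actually drafts the text is left as a comment, since Codeium's internals aren't public:

    import ast
    import pathlib

    def find_undocumented_functions(root):
        # Yield (file, function name, line) for every function missing a docstring.
        for path in pathlib.Path(root).rglob("*.py"):
            tree = ast.parse(path.read_text(), filename=str(path))
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    if ast.get_docstring(node) is None:
                        yield str(path), node.name, node.lineno

    for file, name, line in find_undocumented_functions("src"):
        # A real assistant would call its model here to draft the docstring.
        print(f"{file}:{line}: function '{name}' has no docstring")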
I guess the key part is that we want to deliver technology where, even if it's wrong, it's not something the user is extremely upset about, which is what made autocomplete such an awesome product. This is maybe a little bit more of a philosophical point about which LLM products actually work versus which don't. There's a whole framework we have internally for reasoning about why an LLM product is good versus bad.

[00:10:37] L: Yes. You're jumping ahead on me into another question I have coming up. I'm going to hold on to that just for a second, keeping with use cases for a moment. So another thing that developers hate doing, at least a lot of developers hate doing, is generating tests, creating tests. So that's another use case for generative AI. How –

[00:10:58] V: That's actually a really popular one, where people will take a function, and it's able to use the context of the rest of the codebase to generate a unit test that looks fairly similar to your other unit tests, in the same style as the rest of the codebase. I don't know if you've noticed this, but autocomplete itself is awesome at generating unit tests because it's able to pull in context from the corresponding file where the underlying implementation is, plus get a sense of how the other unit tests in the test file work as well.

[00:11:29] L: Yes. I think unit tests are a great use case for generative AI because they avoid a lot of the limitations we were talking about, which we'll get to in a minute. But what about integration tests? Do you see a future where integration tests and other more sophisticated testing structures become a use case as well?

[00:11:47] V: So I think this comes to one thing that I wanted to get to, and maybe this is the right time, which is what makes LLM products work, right? There are three things I think about for an LLM product. There's the speed at which the generation happens. There's the quality. Then finally, there's the correctability: how quickly can you correct the output of the LLM? If I was to break it down for autocomplete: autocomplete is super-fast. The quality is good, given the amount of text it generates. The correctability is trivial. If it was a bad suggestion, I'm just going to type over it or press escape. It's really, really simple. The tricky part comes if you want to start doing things that are closer to the PR level: I'm going to generate an entire PR. The burden of proof for the product is significantly higher because you're suddenly making changes across multiple files. The cognitive overload of "I made five changes in 10 files" is insanely high if a couple of the files are wrong. Developers just will not want to use the product, even if it is providing some amount of value. This maybe goes back to a classic thing: it is not sufficient to be 50% of the way there, because you're going to burn your users. Developer trust is going to get eroded. So I think for something like an integration test, the tricky part is there's a lot of time needed even just to validate that the integration test works, right? You might need to deploy to some staging environment in the cloud, and that itself might take tens of minutes. I don't really know. It depends on the actual process.
So the burden of proof is: it had better be right the first time. That's maybe different than a unit test, where it's testing a small chunk of code, and it's easily verifiable whether it works. Does that make sense?

[00:13:37] L: Yes, it does. It does make sense. One of the things that makes unit tests hard is to figure out: I know what this module does. I know what I want to test. But how do I build a framework to do the testing? That's something where you know whether it's correct or not correct when you first see it. It's either right or it's wrong. There's no middle ground. So that makes it valuable instantly, and, like you say, very easy to recognize when it's not valuable and then throw it away. But like you say, with integration testing, that's not quite so simple, right? You don't know whether it's right or wrong, and you're wanting to do the test because of the cognitive load. It can handle more cognitive load than a human can, but you can't trust the results, because there's no easy way to verify that they're right or wrong.

That kind of brings me to one of the things we hear a lot about generative AI in general right now, which is a statement like this: the AI created beautifully elegant and readable code that was absolutely dead wrong. It was great code, it works wonderfully, but it doesn't do what you want it to do. That's true with AI-generated code. It's true with AI-generated writing. I do a lot of writing, and you can use generative AI for introductions but not much more than that. I think you're hitting on it here with this whole thing of you can do unit tests but not integration testing. But how do you deal with the reality that generative AI is absolutely confident in what it produces, whether it's right or wrong, and the perception factor that goes along with it?

[00:15:28] V: Yes. So there are a couple of key things there, and I think these are all really good points. We'll be the first people to tell you that you should not trust everything that comes out of these applications. That once again comes back to the correctability aspect. If it's easier for the user to correct it and validate that the result is correct, it builds more confidence. That's actually one of the cool things about unit tests, something we will explore in the future: a unit test can actually be run, which is an awesome property. So what that means is, if I generate the code, or maybe let's go the other way. Let's say the unit test has already been generated, and you actually do test-driven development properly. I generated the tests, and now I'm going and generating the code. There's an easy way to then run the code to validate that it was correct. You can build confidence that the generated code is probably correct. Obviously, there's a whole host of other things to think about. If you are going to run arbitrary code, you had better run it in a sandbox, because you don't want it to rm -rf the entire directory. You don't want to delete the user's entire disk.
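That sandboxing point is worth making concrete. Below is a minimal sketch, not Codeium's actual machinery, of running a model-generated test in a throwaway directory with a timeout. The calc module and the test body are made-up stand-ins, pytest is assumed to be installed, and a production sandbox would add real isolation (containers, no network, dropped privileges) on top of this:

    import pathlib
    import subprocess
    import sys
    import tempfile

    # Stand-ins: code that already lives in the user's repo, and a test the
    # model generated for it.
    MODULE_SRC = "def add(a, b):\n    return a + b\n"
    GENERATED_TEST = (
        "from calc import add\n\n"
        "def test_add():\n"
        "    assert add(2, 3) == 5\n"
    )

    def run_generated_test(module_src, test_src):
        # Execute the generated test in a temp dir so a bad suggestion can't
        # touch the real working tree; the timeout bounds runaway code.
        with tempfile.TemporaryDirectory() as tmp:
            pathlib.Path(tmp, "calc.py").write_text(module_src)
            pathlib.Path(tmp, "test_generated.py").write_text(test_src)
            result = subprocess.run(
                [sys.executable, "-m", "pytest", "-q", "test_generated.py"],
                cwd=tmp, capture_output=True, timeout=60,
            )
            return result.returncode == 0

    print(run_generated_test(MODULE_SRC, GENERATED_TEST))  # True if the test passes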
One thing we do to try to make sure results have fewer hallucinations is to ground the results. We try to do this everywhere that we possibly can. If I was to give you two examples of this that could be helpful, context is a very key way we do this. We apply context for both chat and autocomplete. You asked me a little bit about where we differentiate from Copilot. The amount of context we apply to our models for autocomplete is multiple times higher than what Copilot uses right now. Then on top of that, for chat, unlike Copilot, we can actually use context from the entire codebase to ground the result. That's our way of making sure the generations are more related to the code that you have. So it's anchoring to the code, rather than letting the model guess about what's happening. The second thing we do, for enterprises, which is kind of interesting, is we fine-tune the model on their internal codebase. We personalize it for their internal codebase. The reason we can do that is we actually let them self-host the model itself, because we train it entirely in-house. What this lets them do is get the best of both worlds: they get a powerful model, but it's also personalized to their internal infrastructure.

[00:17:41] L: That's cool. Yes. One of the phrases I'd written down here that I want to talk to you about, which we've been touching on already, is the difference between AI-assisted coding and AI-generated code, right? There's using AI to help you build code, where you can verify whether it's right or wrong, et cetera, versus simply using AI to write code for you that doesn't need any additional testing or checking. There's a world of difference between those two. Other than what we've just talked about, are there any distinctions between those two ways of thinking about AI and development that are important to what you're doing?

[00:18:29] V: No, I think that's actually a really good distinction. One of the things that popped up and started becoming popular a couple of months ago was this idea of agents, which might be more of the latter thing you just mentioned, AI-generated code. I think we are quite far from that. I think it will happen, but we are quite far from these models entirely generating PRs. Obviously, there's a capability limitation here too. GPT-4, when you actually put it on programming competitions, performs in the bottom percentile of competitive programmers. So it's just not as capable as the best programmers right now. But also, by the same token, the other thing that's missing is the intent aspect. When I generate a PR, think about the amount of context I needed to generate it; I have so much state in my head about how this codebase works. These models, in terms of raw context, only have 32K tokens, which, just for perspective for the listeners, is only about four files of context. But large codebases are tens of thousands of files. So in my mind, we are quite far from the point at which you can just let these models rip, and they will just generate an entire PR. That's even ignoring the fact that these models also don't know all the intent you have for what you're trying to do in a PR. So our perspective here is we should build assisted tools that get closer and closer to AI generation until finally we do generate a PR. But it will be an incremental approach until we get there, and we want to provide a ton of value along the way.

[00:20:02] L: Cool. That's great. That's great. That's actually a perspective I don't think a lot of people understand, so it's probably worth emphasizing and talking about a little bit more.
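It's worth pausing on the context-window arithmetic in that answer. A back-of-envelope version, where the characters-per-token and file-size figures are rough assumptions rather than measurements:

    CONTEXT_TOKENS = 32_768   # the 32K window Varun cites
    CHARS_PER_TOKEN = 4       # rough average for code (assumption)
    CHARS_PER_FILE = 30_000   # a few-hundred-line source file (assumption)
    REPO_FILES = 20_000       # "tens of thousands of files"

    files_in_window = CONTEXT_TOKENS * CHARS_PER_TOKEN / CHARS_PER_FILE
    print(f"~{files_in_window:.1f} files fit in the window")          # ~4.4
    print(f"that's {files_in_window / REPO_FILES:.4%} of the repo")   # ~0.02%

Whatever the model sees has to be chosen carefully, and that selection problem is where the retrieval discussed next comes in.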
You talked about the 32K tokens, and what that means, and please correct me if I'm saying this wrong in any way, is that the AI essentially can take into account your environment but only a limited locality of the environment, only a few files' worth of knowledge. So we keep talking about generative AI understanding everything and then giving you results. In fact, it may understand everything, but it can't utilize everything in solving a particular problem. Solving a particular problem is very localized. Now, the way that applies to a coding assistant, like you say, is that if it's doing an autocompletion for you within a file, it's understanding a few files around the problem set but not your entire codebase. Is that a true statement, and how does the enterprise-level learning, for instance, change the meaning of that?

[00:21:19] V: Yes. So there were a couple of questions there. I think one of the things for autocomplete, the reason our actual autocomplete context is significantly more sophisticated than folks like Copilot, is we do interesting things like embedding search to find a bunch of places inside your codebase that could be relevant, grab snippets, and then do a high-quality generation. The only difference is the burden of proof for autocomplete is significantly lower than generating a correct piece of code with a unit test and a PR that comes out, so –

[00:21:50] L: So rather than looking at just a few K of local files, you do a search outwards, find as much relevant code as you can, bring it in, and use that as the input to the generation.

[00:22:02] V: That's right. That's right. We have comprehensive evals, plus a lot of users that we're able to get evidence from that this is working. If acceptance rates are going up, then we're clearly doing a good job. So it's interesting that for autocomplete, we're actually doing some quite sophisticated stuff in terms of packing the context. The one thing I wanted to mention, though, is that it's hard to use that to build something that's always correct, which is the burden of proof when you want to generate a PR, right? It can't just be kind of working some fraction of the time, because developers will get annoyed. It can't be the case that I click a button, and then 10 minutes later I get a PR where most of it is wrong. That's going to be unacceptable, right? It's not going to be used. So I guess that's the key differentiator. One of the things we're doing on the enterprise side, because we let companies self-host the product, is we let the model be fine-tuned on their private data. Or at least the model is personalized to their private data, so generations are more grounded and semantically understand the rest of the codebase. That's one way we help with this, but it's still nowhere near the state where you can generate an entire PR. That's a massive hurdle, basically.

[00:23:17] L: Right. Really, what it's learning, though, is the input into the advanced search capability, right? It's able to answer the question of, show me where I'm using this form, or show me where I use forms in general within the codebase, because I'm trying to create a new form here. It knows how to find that within the codebase, given what it knows about the codebase.
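The embedding search Varun mentions can be sketched in a few lines. The shape of the idea is below; the embed function is a deliberately toy hashed bag-of-words stand-in for the trained code-embedding model a real system would use, and the snippet list is invented:

    import numpy as np

    DIM = 256

    def embed(text):
        # Toy stand-in: hashed bag-of-words, L2-normalized. A real system
        # would use a trained code-embedding model here.
        v = np.zeros(DIM)
        for token in text.split():
            v[hash(token) % DIM] += 1.0
        norm = np.linalg.norm(v)
        return v / norm if norm else v

    def top_snippets(query, snippets, k=3):
        # Rank candidate code snippets by cosine similarity to the query;
        # the winners get packed into the model's limited context window.
        q = embed(query)
        return sorted(snippets, key=lambda s: float(embed(s) @ q), reverse=True)[:k]

    candidates = ["def render_form(fields): ...",
                  "class FormValidator: ...",
                  "def parse_config(path): ..."]
    print(top_snippets("where do we render the signup form", candidates, k=2))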
But it still comes down to the fact that it only collects a certain amount of information before it makes a decision about what it's going to recommend.

[00:23:56] V: That's right. In the fine-tuning case, we're able to go a little bit further by making it so that the model itself is actually aware of your codebase. It's not just that the context is better quality; the model itself is more knowledgeable about your codebase, because you're able to tune the model itself beyond just the code it was originally trained on.

[00:24:16] L: So what's next for Codeium? Then the related question is, what's next for Exafunction?

[00:24:25] V: Yes. So as a company, we're focused on Codeium. Codeium is a product that has a tremendous amount of usage. We want to continue to build products that developers love and use all the time. That's our focus internally. We'll be building out more functionality within the IDE that enables developers to write more and more code. Right now, developers are maybe writing singular functions. We want to figure out how developers can write multiple functions in one go. How can they write an entire file? How can they write a commit? How can they finally write a PR? That's going to be a gradual process, right? It's not going to happen all at once. We're going to get there incrementally.

[00:25:05] L: So do you imagine a freemium model? Or do you really see it as a free-for-individuals, pay-for-enterprise distinction? What do you see as the actual monetization strategy?

[00:25:18] V: Yes. So far, we're committed to keeping our product free forever. Companies that want the security and the ability to fine-tune and leverage their entire codebase pay for the enterprise offering. There's tremendous interest in that so far. Most companies don't want to ship their code outside the company. Since we train our own models, we're able to give them the magical experience entirely within their own VPC.

[00:25:42] L: Right. That's cool. That's cool. So what about Exafunction beyond Codeium? Is Codeium it right now? That's what you're focused on?

[00:25:54] V: Yes. Codeium is the main product we're focused on right now. That's right.

[00:25:57] L: Anything else you want to add? Or anything else I haven't asked about what you guys are doing?

[00:26:04] V: No. I think everything you asked was super reasonable. I think these models are magic. But also, it's good to understand that they're not truly magical. They can't just guess what's in your head and do what you want, right? So you have to be mindful of that. If anyone out there is looking to build a product in the space, think about building products that are reliably good. If you think about it from that perspective, you might have something on your hands.

[00:26:31] L: What I love about this space, meaning the coding assistant space and what you're doing specifically, compared to the generic generative AI space, is that it is grounded and solvable. It's a solvable problem, and it's a problem that's easily understood, recognizable, and solvable within the generative AI space. People are worried about the larger generative AI issues, about taking jobs away, writing scripts for movies or building movies, all the very advanced capabilities, and the jobs they could someday remove.
But we're a long way away from those things actually being reality, and we're a long way away from some of the vision people have of AI in general actually being a useful technology that is reliable. I think you hit the nail on the head when you talked about reliability. You can have generative AI generate all sorts of content; it just isn't very useful content. But in the AI-assistant space, or the AI coding assistant space I should say, I think there is a lot of value in the ability to generate ideas which you can either accept or not. It's very easy to transact with the AI and get useful, high-quality results. I think that is driving areas such as AI assistants to be the place that's going to get the most benefit out of generative AI in the short term. Do you have any comments about that or –

[00:28:16] V: No, it's actually a really good point. There's always a question around the idea you just mentioned, which is, what's going to happen to jobs for developers? My feeling is, in five years, we're going to have more developers writing way more code per developer. The fundamental reason is, unlike most other professions, it doesn't seem like the world has a threshold for how much software it can consume. We've only kept increasing the number of software companies. Probably, if there's more high-quality software out there, there are more products that are going to get built. There's no limit, right, unlike most other fields. So if we just reduce the barrier to entry for software, it's kind of like when people used to write assembly. After that, people started writing C, and then maybe C++, and then higher-level languages like Python. All of these reduced the barrier to entry. The people who wrote assembly were probably significantly more leveraged when they started writing C++. All we're going to do is get people writing substantially more software and get more people into the field as well. So I think it's just a super exciting time to be in.

[00:29:20] L: I completely agree. Varun Mohan is the CEO and Co-Founder of Codeium. Varun, thank you very much for joining me on Software Engineering Daily.

[00:29:30] V: Thanks a lot, Lee.

[END]