EPISODE 1794 [INTRODUCTION] [0:00:00] ANNOUNCER: Anduril is a defense technology company with a focus on drones, computer vision, and other problems related to national security. It's a full-stack company that builds its own hardware and software, which leads to a great many interesting questions about cloud services, engineering workflows, and management. Gokul Subramaniam is Senior Vice President of Engineering for software programs at Anduril Industries. He joins the show to share his knowledge of the national security problem set, how Anduril operates, and what the company has built. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [0:00:51] SF: Gokul, welcome to the show. [0:00:52] GS: Hey, Sean. Thanks for having me, great to be here. [0:00:55] SF: Yes, awesome. Actually, I probably should have said welcome back to the show, because you were actually here like four years ago. [0:01:00] GS: That's right. In the middle of COVID, actually, I recorded it from my apartment. So, it's great to be here in a proper studio with the right recording setup. [0:01:07] SF: Yes, for sure. I think people's home equipment setups have really evolved in the last couple of years because so many people are working from home. [0:01:14] GS: That's exactly right. I remember slowly upgrading my home office over that year. [0:01:17] SF: Yes, exactly. Went from like a laptop on your kid's bed to an actual functioning office. [0:01:24] GS: You start thinking about your background. What are people going to be staring at? I've got to put some posters back there. You've got to get the lights beaming at your face. So, yes. [0:01:32] SF: Yes, I remember I was at Google at the start of the pandemic, and the Senior Director of Engineering, in his first meeting, legitimately, was sitting on a child's bed in a child's bedroom. I'm sure it's evolved significantly since then. [0:01:46] GS: Yes. For us, we're such an in-office culture - and now we're completely back to that in-office mode - that we were wholly unprepared. I remember, during that COVID period, when the rules hadn't been established completely yet, we had people taking tables from the office and driving them to their houses because they didn't have a home office set up, and taking their monitors from the office back to their homes. It was crazy. [0:02:06] SF: Yes, I went through similar things. How was that transition to bring everybody back to the office? [0:02:11] GS: I think it's been good. I mean, we're such a tactile organization in the sense of the things we build. You can feel them, you can touch them. Even as a software organization, we fly our stuff at the range regularly. So, I think it's been much needed for us. We're so collaborative across hardware and software. I will say, though, that COVID period really made us invest in remote operation, remote telemetry, and remote management, which enables us to now run test sites out of state, right? So, when we were recording during COVID, our test site was 45 minutes from the office, but that's just not possible anymore. Our new test sites - we have two, one in Nevada and one in Texas. The one in Texas has one of the largest runways in the entire United States, and we're going to be flying our airplane next year. You just can't ship all your engineers out to the test site. So, we've had to really double down on that remote testing capability.
[0:03:05] SF: I would think that it's probably forced a lot of organizations to upgrade, I guess, both their asynchronous communication and how they think about remote work and remote test sites - even organizations where, historically, you had to have some sort of field engineer located at a customer site to get them set up. You're going to have to change things significantly in terms of not only how you work, but also the support systems that need to be in place to deploy software remotely and monitor it, essentially. [0:03:36] GS: A hundred percent true. Yes, it's kind of a platitude now, but necessity is the mother of invention. When you're basically told, "One engineer can go to the site, there will be four test engineers, and that's it, and they have to be separated by this much" - and during COVID, it wasn't like work stopped entirely, but we had a safety protocol - it couldn't have prepared us better for hitting the real world and scaling to these more realistic, more safety-conscious government systems. [0:04:07] SF: Given that you were on the show four years ago, and maybe not everyone listening to this has checked out that episode - I do recommend you go back and listen - before we go too deep into things, can you give a little background on Anduril, and what exactly does the company do? [0:04:22] GS: Yes, happy to. So, Anduril, we are a defense company, we're a defense technology company. Our thesis is, we want to do for defense what SpaceX effectively did for spaceflight. If you think about the sixties, seventies, and eighties, defense was the place to be. I mean, there was a real national need, we were in the thick of the Cold War, and all the brightest minds were going into defense. What has happened over the last 30 years is that many of those minds shifted to working in commercial, whether they went to the big FAANG companies or otherwise. What Anduril wants to be is a beacon to all of those people, to say, there is a place for you to come back and work on these really critical national-level problems, problems that only a nation state can tackle, and you can do it in a culture that matches the modern world, you can do it with access to technology, and you can really push the needle. We hope to lift the boat across the industry. I think the existing defense primes are feeling this as well, and I wish for them to be able to access that talent too. There's a whole set of companies coming in behind us. So, really, the broader thesis of Anduril is reigniting that arsenal of democracy, because we feel that we're moving into a world of great power competition. That's a bunch of jargon, but if you just look at what's happening in Ukraine, what's happening in Israel, what's happening in the Middle East, it's very clear that America needs to continue to hold its role in the world. [0:05:48] SF: How do you go about appealing to that talent pool? You want to recruit the best engineers, product people, the best people across essentially any functional area, to have, I guess, a FAANG-quality talent pool at a non-FAANG company. [0:06:05] GS: Yes. Actually, it's very interesting, because we don't want "FAANG-quality" talent across the whole company. Actually, one of our biggest recruiting draws is the diversity of talent and the diversity of opinions across Anduril.
So, I'll give you a statistic: 20% of our staff, maybe 25%, are veterans. So, they formerly served in the military. I would say another 30% come from FAANG companies, and another 30% come from traditional defense companies. I think it's that intersection that really excites people. It's feeling like you get to work with talent and people that you wouldn't meet otherwise. For a lot of people that I talk to who work in pure SaaS companies, getting to touch hardware and work with the hardware side of the company is a huge draw. For the hardware side, getting to work with FAANG-quality engineers is a huge draw. So, I think that's been the key to our success, that intersectionality of disciplines. [0:06:59] SF: Okay. Then, I want to talk about Lattice, and the Lattice SDK, and some of the things that you're doing there. So, just to start, what exactly is Lattice? How do you typically describe that? [0:07:10] GS: Yes. There's a philosophical definition, and then there's a very specific technical definition that we can go into. At the highest level, if I just take a step back and think about history, the defense world has been very good at building hardware. You know, I call these appliances. So, if you think about what the defense community has done since the Cold War, they build airplanes, they build submarines, they build battleships. They think from a very hardware-centric viewpoint. The net result of that is that these systems are designed in silos, and they don't talk to each other in any way. They were never designed to talk to each other. It's nobody's fault, everyone did great work, but from the beginning, they were never designed to talk to each other. What it means is that, if you think about how a modern conflict will evolve, this is a huge Achilles heel for the United States. You can see that the military has been talking about this for literally decades. What Anduril set out to do is invert that paradigm. You build the software as the core from the get-go, thinking about every different domain that you'll want to work across, and build a common core. Then, you design the hardware to take advantage of that software from the beginning. The net result of that is you get huge economies of scale. It means that I don't have to reinvent the same code. You get higher quality, and you get a better cost structure, which is really important to our customers. Fundamentally, that's what Lattice is. It's a common way we think about how we build systems, whether they're command and control systems - laptops and servers we're putting all around the world - or they're robots, whether it's our school-bus-size submarine that's coming online and going through sea trials right now, or our spacecraft that'll be launching next year. They're all sharing a common DNA and a common code base. That means they work together, and that means we get higher quality. Then, to get to your point about the release that we're super thrilled to do today, what we realized is that that shouldn't be locked up to just Anduril. So, we are releasing the Lattice SDK, that capability, for anyone to take advantage of. We have a thoughtful onboarding experience, and we're releasing with a number of partners from the get-go, who are all taking advantage of Lattice to be able to do those sorts of missions.
Lattice itself, if we get into the very technical definition, breaks down into a microservice architecture, and I can go into the layers of that architecture. What we're offering is for people to take that a la carte, so they can take different parts of that architecture - whether it's our networking capability, our command and control capability, or our autonomy capability - and they can start integrating that into their systems. [0:09:39] SF: If I'm someone who has some of this essentially legacy hardware - I already have airplanes, and all these other things that military organizations would build up - can I use Lattice, or do I need to essentially be building with Lattice, and then building the hardware instantiation of that from the ground up? [0:09:56] GS: Yes. I mean, we do this all day long. We integrate with legacy systems, because the reality is - and we knew this from day one - we're super thrilled to see the government adoption of our hardware, but the vast majority of the kit that the average soldier has access to is not Anduril stuff, at least not yet. So, we need a way to onboard other third-party systems. The short answer to your question is yes, it's totally possible, and there are layers of integration with Lattice that folks can take advantage of. So, if you want to be able to send data into our mesh network, so that that data can route seamlessly wherever it needs to go, taking advantage of Starlink and mesh networking technology, you can do that. You can buy a compute module from us, or you can actually use your own compute module, drop our software on there, and you can be onboarded into the network. You can go all the way up to - one of our partners builds an unmanned surface vehicle - if they want to integrate more deeply into that stack, they can go do that and work with us. We can give them recommendations on hardware if there are any changes necessary in order to get that deeper integration. [0:11:04] SF: What does the actual architecture of Lattice look like, and how do I go about using it to essentially start a project with it? [0:11:12] GS: Yes, that's a great question. That base layer, you can think of it as a set of OS-level primitives that we recommend. You don't have to adopt them, but we have thoroughly tested against them. So, the Red Hat operating system, for example, or Prometheus for how we do telemetry. Those are the base layer of Lattice. Then, if you get into the domain-specific layer, you've got three basic components. The first is a networking capability. The second is what we call the command and control capability: how do I know where you are and send you basic commands? Then, the third layer is the autonomy capability: how do I go from sending you commands to letting you move autonomously within the boundaries of a plan or a set of behaviors? You can kind of think of that like Maslow's hierarchy. It's like, I've got to be able to talk to you. Then, I've got to be able to send you basic commands - move here, fire this missile, or whatever. Then, I've got to give you high-level intent, and that's kind of the direction we see technology going.
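To make that layering concrete, here is a minimal sketch of the a-la-carte composition idea in Python. Every name in it (MeshNetworking, CommandAndControl, Autonomy, LatticeStack) is invented for illustration; this is not the actual Lattice SDK API, just the dependency structure described above:

```python
# Hypothetical sketch of the three-layer, a-la-carte structure described above.
# None of these class names come from the real Lattice SDK.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MeshNetworking:
    """Layer 1: can I talk to you at all? (publish/subscribe over the mesh)"""
    node_id: str

@dataclass
class CommandAndControl:
    """Layer 2: where are you, and can I send you basic commands?"""
    mesh: MeshNetworking  # C2 assumes the networking layer beneath it

@dataclass
class Autonomy:
    """Layer 3: can I give you high-level intent within a plan or behavior set?"""
    c2: CommandAndControl  # autonomy assumes C2 beneath it

@dataclass
class LatticeStack:
    """An integrator adopts only the layers they need."""
    mesh: MeshNetworking
    c2: Optional[CommandAndControl] = None
    autonomy: Optional[Autonomy] = None

# A partner that only wants to put data on the mesh:
shallow = LatticeStack(mesh=MeshNetworking(node_id="partner-usv-01"))

# A deeper integration that also accepts basic commands:
mesh = MeshNetworking(node_id="partner-usv-02")
deeper = LatticeStack(mesh=mesh, c2=CommandAndControl(mesh=mesh))
```

The dependency chain is the Maslow's-hierarchy point: each layer presumes the one below it, so a partner can stop at networking, stop at command and control, or go all the way to autonomy.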
[0:12:12] SF: In terms of the commands, is it essentially a client-server model, where I'm sending commands from the server and the client is the piece of hardware that's going to react to the commands? Then, is the client sending information back as well? [0:12:26] GS: Yes. So, we think about the Lattice stack as two different deployment targets. The first target is what we call the node, and the second target is what we call the robot. The thing I want you to take away - the main point here - is that the node has the human involved. It is where the human, you or I, would interface into a set of systems, whether they're manned or unmanned. Once you realize that there's a human involved, it creates a huge set of follow-on, knock-on technical decisions. For example, you've got to have a user interface, because a human cannot talk via APIs. If you have a user interface, you've got to have a database that's going to store a ton of data, so that a human can rewind, and go forward, and all those sorts of things. So, that's the node. The other thing we have is called the robot, and that's a different deployment target, kind of a different stack that gets composed. Now, the thing that connects the two is what we call the mesh, and that's the announcement that we made today. The mesh is what's called a pub/sub network, a publish/subscribe network. Anybody can publish topics on that mesh. Anybody can subscribe to topics on that mesh. So, if you are at a node, which is where the humans operate and interact, you can subscribe to a set of topics: I want to know where the submarine is, I want to know where the airplane is, I want to subscribe to the radar data coming off the airplane, and that data will get seamlessly routed back to you. We do things like - if you think about a typical cellular network, the thing your phone uses, there's a concept of quality of service. So, we will ensure the most important data comes to you first, and then the second most important data, and so on and so forth. That's how an end user or even a partner gets onboarded into the stack. The mesh takes care of things like security, it takes care of things like quality of service, et cetera. So, that's really the fundamental piece of the Lattice stack that then lets you access all the higher-order things.
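As a rough illustration of publish/subscribe with a quality-of-service ordering, here is a minimal, self-contained Python sketch. The broker class, topic names, and numeric priority scheme are all invented for illustration; the real mesh is Anduril's custom, patented implementation:

```python
# Minimal pub/sub broker with a crude quality-of-service notion:
# lower priority number = more important = delivered first.
# Purely illustrative; this is not the Lattice mesh implementation.
import heapq
import itertools
from collections import defaultdict
from typing import Any, Callable

class MeshBroker:
    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of handlers
        self._queue = []                       # (priority, seq, topic, payload)
        self._seq = itertools.count()          # tie-breaker keeps FIFO within a priority

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any, priority: int = 10) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), topic, payload))

    def drain(self) -> None:
        """Deliver queued messages, most important first."""
        while self._queue:
            _, _, topic, payload = heapq.heappop(self._queue)
            for handler in self._subscribers[topic]:
                handler(payload)

broker = MeshBroker()
broker.subscribe("tracks/airplane", lambda m: print("airplane:", m))
broker.subscribe("tracks/submarine", lambda m: print("submarine:", m))

broker.publish("tracks/submarine", {"lat": 1.0, "lon": 2.0}, priority=5)
broker.publish("tracks/airplane", {"lat": 3.0, "lon": 4.0}, priority=1)  # more important
broker.drain()  # the airplane track arrives first, despite being published second
```

Delivering by importance rather than arrival order is the cellular-style quality-of-service behavior described above: on a constrained link, the most important track data goes out first.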
[0:14:08] SF: In terms of that pub/sub, what are you running underneath the hood? Is that something that you rolled on your own? Are you using an existing pub/sub provider? [0:14:16] GS: Yes, this is a core piece of IP for us. We spent a lot of time studying existing providers, and we ended up rolling our own. It's one of the few areas we've patented as well. So, it's a custom implementation. We do use protocol buffers as the serialization mechanism. Then, we work with all sorts of open-source authentication systems, right? If you've got PKI, or if you've got OAuth, or any of those, we can integrate with them. But the actual networking layer, the networking protocol, that's all custom-built by us. [0:14:49] SF: Since you're using protocol buffers, are you using things like gRPC for communication? [0:14:55] GS: Yes, precisely. So, you can send data as protocol buffers over the mesh - you can publish data - but then, you can also call commands. If you want to task something, you want a direct ability to expose what I can task, and then be able to leverage that task and get an ack back. For that, we would use something like Google's gRPC.
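To sketch what tasking with an acknowledgment looks like as a pattern, here is a small Python stand-in. In practice the request and ack would be protobuf messages carried over a gRPC service; the message fields, class names, and supported commands below are invented for illustration, not the Lattice SDK's actual schema:

```python
# Hypothetical request/ack tasking flow. In a real system, TaskRequest and
# TaskAck would be protobuf messages and AssetTaskingService a gRPC servicer.
from dataclasses import dataclass
from enum import Enum

class AckStatus(Enum):
    ACCEPTED = 1
    REJECTED = 2

@dataclass
class TaskRequest:              # stand-in for a protobuf request message
    task_id: str
    asset_id: str
    command: str                # e.g. "move_to"
    params: dict

@dataclass
class TaskAck:                  # stand-in for the protobuf response message
    task_id: str
    status: AckStatus
    detail: str = ""

class AssetTaskingService:      # the RPC surface advertises what can be tasked
    SUPPORTED = {"move_to", "hold_position"}

    def submit_task(self, req: TaskRequest) -> TaskAck:
        if req.command not in self.SUPPORTED:
            return TaskAck(req.task_id, AckStatus.REJECTED, f"unsupported: {req.command}")
        # ...hand the command to the vehicle's executive here...
        return TaskAck(req.task_id, AckStatus.ACCEPTED)

svc = AssetTaskingService()
ack = svc.submit_task(TaskRequest("t-1", "usv-01", "move_to", {"lat": 1.0, "lon": 2.0}))
print(ack.status.name)  # ACCEPTED - the tasker gets a positive acknowledgment back
```

Exposing "what I can task" maps naturally to the service definition itself: the RPC surface is the contract that tells a consumer which commands an asset supports.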
[0:15:17] SF: Okay. So, you have essentially this data flowing back and forth between these nodes and the server. Where does this actually run? Are you running these within your own cloud deployments? How does that sort of stuff work? [0:15:30] GS: Right. One of the fundamental premises that we had to make and design for is that we cannot assume the availability of any cloud stack. So, the entire system is designed to be agnostic to any individual cloud vendor - if we have them, we'd love to use them - and also to be able to run on premises. The node itself can go all the way down to the form factor of a laptop, and all the way up to the form factor of a data center, whether that's an on-prem data center or in the cloud. The robots tend to use an NVIDIA form factor, the NVIDIA Jetson family of chips, but we're compatible with any sort of embedded form factor that we can run on. That's why I call out that distinction, because you've got very different compute available to you, depending on whether you're a robot versus a node. [0:16:18] SF: And within this kind of mesh infrastructure, where this information is flowing between all these different nodes, is there essentially a component of AI-based decision-making that's also part of it, for supporting things like defense operations? [0:16:33] GS: Yes. This is an area that we're all kind of leaning into. The DOD is leaning into this as well, but we're all trying to figure out, where do we draw the line, and how do you do this in a safety-conscious way? So, there's this concept: traditional systems have been human-in-the-loop, so the human is directly inside the decision-making loop. The idea with AI and a lot of the latest generative AI technologies is, how do we move the human from being a blocker inside of the loop to sitting on the loop? So, they are observing the loop entirely. They can stop it at any time, but they are not directly an actor within the loop who has to do something. The closest example I would describe is, if you've got a Tesla, you can put your car in self-driving mode. You are on the loop. The car is going to make a set of decisions, but you need to be watching it extremely closely, and you should be ready to take over at any time. So, as it comes to AI, that's the way we're starting to think about it, and we're starting to lean into a lot of new use cases. Your listeners may have seen our announcement with OpenAI, our partnership there, and we also have a partnership with Palantir. So, we're really excited to lean into that. Then, the second thing I would say on AI - and this is my personal opinion - is, if you think about what's happened in the last five to 10 years, since the advent of transformers, I don't know that the innovation has really been on the model architecture side. There have certainly been new model architectures and things like that, but the real breakthrough has been on the data side, and it's been about increasing the volume of data that's available to train these models - multiple orders of magnitude, hundreds if not thousands of X more than we had before. How did we do that? We got the labelers out of the loop. What we basically did is create a situation where the data could be self-labeled. To give you an example: I'm going to look at a Wikipedia article, take the first paragraph, feed it to the model, and ask it to predict the second paragraph. I don't need a human labeler in the loop, because I can just check, did you guess the second paragraph correctly? By doing that, you can explode the amount of data you have available to train. That's our goal. We are sitting on top of, we hope, a hugely valuable data set for defense purposes that no one else has access to. Only the US military has access to this data. We want to really treat this data with respect, and with the right authorities, but start to use this data to generate the next generation of AI models for our use cases.
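That self-labeling trick is easy to make concrete. Here is a minimal Python sketch that turns raw articles into (input, target) training pairs with no human labeler involved; the example text is invented:

```python
# Turn raw documents into self-labeled next-paragraph-prediction pairs.
# No human labeling is needed: the target is the document's own next paragraph.

def make_training_pairs(article: str) -> list:
    """Each consecutive pair of paragraphs becomes one (context, target) example."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    return [(paragraphs[i], paragraphs[i + 1]) for i in range(len(paragraphs) - 1)]

article = (
    "Anduril is a defense technology company.\n\n"
    "Its Lattice software connects sensors and autonomous systems.\n\n"
    "The Lattice SDK opens that ecosystem to third parties."
)

for context, target in make_training_pairs(article):
    # A model is trained to predict `target` given `context`; grading is just
    # comparing the model's output against the real next paragraph.
    print(f"INPUT : {context}\nTARGET: {target}\n")
```

Because the label is just the document's own next paragraph, the pipeline scales with the size of the corpus instead of with the size of a labeling workforce, which is the volume explosion being described.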
[0:18:55] SF: Yes, I think the primary bottlenecks today, both on the model side, the evolution of models, and also with building AI applications, are really data challenges. [0:19:06] GS: That's right. [0:19:07] SF: Essentially, if you're hitting the limits of what you can scrape publicly, then you need to go find data somewhere else. Then, there are challenges around generating purely synthetic data for training, in terms of degradation of model performance and integrity. Then, even on the other side, essentially building AI applications, during prompt assembly we now have the advantage of really large context windows, but then it's like, how do you provide the right context with the right information in real time to generate the most reliable response? Both of those are really, essentially, data problems. [0:19:41] GS: Yes. Everything has followed from, how do we get access to high-quality, huge volumes of data? Then, you figure out, "Okay. Well, I need to train against these huge volumes of data, so I need these huge data centers. I need to figure out how to network those centers together" - and there's this concept of coherence when you're training. But all of that is second order. It follows from, can you get access to these huge volumes of automatically labeled data effectively? The big idea that we're after is, how do we do that for our use cases, this national security problem set, where we think we are closest to those problems? As we talked about with our mesh networking architecture, we're just constantly sucking up that sort of data. [0:20:21] SF: Even outside of AI, what are some of the engineering challenges you face with trying to scale Lattice to handle the complexity of these dynamic environments, where you've got classified servers, battlefields - you probably know all the terms better than me - but I mean, these are not the typical places that most engineering organizations are deploying things. [0:20:43] GS: Yes. I'm so glad you asked this question. I think this is what gets people fired up when they come work here, because the scale of our problems is very different from what you'll see in big tech. I think one of the fundamental building blocks for us is that we build for a decentralized world. We have no concept of "all the data." There's not going to be one server to rule them all, where all the data will come back. If I go back to my Tesla analogy, every Tesla car can communicate back to Tesla headquarters, and they can kind of pull all that data together. We live in a world where our systems have to be designed to work against degradation of communication, reforming these meshes wherever they are. Maybe they can connect back to Washington, D.C. Chances are, in the real world, they can't, and they'll have to work in a dynamically evolving, decentralized paradigm. What that means is that applications have to be designed with that in mind. Let me give you an example. If someone sends you a text message on your phone and your phone is off, you have an expectation that when you turn your phone back on, that message will be received and you'll be able to see it. Many DOD systems are not architected with that in mind. You cannot make that fundamental assumption. Furthermore, you can't make the assumption that that DOD system can ever see Washington, D.C. It may be able to see another DOD system, if the bad guys aren't jamming it. How do you architect and build applications for that world? That becomes our seminal challenge. If you think about the Lattice SDK that we're releasing, it contains the lessons we've spent seven years learning about how to build for the decentralized world, and we're trying to share those lessons. It's a different paradigm, fundamentally.
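The text-message expectation maps to a store-and-forward pattern: messages for an unreachable peer are durably queued and replayed when it reconnects. A minimal sketch of that contract at the application level, with invented names (not the Lattice implementation):

```python
# Store-and-forward: if a peer is offline, hold its messages and deliver on reconnect.
# Illustrative only - the names and structure here are invented.
from collections import defaultdict, deque

class StoreAndForwardNode:
    def __init__(self) -> None:
        self._online = set()                # peers currently reachable
        self._pending = defaultdict(deque)  # peer -> queued messages

    def send(self, peer: str, message: str) -> None:
        if peer in self._online:
            self._deliver(peer, message)
        else:
            self._pending[peer].append(message)  # hold it; the link may be jammed or down

    def peer_reconnected(self, peer: str) -> None:
        self._online.add(peer)
        while self._pending[peer]:
            self._deliver(peer, self._pending[peer].popleft())  # replay in order

    def _deliver(self, peer: str, message: str) -> None:
        print(f"-> {peer}: {message}")

node = StoreAndForwardNode()
node.send("uas-07", "new patrol boundary")  # queued: uas-07 is unreachable
node.send("uas-07", "updated tasking")      # queued behind the first message
node.peer_reconnected("uas-07")             # both messages delivered, in order
```

In a real disconnection-tolerant system the queue would be persistent and bounded, with expiry policies, but the contract is the same one your phone gives you.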
[0:22:24] SF: How do you test for those types of environments? [0:22:26] GS: Yes. I mean, this is where the real-world test range becomes incredibly critical. We have a really robust simulation architecture that we're continuing to lean into. One of the key ideas is that we can't have one central team do it all. So, we have people building the architecture - if you're interested in that kind of work, you can come here and work on that - but then, we actually have people embedded close to these problems, really getting in the weeds with the end users, often traveling to the end users, working with them when they do their testing. Remember, we've got to build our stuff so that we're not there when it's actually being used. We train with our end users, learn from them what those edge cases are, and then we bake that back into our integration tests and our simulation environments. [0:23:09] SF: What's the simulation environment? Is that something that you had to build yourself in order to simulate the types of situations that the end users are going to be in? [0:23:19] GS: Yes. I would say we've got multiple layers of this. We have a system called the software integration environment. It's a virtual machine-based thing. We talk about these complex scenarios - battleships in the water, airplanes in the sky, nodes on the ground - and we can actually spin up VMs to replicate that. We can replicate the networking links. We can degrade those networks on demand to put ourselves through those paces. Then, what we've built is really a game engine that can describe these scenarios, and we can model, at varying levels of fidelity, the systems and how they'll behave. Then, what's really cool is we can put into that game engine the real code - the real autonomy code, the real C2 code that would run live - and we can put it through its paces, such that you can build confidence. That's actually really how we train our operators: they work through the system in sim, but they have the confidence that it's running the real code, so the real vehicle will behave no differently.
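Degrading a network on demand can be modeled, at its simplest, as a link with configurable loss and latency. A toy Python sketch of that idea follows; the real software integration environment is VM-based and far richer, so this only illustrates the concept:

```python
# Toy link model with on-demand degradation: configurable drop rate and latency.
# Illustrative only; not Anduril's VM-based software integration environment.
import random

class DegradableLink:
    def __init__(self, drop_rate: float = 0.0, latency_ms: float = 0.0) -> None:
        self.drop_rate = drop_rate
        self.latency_ms = latency_ms

    def degrade(self, drop_rate: float, latency_ms: float) -> None:
        """Flip the link into a contested/jammed profile mid-scenario."""
        self.drop_rate, self.latency_ms = drop_rate, latency_ms

    def transmit(self, message: str):
        if random.random() < self.drop_rate:
            return None  # message lost; the code under test must cope
        return message   # a fuller model would also delay delivery by latency_ms

random.seed(0)
link = DegradableLink()
link.degrade(drop_rate=0.5, latency_ms=800)  # simulate jamming partway through a test
sent = [link.transmit(f"track-update-{i}") for i in range(10)]
print(f"delivered {sum(m is not None for m in sent)}/10 under degradation")
```

Running the real C2 and autonomy code against links like this is what lets the sim surface disconnection bugs before a range test does.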
[0:24:12] SF: How reliable is this simulation environment, in terms of - if something goes wrong there, are you pretty sure that it would be something that would actually go wrong in the real world? [0:24:23] GS: Yes. The way I would describe it is, if something goes wrong in the sim, it will definitely go wrong in the real world. But if something goes wrong in the real world, it's not guaranteed you're going to see it in the sim. It's an enormously long tail. I've heard Elon describe this as: if you could perfectly simulate the real world, it may be proof that we ourselves live in a simulation. So, it may not actually be possible to simulate the real world entirely, and we don't try and chase that long tail. What we do is capture the 80/20 rule. We capture the 80% in the sim, and then we know that when we go to real-world testing, we're not wasting our time learning those silly lessons, and we're actually getting into the details of what we can only find out in the real world. [0:25:04] SF: Given the sensitive nature of defense data, what are some of the unique security challenges that you run into, in terms of having to safeguard this data as it's flowing through the mesh? [0:25:16] GS: Yes. I think the DOD has done a really good job of thinking this through. There's historical context, decades of it: classification levels, how to run a secure network. That's the NSA's job, fundamentally, to write that policy. I think for us, the challenges then become, how do you operate in that world? Let me give you an example. I've got systems in the Middle East that are defending U.S. bases today. Those systems are seeing data from adversaries that are trying to do nefarious things. How do I retrain the model for those systems based on learnings that we're getting in the field? Do I have to push my model training architecture all the way out to the U.S. base in the Middle East where that data is being collected? I sure hope not, because then I've got to get NVIDIA GPUs and a whole server setup out into the Middle East. That's probably not feasible. Can I get it back to Anduril headquarters in a way that every Anduril engineer can see? Absolutely not. That is not permissible, and that is not what your customer would allow. So, there's a middle ground we've got to find somewhere. A lot of this gets into our secret sauce of how we operate and how we work. We've got to figure out, okay, we've got to do training fundamentally at the edge. Not everyone's going to be able to see all the data. That infrastructure has to be portable - that's why we don't depend on any cloud provider. Those sorts of things become a lot of the challenges we have to work through.
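One common shape for "train at the edge without moving the raw data" is federated-style aggregation: each site trains locally, and only model updates ever travel, never the underlying data. Gokul doesn't specify Anduril's mechanism, so the following is a generic sketch of that pattern, not their architecture:

```python
# Generic federated-averaging sketch: sites share model weights, never raw data.
# Illustrates the general pattern only, not Anduril's training infrastructure.
import numpy as np

def local_training_step(weights: np.ndarray, site_data: np.ndarray) -> np.ndarray:
    """Stand-in for real training at a site; site_data never leaves the site."""
    gradient = site_data.mean(axis=0) - weights  # toy 'gradient' for illustration
    return weights + 0.1 * gradient              # locally updated weights

global_weights = np.zeros(4)
site_datasets = [np.random.rand(100, 4) for _ in range(3)]  # three isolated sites

for _ in range(5):  # each round: local training, then aggregation
    # Each site computes an update against its own data...
    site_weights = [local_training_step(global_weights, d) for d in site_datasets]
    # ...and only the weight vectors travel back to be averaged.
    global_weights = np.mean(site_weights, axis=0)

print("aggregated model:", np.round(global_weights, 3))
```

Whether even the updates can travel back is itself policy-dependent, as the next exchange makes clear; sometimes nothing comes back at all, and the trained model stays at the edge.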
[0:26:33] SF: When you're doing something like training at the edge, does that end up getting fed back to a central location? [0:26:40] GS: If it can be, right? We've brokered interagency agreements where data collected from one part of the DOD, or one agency, can be shared with another, but those have to be brokered point to point, case by case. Then, we can do that sharing. But in other cases, the data may never, ever come back, and that's a good thing. That's the nature of the work we do. [0:27:03] SF: A lot of limitations you've got to try to navigate. [0:27:05] GS: Yes. Then, the next question is, "All right. Well, I've got an upgrade to the model training architecture. How do I push it? How do I do fleet management?" Fleet management is a huge thing we think about, and it's going to be an even bigger concept for us as we proliferate more and more systems. How do I know the last time I saw a robot, what version of Lattice it was running, or the errors it threw? How do I proactively remediate those issues, and how do I push forward the latest upgrades? We've got this concept where we can stage the upgrade forward - we can stage it at a node. Then, if a robot connects back with that node, we can say, "Hey, we've got an upgrade for you. I'm going to send it on over." So, there's a lot of technology we've built to do those sorts of things.
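That staging pattern - park the update at a node, then hand it to robots opportunistically as they check in - can be sketched in a few lines. The class names and fields here are invented for illustration, not Anduril's fleet-management code:

```python
# Staged-forward upgrades: a node holds the latest release and offers it to any
# robot that reconnects. Illustrative sketch only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Release:
    version: tuple   # e.g. (2, 5, 1); tuples compare element-wise
    payload: bytes

class Node:
    def __init__(self) -> None:
        self.staged: Optional[Release] = None
        self.last_seen = {}  # robot_id -> version it was running when last seen

    def stage(self, release: Release) -> None:
        self.staged = release  # pushed forward from HQ whenever the link allows

    def robot_checked_in(self, robot_id: str, running: tuple) -> Optional[Release]:
        self.last_seen[robot_id] = running  # fleet management: who runs what, and when
        if self.staged and running < self.staged.version:
            return self.staged  # "Hey, we've got an upgrade for you"
        return None             # already current; nothing to send

node = Node()
node.stage(Release(version=(2, 5, 1), payload=b"..."))
offer = node.robot_checked_in("uuv-03", running=(2, 4, 0))
print("offer:", offer.version if offer else "up to date")  # -> offer: (2, 5, 1)
```

The node is doing double duty here: it is both the upgrade cache and the fleet-management ledger of which robot was last seen running which version.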
[0:27:45] SF: How often do you have to push upgrades? [0:27:48] GS: We're pushing upgrades all the time. And again, our rate limiter is working closely with our customer and their comfort level. So, if we have a system that's in R&D, then we'll be pushing upgrades very quickly, because both we and the customer understand that we're in R&D mode. If we have a system that's operational - these are life-preserving systems - we go through an extremely rigorous set of tests. Often, it's a real-world test at a test range, plus an operational test. It's very rigorous for when that system, that code, is allowed to go forward. These are some of the real, material differences between the commercial world and the DOD that we have to build for. We have a concept of named releases, just like Linux does, where we will support that release for a long period of time. We're making that commitment to a customer. If bugs are found, we go back and patch the issue in the release, and then we'll forward-deploy that bug fix with the customer's concurrence. [0:28:44] SF: Do you also have to take into consideration, if something's operational, the time it might take to upgrade, and whether there's a reboot or restart process that could impact the person who's using that thing? [0:28:58] GS: Yes, exactly. So, if I think about one of our sites, we'll have to coordinate with them on when the right time to do it is, and they'll work that into their shift at the base, essentially. Then, we basically have to do operator training, right? If we're sending out new updates that materially change the way the system is used, we're printing out sheets, we're giving them to the users, and we're having trainers go out there to help them get familiar with the new changes. We did a massive overhaul of our command-and-control interface. This is one of the cool things about working with Anduril for our customers - they don't have to pay for this, they get it for free. But what we had to do for one of our systems is run both the old C2 system and the new user interface side by side, and the user had to be able to use both until they were comfortable switching over to the new one. [0:29:45] SF: Yes, I mean, I get frustrated enough when I have to restart my Slack and Zoom. [0:29:50] GS: All the buttons are moved. [0:29:52] SF: Yes, but that's not something my life depends on, essentially. [0:29:57] GS: Yeah, exactly. [0:29:57] SF: So, you're now releasing the Lattice SDK. I guess, what was the motivation behind launching that, and what are some of the things that I could do with it? [0:30:05] GS: The motivation for launching it - I mean, there are two fundamental problems we're trying to solve. The first one we mentioned at the top of the podcast, which is that, traditionally, DOD systems are not built with interoperability in mind. They are built in silos. They're built as, "I'm building this system, and then I'll think about how to connect it later." The solution to that is not that everyone has to coordinate at all times. That's a crazy thing - the DOD is building thousands of systems at any given time. The idea that we're all going to talk constantly is not going to work. Our belief is, the real answer is, let's put forward a reference, let's put forward a specification. We basically say, if you adopt the specification, you get compatibility for free. We will keep the specification alive, we'll keep it modern, and we'll incorporate all the feedback. That basically decentralizes this problem of how we get compatibility. So, now that we have the ability to be compatible, how do we move data between each other? That's where the mesh comes in, and that's the second part of what we're releasing. To answer your question about what you can do with it: if you are a builder of a robot, you can get compatibility with everyone else in the ecosystem, and ultimately make your robot more useful. That's what we've seen with a lot of the launch partners that are coming out. They can make their system more useful, more applicable to problems that our customer wants to solve, by adopting this technology. What they get with that is compatibility and the networking, the way to move data between each other. That's what they'll be able to do.
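"Adopt the specification, get compatibility for free" is essentially a schema-conformance contract: any vendor can validate its own messages against the published spec locally, with no central coordination. A minimal sketch of that decentralized check, using an invented message schema rather than anything from the real specification:

```python
# Compatibility via a published spec: every vendor validates locally against the
# same document, so no one has to coordinate. The schema here is invented.

TRACK_SPEC = {        # published message spec: field name -> required type
    "track_id": str,
    "lat": float,
    "lon": float,
    "source": str,
}

def conforms(message: dict, spec: dict) -> list:
    """Return a list of violations; an empty list means the message is compatible."""
    problems = [f"missing field: {k}" for k in spec if k not in message]
    problems += [
        f"bad type for {k}: expected {t.__name__}"
        for k, t in spec.items()
        if k in message and not isinstance(message[k], t)
    ]
    return problems

vendor_msg = {"track_id": "t-42", "lat": 33.7, "lon": -117.8, "source": "radar-x"}
print(conforms(vendor_msg, TRACK_SPEC) or "compatible")  # -> compatible
```

Keeping the spec "alive and modern" then becomes a versioning problem for one document, instead of a coordination problem across thousands of programs.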
[0:31:32] SF: Yes. You're basically solving the interop problem through standardization of how you're going to do interop. [0:31:40] GS: That's right, yes. We really looked at and learned from how the web world dealt with this. How did the browsers become interoperable? How did the TCP/IP stack, the USB stack - how did all those things come to work? Plug and play. I went back and watched the old plug-and-play videos from Microsoft, where they announced it. People's minds were blown when the printer plugged in, and all of a sudden it just worked. You're like, "I can't believe that's what the nineties were like." [0:32:02] SF: Yes. I'm old enough to remember those days. [0:32:05] GS: Do you remember the blue screen? Actually, I went back and watched that video - the first time they plugged it in, it blue-screened Windows. [0:32:11] SF: Yes, the blue screen of death, amazing. In terms of the Lattice SDK being something that someone external to Anduril can use, what were some of the challenges with essentially packaging that up? It's different to have something that you built internally, that everybody knows how to use and that you're deploying against, versus, here's this thing that someone we've never talked to is going to now roll with. [0:32:33] GS: That's right. I think we're still on that journey. Truth in advertising, Sean, we are still early in that journey. But the first part is realizing: no privileged access. You can't backdoor and get access to your favorite engineer on the other team to answer your questions. We've got to start working through the doc site, and if the docs don't have the answer, we've got to submit that as an issue and fix the doc site. I think there's a famous Jeff Bezos memo from 20 years ago where Amazon had to go through this to build what fundamentally turned out to be AWS. I think we are going through that journey right now. We're lucky in that we're launching with a set of launch partners that we have trusted relationships with. As we onboard new partners into the ecosystem, we're going to have basically second-party relationships with them, but our goal is to get this to the point where anybody can take it off the shelf and start using it. [0:33:26] SF: That's the Lattice Partner Program? [0:33:27] GS: That's the Lattice Partner Program, exactly. [0:33:29] SF: Sort of the main organizational benefit there is feedback for how you continue to evolve those products. [0:33:36] GS: Yes. So, the partners get a bunch of different things. One is first-class support and kind of a privileged relationship with our team. The second is access to our simulation infrastructure. So, we'll be spinning up stacks for them, and they'll be able to use our sim to integrate their systems in and go back and forth with us. The third is basically business development - a privileged business development relationship with us. [0:33:57] SF: Is this primarily targeting other companies that are working on defense products, or could this also be other companies that happen to be working on different types of products? [0:34:07] GS: We're a defense technology company at the end of the day, so our mission is defense. But I'll tell you, one of the other hats I wear, Sean, is I run the space business at Anduril as well. One of the things I'm super excited about - I call it the team of super friends that we're putting together in space - is that there are a ton of companies that are commercial space companies. Actually, two of our launch partners, Apex Space and Impulse Space, are commercial space companies that may not have the expertise in the defense market that we do. But by coming into our partner program, they are benefiting from our expertise, and they are integrating with our system to make their commercial offerings more readily available to the DOD. I think that's a huge benefit to a ton of companies that are defense-adjacent or dual-use - they have a commercial use case and a defense use case, or maybe they offer a component but not the whole system. Coming into the ecosystem makes them more available to the defense market. [0:34:57] SF: Yes. We were talking about some of the engineering challenges of remote deployment in these defense scenarios. I mean, they get even more complicated when you're talking about remote deployment to something that's in space. [0:35:07] GS: Yes. Luckily, the space community has been dealing with that for a long time, but how does that then plug back into the DOD's Link 16 network, for example? That's not a thing that a commercial space company is going to want to concern itself with. That's something like, you come into our mesh, we take care of that for you. [0:35:25] SF: Can you talk a little bit about Anduril's involvement in the Desert Guardian and Valiant Shield military exercises? What was your role there? [0:35:34] GS: I'll take them one at a time. So, Desert Guardian was a tremendous opportunity for us. We had a number of vendors that all produced distinct systems - think radars, cameras, radios that could detect electromagnetic signatures. SOCOM, our customer, brought them all to a site and said, "We want all this to work together. And the way we're going to get this is, instead of issuing two-year-long contracts - where we specify the requirements up front, no one really knows what's going to happen, and you ask for something and two years down the line you get something different - you guys are all just going to basically run a hackathon, and you're going to use the Lattice mesh and the Lattice SDK to generate interoperability." That's what Desert Guardian was all about. We learned a ton. I wouldn't say everything worked perfectly, but everyone came out with the result intended, which was that systems started working together faster than anyone ever thought possible. I think that was the first set of proof points for us that, hey, this is really valuable, what we're doing. Every Anduril system, when it comes off the line, works together. It's not like I'm centrally coordinating with all these disparate teams. They just know to adopt the software, and then you get the interoperability. That's what we've started creating now with third parties whom we had never met before. Valiant Shield, very different story. The DOD runs huge exercises where they are training themselves. Really, the purpose of these exercises is to go through the paces and train for these hugely complex scenarios - what would an operation in the Pacific theater look like? - which you can't do through sim. It's literally working people through their paces. In Valiant Shield, we acted as an integration and interoperability layer, enabling systems to communicate with each other, to enable that exercise to take place. [0:37:23] SF: In terms of your engineering teams, when people are brought into the company and, say, they come from the commercial world, what is one of the biggest surprises that they run into when trying to adapt what they know from the commercial world, and the engineering practices there, to working in this defense-centric world? [0:37:42] GS: Yes, great question. Let me give you a few different answers. The first one is, if you think about the commercial world, you've got a set of economic relationships that don't exist in the defense world. First of all, there's a profit-seeking relationship where, typically, a commercial company has something that's generating a lot of money - I call it a geyser of cash - and you're trying to pump that geyser. [0:38:03] SF: Ideally, anyway. [0:38:03] GS: Ideally, yes, and you're just trying to pump that geyser harder, and harder, and harder. That's not how the defense world works, so a lot of those motions that they're used to are very different from ours. We don't have a profit-seeking customer. Our customer has the budget of a nation behind them. They're really concerned with: are we safe? Are we credibly deterring war? They have a whole different set of considerations. To be noted, the U.S. government is a monopsony - it's the only buyer. So, you've got a very different economic relationship with your customer, and that takes people a little bit of time to get their heads around. You have things like, because safety is such a high concern, you can't just transplant an app store concept from the commercial world and be like, "Yes, let's just run an app store in the DOD." We don't want that; our customer doesn't want that. You can't have someone ship an app that causes a weapon system to become no longer operational. That is not okay. So, there's just a level of rigor and a mindset that's very, very different. Then, the other thing, which I think people love about coming here, is just the sheer variety of use cases that we're working on. I can't think of anywhere else you could go right now where you can one day be working on a submarine that's doing sea trials in Australia, then turn around and talk to a spacecraft that launched, and then turn back around and talk about sentry towers that are defending U.S. bases. The sheer variety of domains and the sheer variety of technologies that we get to work on is very different from a company that kind of just does one thing.
[0:39:37] SF: Because of these added constraints, do a lot of the off-the-shelf tools of the trade not work? You talked about rolling your own pub/sub - you tried the existing pub/sub systems out there, but for whatever reason, they didn't meet your specific requirements. Is that the case with a lot of off-the-shelf products? [0:39:57] GS: The way I describe this internally is, we decompose our stack into what's called a domain-specific layer and an infrastructure layer. What I say internally is, for anything in the infrastructure layer, we'd better have a darn good reason why we're going to roll our own. How we're going to think about storage and databases - you'd better give me a really good reason why you need to roll your own database - how we're going to think about deployment infrastructure, how we're going to think about telemetry. Those are all things you'll find at any FAANG company; those are things we're going to use off the shelf, and we're going to make a real clear buy-build tradeoff. The things in the domain-specific layer are how we think about AI, how we think about networking, how we think about command and control, how we think about autonomy. There, we are often going to roll our own, because it requires a deep partnership with, and understanding of, our mission that is not easily replicated in the commercial world. [0:40:47] SF: What's next for the Lattice SDK effort? [0:40:51] GS: I think for us, the key is making sure that we are keeping this evergreen. We want to do product-led growth. We want to be responding to customer feedback, and we're going to start to open up more and more of the Lattice ecosystem. What we've opened up is a very tiny fraction of the Lattice ecosystem, because we wanted to ensure quality, and we wanted to ensure that it was a hit, that people adopted it. If we see that happening, you're going to see us open up more and more. For example, we did not open up any of our autonomy capability, but if we see adoption, we're going to be more willing to open up more of the stack. [0:41:30] SF: Do you think that's going to be a trend that other companies in the government and defense space follow? I think Palantir has, over the last couple of years, also taken a similar path, where they're opening up a lot. I mean, presumably, you learn a lot of really specific things working within the sector that are generalizable to other companies, use cases, and domains. [0:41:51] GS: I think you're seeing us, and Palantir especially, lead the way in this regard. One is, sunshine is the best disinfectant. You're going to learn a humongous amount just getting people to look at this. The second is - and I talk to our customers about this a lot - we are not trying to create vendor lock-in. The reason we have a core common stack is because it's critical to our cost structure, it's critical to generating interoperability, it's critical to all these good things that we and our customer want. But we're not trying to lock people in. I think vendor lock-in has been a huge pain point for our customers. Even when they look at the commercial world, they're like, "Yes, Apple has this awesome app store, but they basically locked everyone into it, and they're rent-seeking on top of it." There are court cases going on about this. The DOD cannot afford to have this happen. By open-sourcing this stuff, we're showing good faith, to say, that is not our objective here. I think you are going to see a lot more of this in defense. [0:42:44] SF: Well, Gokul, thanks so much for coming back on the show. [0:42:47] GS: Sean, I had a terrific time. Thanks for having me. [0:42:49] SF: Awesome. Well, cheers. [END]