EPISODE 1855

[INTRO]

[0:00:00] ANNOUNCER: Illia Polosukhin is a veteran AI researcher and one of the original authors of the landmark Transformer paper, Attention Is All You Need, which he co-authored during his time at Google Research. He has a deep background in machine learning and natural language processing, and has spent over a decade working at the intersection of AI and decentralized technologies. His current venture is called NEAR AI, and he's focused on building open-source infrastructure, tools, and products for agentic, privacy-preserving AI systems. He joins the podcast with Kevin Ball to discuss his journey, the origins of the Transformer model, the vision for user-owned AI, document-oriented development, and much more.

Kevin Ball, or KBall, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc.

[EPISODE]

[0:01:18] KB: Illia, welcome to the show.

[0:01:20] IP: Thanks for having me.

[0:01:21] KB: Yes, excited to get to talk with you. Let's maybe start with a little bit of intro about you, your background, and what you're up to these days.

[0:01:30] IP: For sure, yes. Well, I've been, I guess, a tech geek since I was 10 years old. I was building a lot of video games back in the day, then got really excited about AI in general, and started learning machine learning when I was about 14. I was building my first neural networks in Pascal, and actually got a job remotely. I'm originally from Ukraine, and I was working for this machine learning company out of San Diego. They were happy with my work, so they offered to relocate me. I moved to the U.S., which was exciting. Then, I saw the cat neuron paper that came out of Google, from Andrew Ng and Jeff Dean. And I was like, "Okay, this is the thing." Unsupervised pre-training, learning concepts about the world that don't need supervision. So, I was like, "Okay, I want to do that." I applied, I got into Google Research, and I always thought that, yes, images are cool, but there are thousands of species that can see, and there's only one, maybe some people argue two, that can actually speak. Language is effectively the way we test intelligence, right? We ask a person to read a text, and then we ask questions to see if they understood it. That's why I wanted to focus on natural language. We were doing question answering, trying to build products into Google.com where, when you ask a question, it would give you a response. This is where, I guess, your previous CTO was my director back at Google. One of the challenges we were facing was that the models we were using, these recurrent neural networks, were too slow, right? They need to read one word at a time, and Google requires really fast response times, and you want to read multiple articles. So, you cannot approach it as a human, you need to approach it as a machine. That's where this idea came from: "Hey, what if you consume the whole article, the whole text, at the same time, in parallel, using the hardware accelerators we have, and figure out the relationships through the number of layers and steps of reasoning, instead of trying to read one word at a time."
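As a rough illustration of the idea Illia describes here, and not code from the paper itself, below is a minimal single-head self-attention sketch in Python/NumPy: every token in the sequence is projected and attended to in one parallel step, rather than consumed one word at a time as in a recurrent network.

```python
# Minimal sketch (not the Transformer paper's full architecture): scaled
# dot-product self-attention over a whole sequence at once, illustrating
# "consume the whole text in parallel" versus an RNN's one-token-per-step loop.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) -- the whole text at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project every token in parallel
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # every token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                          # relationships resolved in one layer

# Toy usage: 5 "words" with 16-dimensional embeddings and random projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)   # shape (5, 16)
```

Stacking several such layers plays the role of the "steps of reasoning" he mentions, without any recurrence.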
That's what gave birth to Transformers. I kind of coded up a first version, and obviously it was not random, it was doing something, but it took a lot of work by everyone to make it really work from there. Then, I was excited about using this technology and applying it to actual coding, because I always thought, "Hey, why are we doing so much manual work as developers? Can I just tell the machine and it figures it out?" Now, we call it vibe coding. Back then, we were just teaching machines to code. This was 2017. Back then, I went and pitched this to VCs. Most of them thought we were delusional. Somewhere between science fiction and delusional, that was the -

[0:04:21] KB: Yes. Sometimes being early is as bad as being wrong, right?

[0:04:25] IP: Yes, exactly. So, we were very early. We didn't have the capacity to scale the models to the level it needed, even though we were doing a lot of similar things, obviously. There are a lot of small details that matter, that were done very right by OpenAI. But what we were doing was actually a lot of crowdsourcing. We were trying to get a lot more training data with people, and we had this challenge: we had computer science students, effectively in South Asia, China, Eastern Europe, and we had trouble paying them, because it's actually really hard to pay in a lot of these countries. There are monetary restrictions. Chinese students don't have bank accounts. In Ukraine, you need to sell half of the dollars when they arrive in your bank account. We started looking at crypto as just solving our own practical problem: how do you coordinate and pay people around the world? This was 2018, and nothing would actually scale even to our medium-sized use case of thousands of people at that point. That's where we were like, "Hey, we should solve this problem. This seems like a big problem to solve." So, we pivoted our original NEAR AI into NEAR Protocol, and really focused on building a scalable, easy-to-use blockchain. NEAR now has 50 million monthly active users. It's a top one, top two blockchain by active users, and usually top two, top three by number of transactions. Everything from remittances, payments, loyalty points, financial instruments, a whole variety of use cases, including the payments for the crowdsourced data labeling. We've had that application running since 2021. So, we built it out, there's an ecosystem, lots of different people building different applications. But obviously, in the back of our minds, we always wanted to go back to AI. We've seen all of the improvements: GPT-3, and then ChatGPT, GPT-4. One of the understandings newly found through this blockchain journey is that, yes, there's the technology, but there's also the governance question of this. The other part is, I mean, one of the interesting things that happens as models evolve, right? There's some threshold at which it actually becomes game-theoretic for any of the companies, if they have a model that is able to hack into other systems, to actually use it to hack into other labs to delete their models. Because if they don't do that, the other labs, when they cross that threshold, will do it to them. There's a safety claim always, like, "Hey, their models are unsafe, so we need to make sure they don't do something bad." But there's a very interesting state that we can get to pretty quickly, and it's really hard to determine where it is.
There's also just, practically, even now, when you're asking these models, you have no idea if the response is coming from statistics and data. The data may have been biased in some way. Even when I worked at Google, sometimes you just delete some training data because it contains some signal that you don't want to have in your results, and in turn, you're also biasing the data in a very specific way. For example, "Obama was born in Kenya" was a very prevalent statement back in the day across all of the right-wing news. If your eval set has that question, removing all the right-wing news actually improves your evaluation set. There are unclear biases in data, there can be data poisoning, there can be so-called sleeper agents. That's a concept where you can add into the training data some specific modification that doesn't show up in normal evals, but if there's something in the context, like a date or a specific statement, it actually changes how the model behaves. So, the way this would effectively be used: think of your Cursor, your vibe coding, and in one specific case it will change import transformers to import tarceformers, or a misspelled requests, which is actually a malicious library on pip, right? So, there are all of these things where we just really don't know what's going on in these models now. There's a governance question, which is: yes, we wanted the model to be safe, but the people who build the model have an unsafe version, and we as users have no idea how this is used. And there's a data privacy question, which is: to make these models extremely useful, you want to give them as much of your context as possible, right? There's this hardware device that listens to you at all times, but at the same time, if this data now goes anywhere, or that company gets hacked and all the data gets leaked, that's a massive invasion of privacy. So, you have this whole host of problems, and so the suggestion, this vision we formulated, we call user-owned AI: how do we bring the focus back on the user, where instead of trying to build a model that effectively benefits the company, we build a model whose meta-objective is to optimize toward the user, right? Which means it's private, which means its value loss function, at least the meta-objective, is toward the user's success and well-being. We know which data went in, so at least you know which biases the model has, or at least anyone can analyze it and have reports on that. That's conceptually user-owned AI. To do that, you need all the blockchain methodologies that we have for coordinating people to build datasets and models, the privacy technology that blockchain has been developing, as well as the incentive layer and mechanisms to really gear it toward the users.

[0:09:58] KB: All right. That is quite the background. I'm actually, if it's okay, I want to go back a little bit and just ask you some questions about different pieces along the way, because you have a pretty unusual and unique story there. Actually, going back to that paper at Google just really quickly, because I came to machine learning later than you did, and when I started getting into this latest round, Attention Is All You Need, the Transformer kickoff paper, was like foundational reading-club material. Did you know at the time that you were doing that work how big this was going to get?

[0:10:30] IP: Not really.
I think at the time, the pace of innovation was very quick, right? And there were a lot of different architectures and different structures. I mean, in a way, if you think of it, the Transformer is really about removing things. We removed things from the other models. We didn't add them. Obviously, it was a very powerful architecture because it was so performant, because it showed that you actually don't need recurrent connections, you don't even need convolutional networks, and you only need this self-attention mechanism to really capture all of these relationships and have sufficient reasoning capability. I actually was the first one to leave. The team continued experimenting, and they saw a lot of promise on images and in other contexts as well. So, there was definitely promise that this is a very generic architecture, but I don't think it was clear that this is going to be the last evolution. At the time, it felt like new architectures were coming, every few weeks there was something new. There was the Neural GPU, there was the neural computer, there was this and that. So, it wasn't clear that this is it, and that everybody would just build on top and figure out how to train it better, et cetera.

[0:11:41] KB: Yes, moving on a little bit. So, you pivoted fairly early to crypto. I didn't realize it was quite so early. I think it's interesting because you're actually using it for one of the core use cases that feels like it has continued to be relevant, right? How do you provide financial services for the unbanked, across borders, all these different things? We had this huge boom in NFTs and all these other different tokens. Being in that space, what parts of that do you - I know a lot of developers have become very skeptical of this. So, what parts of crypto do you think are the enduring value, and where is it just noise?

[0:12:15] IP: Yes. I mean, that's a really deep question. For context, it was a very delusional idea in 2017 to build a machine that codes itself, right? I tried to do that back for my master's degree at university, and completely nothing worked. So, it's a recurring theme for me. We gave ourselves a year, and after a year, okay, we had some papers, we made some progress, but it wasn't near the level we needed to really make it commercial. And blockchain clearly was like, "Hey, this is a use case." Being from Ukraine, I'm very familiar with cross-border payments and money movement and the complexity of that. So, I cluster the use cases of blockchain effectively into maybe four categories. One is global identity. One of the real problems on the Internet is how to create a global identity. Right now, we're using DNS, we're using IP addresses, we're using all these methods which are actually really bad and have a lot of issues. DNS has literally a group of people who are approving stuff at the top, potentially spending a ton of money on things they shouldn't. So, it is a very clear Internet problem: how do you create a global registry that is open to everyone and has the same rules? Blockchain solves that, and you can create it for identity, you can create a naming service, et cetera. The second one is payments, for sure: how to transfer value in any asset, in any amount. We have right now 600-millisecond blocks, 1.2 seconds, right?
So, within 1.2 seconds, you can actually move value around the world, a billion dollars, no problem, and hundreds or even thousands of nodes are confirming that. Third, you have marketplaces, right? One of the really big benefits here is that if you bring the global registry and payments together, it becomes a marketplace. It's a global marketplace where you can sell anything, for anything. And this is why it's used for speculation, because the simplest thing to do on a marketplace is speculate on assets that don't have any other value except for what people intrinsically assign to them. But you can think of, for example, if you want to buy 100 tons of steel and you want to get it delivered to you. Right now, you'll need to email a bunch of people, probably call someone, figure it out, probably call Flexport to get the shipping going, warehousing, et cetera. Or you can imagine, and we'll get to it, but you can effectively say, "Hey, I want this done," on the marketplace. And then you have other actors who are like, "Hey, I will do it for you for this much money." And there's a contract with money, with escrow, everything on chain, guaranteed execution when the goods are delivered. The value itself is tokenized, so while it's in progress, you can effectively borrow against it, because it has escrow money locked in, which is like trade financing. So, there are a lot of financial instruments you can build on this primitive of a marketplace. Finally, the last piece is this coordination. I think this is where blockchain has failed. It had a lot of promise of, "Hey, we'll have a new type of organization that doesn't have traditional management." I think everybody agrees that in any good organization, people try to go away from, "I'll tell you what to do." It's more, "I'll support you in what you're trying to do." But it still kind of creates this hierarchy. You need this hierarchy because people cannot scale the relationships. So, the idea was, hey, we can use game theory to coordinate people instead, and use on-chain mechanisms to pay and do this. And I think that failed because people are messy and there's a lot of people stuff that needs to happen. This is where I have a whole thesis about AI being in the middle of this coordination actually solving a lot of these problems -

[0:16:12] KB: Because they can deal with messiness, in a way that traditional code can't.

[0:16:15] IP: And it can deal with the scale, right? As a person, imagine you have a thousand reports. I mean, you'll go crazy and you'll be a really bad manager for them. But for AI, handling 1,000 reports is no problem, right? It can give everyone personalized context. It can collect information from everyone. It can broadcast it in a personalized way, et cetera. So, it actually scales with the organization. To me, these are the main four use cases, the core primitives, and everything else is on top. "Hey, we want to bring whatever real-world assets here," is because of the marketplace. We want to issue equity as a token because of the marketplace. We want to figure out how to build new types of organizations because of this coordination mechanism. We want to coordinate payments that are - so all of these pieces reinforce each other, but those are the use cases.
Then everything else, like, for example, privacy and other things, they kind of leverage some of this, right? So, for example, we use this approach called a trusted execution environment. This is a specialized hardware element that is available on Intel CPUs, AMD, as well as on NVIDIA GPUs and some other accelerators. The idea there is you can use it - Azure provides you this service as well. But you need to trust Azure. Azure tells you, "Hey, we're running it in secure hardware."

[0:17:35] KB: There are so many things right now where we're just like, Microsoft, Google, Amazon, we can probably trust them, right?

[0:17:39] IP: Yes. Versus, if you have this global registry, now the device can register directly and say, "Hey, here are my certificates from Intel and NVIDIA," and you can verify them on chain, and now there's an IP address registered. So, when you go to them, you have all of this cryptographic routing and supply chain to verify directly, without needing to trust extra cloud providers. You can now build a full cloud just from providers who self-registered and can come online, which means you can also find, for example, a closer data center and provider for your AI inference to reduce latency. You can distribute the compute more evenly and not have all 100,000 GPUs sitting in Memphis and using all the electricity. You can actually have privacy, because the data is fully inside the Secure Enclave and not visible even to the hardware operator, and you know what model runs there. You don't need to be like, "Oh, did I run 4.0, 3.0? Did they change it yesterday? I have no idea," right? You can actually have guarantees around that. So, it gives a lot of these guarantees because we have this blockchain layer for identity, coordination, and payments, right? Because you need to pay these people to use their hardware.

[0:18:51] KB: So, I want to dig into that from a few different angles, but since this is Software Engineering Daily, let's start from the software side. So, if I'm a developer wanting to tap into that, what does it end up actually looking like for me?

[0:19:05] IP: Yes, so it depends on where you are in the stack and what you're trying to do as a developer, right? The simplest way: we have, for example, just an OpenAI-compatible endpoint for GPU inference that runs inside a Secure Enclave. Everything you send there is TLS-encrypted on your side and decrypted inside the Secure Enclave. Nobody in the middle can actually access it. It runs the model that you asked for, you get the result back, and you have a certificate, again, that you can check to verify that NVIDIA and Intel effectively signed off on that. Now, if you want to build an agent, for example, that runs on behalf of a user, even you as the developer don't have access to what users are asking for, which is super useful as you go into financial use cases, medical use cases, and also just daily life. Imagine you have Fireflies or these meeting-recording bots. Right now, their servers are getting all of your calls and all your data, which means I now need to think about, are they going to get hacked? What did I say? If they were using our stack, they could have put the whole system into the Secure Enclave, where effectively all the information is streamed directly into the server encrypted end to end, runs there, and then only you get back the result. And then the developer just uploads their code, right?
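To make that first option concrete, here is a hedged sketch of what calling such an OpenAI-compatible private inference endpoint could look like. The base URL, API key, and model name are placeholders for illustration, not NEAR AI's documented API.

```python
# Hypothetical sketch of calling an OpenAI-compatible endpoint served from
# inside a secure enclave. The base URL, key, and model id are placeholders,
# not a documented NEAR AI API.
from openai import OpenAI

client = OpenAI(
    base_url="https://private-inference.example.com/v1",  # placeholder endpoint
    api_key="YOUR_KEY",
)

# The request is TLS-encrypted on the client and only decrypted inside the
# enclave; the hardware operator never sees the plaintext prompt or response.
resp = client.chat.completions.create(
    model="some-open-weight-model",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize my private meeting notes."}],
)
print(resp.choices[0].message.content)
```

The attestation certificate mentioned above is what lets you check, out of band, which hardware and which container actually served the request; a sketch of that check follows a little further below.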
So, you effectively package a Docker container and upload it, which we call an agent, into the system. It uses private inference, but the agent itself, your general code, runs in the Secure Enclave mode as well. We have an agent hub where you can see a bunch of them; we have about 1,000 agents that are running or can optionally run in this mode. Now, if you're an even lower-level developer, or you yourself want to build something that includes payments and other systems, that's where we have this idea of agentic protocols, where you can effectively create a smart contract, so a contract in Rust or JavaScript that runs on the blockchain, that itself can call into these agents and get back the result and kind of a visualization. The examples we have now are mostly about trading and financial use cases. That's the first thing people do. But again, let's say somebody wants to build a name service or something else, you can also have this kind of thing where the logic, like maybe your pricing model or your loan evaluation scoring, happens in this verifiable way, and then the execution of actions happens through the blockchain. It really depends on the level of the stack you want to build your applications at.

[0:21:42] KB: A few different questions about that. So, thinking about this model of, I'm a developer, I want to build a secure agent or something like this, I just upload my Docker container. Now, for me, as someone who ships a lot of applications, I immediately start saying, "Okay, what about observability? How do I know if they run into a bug? How do I debug this thing?" What does that end up looking like in this stack?

[0:22:03] IP: Yes, so this is where things get interesting, because now you have a trade-off between privacy and observability on different sides of the spectrum. We are actually working on an analytics and debugging system that sits underneath, as you ship your Docker container, to give you some of the observability. You effectively specify a privacy-versus-observability threshold, which you inform the user about as well, wherever you want to sit. Obviously, you can have full observability, but then you have access to everything that users put in, or you can have none, or you can be somewhere in the middle, where it actually summarizes stuff for you and maybe gives you the logs of failures and bugs, et cetera, but doesn't give you the exact queries that users sent, right? We actually have a sprint right now on building out the tooling, including quality control, latency times, all of the stats that you actually need as a developer to understand how your agent is working.

[0:22:59] KB: That makes sense. Maybe also, can we go in a little bit on the trusted execution environment? In particular, I'm thinking about things like, okay, if I'm a user or a developer sending something off to a service, I can know my data is encrypted on the way in and on the way back. How do I know that your software isn't just posting that data somewhere else? Is the trusted environment locking down the network, or how does that all work?

[0:23:23] IP: Yes, so there are a few things that happen. First of all, when you establish a session, you're effectively getting back the hash of the Docker container that runs there, which is authorized by the hardware. So, the signature you get effectively says this Docker container runs on this CPU and this GPU, and you can verify that.
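A hedged sketch of that verification step: compare the attested container digest against the digest the developer published, and check that the report is signed by the hardware vendor. The report fields and the stubbed signature check below are illustrative only; a real integration would go through Intel's and NVIDIA's remote-attestation tooling rather than a boolean flag.

```python
# Illustrative sketch of checking a TEE attestation for an agent session.
# Field names are made up; real reports are verified against the hardware
# vendors' certificate chains, not a flag like this stub.

def verify_session(report: dict, expected_image_digest: str) -> bool:
    """Return True only if the enclave reports the container image we expect."""
    # The hardware-signed report claims which container image (by digest)
    # is running on which CPU/GPU.
    if report.get("container_digest") != expected_image_digest:
        return False
    # Placeholder for validating the vendor signature chain back to
    # Intel/NVIDIA roots of trust.
    return bool(report.get("signature_valid"))

# Toy usage with a made-up report:
report = {"container_digest": "sha256:abc123", "signature_valid": True}
print(verify_session(report, "sha256:abc123"))  # True only if digests match
```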
If the developer publishes this Docker container, you can make sure what it is. Now, not everyone wants to open source everything they do, and so this is where, A, indeed, the plan is to have a firewall system where you can lock down the access, because you may want it to go and access some APIs and some MCP servers or whatever. The other piece is, we're actually working with an external team on agent security, where you actually have an agent itself that runs inside and inspects the code of the Docker container of the agent that the developer is uploading. It effectively gives you a security report, like, "Hey, it looks like it's sending all the requests it received to some external IP address. Or maybe it parses all the API keys and leaks them." Effectively, we can have scanners that are themselves AI-based, so there's no person who looked at the external developer's code, but there's an AI that looked at it and certified it in some way. Now, that is cat and mouse, to be clear, but with the combination of these methods, you can get to some reasonable level. Then, for the longer-term research, we actually invest in formal verification. This is a bit more fundamental. As I mentioned, I think there will be a threshold at which the models will start hacking into other systems. The thing is, people write code with vulnerabilities, and AI, now trained on that code with vulnerabilities, writes code with vulnerabilities. There's this famous image, obviously, with the thin slice at the bottom and everything on top. We're layering more on now, with AI, at a faster speed. The fundamental way to solve that is to have a mathematical proof that the code that runs exactly satisfies your criteria. Usually, right now, when formal verification is used, because it's so expensive, like it's manual work, you only do it once for some set of criteria. The problem is the set of criteria itself can be wrong, right? What you actually want is, when you're calling the service, you as a developer effectively provide a set of things that you want to be guaranteed. For example, that none of this data is leaving this enclave, and only these URLs are getting accessed in this way. Then the service actually responds back with a verification, right? Like certificates around the Secure Enclave and verification that indeed these criteria are satisfied. So, this is what we're working on: really building this trust level with mathematical guarantees. It's also very useful for blockchain, where people get money stolen all the time, where this is a very fundamental piece: if I'm putting in money, I want a guarantee that I'll be able to withdraw at least as much money as I put in. It's very short-term applicable to blockchain, but long-term we want it applicable to every service in the world, because this is actually how we're going to stop this sprawl of vulnerabilities in all systems.

[0:26:53] KB: Yes, that's fascinating. Do you think that's going to limit the set of programming environments that are able to work in this space?

[0:27:03] IP: I mean, we're going to see a kind of collapse of programming environments as coding models get better. Because the thing is, AI really doesn't care, it can write in any language, right?
So, it's better to write in a language that's more widely written, because there's more training data. Even that part is getting solved now, because there are some companies that just generate a lot more training data in the target programming language, and so you can train on that. So, I think it will be really more important to have these kinds of strong guarantees of security than having 50 different programming languages people can write in. I use this line: before, we would write code once and read it many times, and so you wanted to make it readable; now, we write code once and read it never.

[0:27:49] KB: This is interesting, because it kind of taps into a few different pieces. One is, with LLMs or anything that's sort of probabilistically generated, the ability to validate rises in importance tremendously. In fact, one of the reasons I think that coding is such a useful environment, or something that's so amenable to these models, is because we already have to think about validation, right? We've been thinking about how do you do type checks, how do you do unit tests, how do you do all of these different things, for a long time. What do you think are the attributes that need to be there for a programming language to be a good LLM target, right? For example, I've seen the LLMs do a much better job at generating strongly typed code, particularly because agents are able to use that as a part of their feedback loop. Whereas if you use a dynamic programming language, even one with a lot of training data in the corpus, JavaScript, it's not as joyful of an experience working with LLM code. Let's say that.

[0:28:43] IP: I mean, very practically speaking, right? A lot of the types, especially in languages like Rust, become very semantic, right? At least when I build, I try to make semantic typing, even if it's the same underlying thing. For example, for our smart contracts, we have Account and Balance as types, even though it's a string and a u128 underneath. Those semantic types mean that when you look at the function specification, you can tell, "Hey, this is amount in and amount out. This is the from and to accounts." So, it gives you a lot more context as a human. I mean, AI is not that different. AI has, I would say, at this point, a lower ability to disambiguate and map some of the complex structures, right? Also, just practically speaking, the models have a limited number of reasoning steps they do, right? You can run them for longer; this is where all these o1-style and R1-style models come in, where they literally run, okay, we need more reasoning, let's just push more tokens through the inference. But obviously, that has its own limitations. So, when you need to map, okay, there's an argument coming in, I need to look at everywhere else this function was called from and what semantic meaning this argument has, it's obviously way harder than memorizing that, so that the next time I need to call the function, I can really disambiguate this. So yes, I think strongly typed, and then adding this formal verification method, because it actually adds additional semantic properties, right? For example, for sorting, what we're designing will literally give you, "Hey, actually, the return will be such that every element is larger than or equal to the previous element."
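Illia's contracts are in Rust, but the same two ideas can be sketched quickly in Python: semantic newtypes over raw primitives, and a machine-checkable postcondition of the kind he describes for sorting. The names below are illustrative, not NEAR's actual SDK.

```python
# Illustrative sketch (not NEAR's SDK): semantic types over raw primitives,
# plus a checkable postcondition of the kind a formal spec would prove.
from typing import List, NewType

AccountId = NewType("AccountId", str)  # semantically an account, a str underneath
Balance = NewType("Balance", int)      # semantically a token amount, an int (u128-like)

def transfer(sender: AccountId, receiver: AccountId, amount: Balance) -> None:
    """The signature alone now says which argument is which -- context that
    helps a human reader and an LLM that can't re-read every call site."""
    ...

def sort_balances(xs: List[Balance]) -> List[Balance]:
    out = sorted(xs)
    # Postcondition: every element is >= the previous one. Formal verification
    # would prove this for all inputs; here it is only checked at runtime.
    assert all(out[i] >= out[i - 1] for i in range(1, len(out)))
    return out
```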
Now, you have the semantic meaning of the whole function without needing to read the implementation and maintain that constantly. So, it gives you a lot more properties. I think that is going to be the more useful environment for AI-generated code, because then, indeed, we don't need to go and read and validate it. Again, we have an engineering team who are using AI now on a daily basis, and you cannot catch up anymore. If you have five engineers who are pushing 10,000 lines of AI-generated code every day, we're actually starting to think about how to manage the team, how to structure the organization and the code differently than you would before, because before, you would usually want to have multiple people who know how the code works and review each other's pull requests, et cetera. Now, I actually think that would slow things down and maybe not be very useful. Instead, just give everyone their own subsystem to own, and they just need to document it - there needs to be documentation that describes what the system does, which ideally should be enough to regenerate the whole system through an LLM. And then there should be just a bunch of tests.

[0:31:49] KB: This is really interesting and relevant, because everybody's trying to figure this out, right? Okay, these tools dramatically accelerate our ability to write code. What does that mean for what we do and how we do it? What you're describing is actually very similar to what my team ends up doing, which we call document-oriented development, right?

[0:32:06] IP: Yes.

[0:32:07] KB: The core thing you're engineering is this specification or document that can be used to generate the code. The code itself is like a binary.

[0:32:16] IP: Yes, and then the other interesting thought, which we've experimented with a little bit but haven't fully implemented yet, was: if you depend on somebody else's system, you actually write tests for their system. Usually, you expect them to write tests and then you just use it, but because they may completely regenerate all the code tomorrow, you want to declare your dependencies through tests. I'll give a sketch of that idea in a moment.

[0:32:37] KB: Oh, that's fascinating. So, you essentially are writing, like, here are the guarantees that I'm depending on from your system, so that if you regenerate it, it makes sure those continue to be valid.

[0:32:48] IP: Correct, yes. Then each system can be literally owned by one person, and if that person moves on to whatever else and somebody needs to come in, they need to read the documentation, and they can even regenerate the whole thing if needed, and the other subsystems will tell if something is off.

[0:33:04] KB: Another piece of this that I'm curious if you have thoughts on is, how do you indicate to the LLM what sets of context to pull in for any particular subsystem that it might be editing? Is it just that one document, or are there links in different ways? How do you think about that?

[0:33:18] IP: I mean, ideally that document has as much context about that subsystem as possible, but you may need broader context somewhere. I think Cursor has its rules, which are kind of a useful concept. I think some links, and maybe some kind of hierarchy of dependencies, are useful as well. But yes, I haven't seen that fully worked out yet. This is definitely interesting as well: what is the knowledge graph of the systems, right?
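Going back to the idea of declaring dependencies through tests, here is a minimal pytest-style sketch of one team encoding the guarantees it relies on from another team's subsystem. The module and function names are made up for illustration.

```python
# Hypothetical sketch: team A depends on team B's "billing" subsystem, so team A
# declares the guarantees it relies on as tests. If team B regenerates their
# whole implementation from documentation tomorrow, these tests report whether
# the contract team A depends on still holds. Names are made up for illustration.
from billing import compute_invoice  # hypothetical dependency

def test_total_is_sum_of_line_items():
    invoice = compute_invoice(line_items=[{"price": 10}, {"price": 5}])
    assert invoice["total"] == 15

def test_empty_cart_totals_zero():
    assert compute_invoice(line_items=[])["total"] == 0
```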
Especially when we're talking about really big code bases, like hundreds of thousands of lines of code, it becomes about mapping out the concepts, right? The LLM needs to do that somehow. So, you kind of need to feed it enough information to do that without also overwhelming its context. Even a million tokens is cool, but if we're talking about a hundred thousand lines of code, that's way more than a million tokens, usually.

[0:34:08] KB: Coming back a little bit to this privacy-first AI that you're talking about, a thing I'd love to get your sense on is how to bootstrap this, right? Because looking at the industry right now, one, the models themselves are extremely expensive to train. And two, we're in what feels like a worldwide GPU shortage. I was talking to a couple of different folks at AI companies and they're like, "Yes, we just get throttled by the providers because they are out of GPUs. There aren't enough GPUs for all the inference that's happening." So, in the big-corp world, they are all putting massive amounts of capital down to try to build out new data centers and all of these things. If we're looking at a privacy-preserving, distributed type of system, how do you actually get that built?

[0:35:01] IP: I'll start with the second part, because it's actually a solution to this problem. Right now, you say, "Hey, I'm going to, let's say, Anthropic or Cursor, and it starts to throttle me." The reason why this is happening is not because there are no GPUs available in the world right now. It's because Anthropic doesn't have access to the GPUs available, and they don't want their model to be run on some GPUs where they don't know who runs them, right? They trust Azure, they trust Amazon, maybe they trust some other provider. They don't trust me having a box with eight GPUs to upload their, whatever, full model to. It is a real challenge for the model providers, because that is their main IP; it's very valuable IP. There are actual providers like Fireworks and Together and others that are serving open-source models, and they could serve other models as well. But the model providers don't trust them. So, we're actually solving that problem, because we say, "Hey, we have this Secure Enclave where, if you upload the model, neither the hardware provider nor a user can have access to it." It's effectively in a sealed container, but now you can deploy it anywhere. There's a data center in the Philippines that's underutilized? Cool, let's ship a model there and serve it from there. Somebody has hardware in Tokyo and there's a bunch of requests coming from there? Cool, let's serve it there. It's actually solving this exact problem where, right now, everybody's building big data centers for themselves. But there are also a lot of smaller, like 10,000 H100 and H200 data centers built everywhere right now, which are actually underutilized. If you go to GPU List, or SF Compute, and a few others, they actually have a lot of inventory which is underutilized, because nobody wants to go and buy 4,000 H200s or whatever for a year; unless you're a big company, you don't need that much. I just need to run that model that was published yesterday on 10 GPUs. That, right now, is a highly inefficient market. You remember, we talked about blockchain being really good for marketplaces. Well, this is where the solution comes in.
Privacy is a very important component, because this IP needs to be moved around in an encrypted way. So, this is part of our decentralized confidential machine learning cloud, where you can actually encrypt your model. It's stored in an encrypted format, and then, when somebody needs it, it gets decrypted inside the Secure Enclave and used there, and you can run it anywhere in this decentralized compute network, and you get automatic rebalancing and validity from that. Now, how do you bootstrap this is an interesting question. This is also where blockchain has an approach, and the approach is effectively subsidizing compute initially while you're growing the network, right? This is how Bitcoin grew, right? It was effectively subsidizing compute before it had any value. People were willing to bet that it would be valuable and started mining it. Then, as value grew, it caught up. There is an opportunity here to have a very similar model, where we're effectively subsidizing people coming in with compute while we're growing the demand, and then opening it up for more model providers to actually serve their models. Imagine now Anthropic is like, hey, we're rate limiting, or you can use this decentralized compute, which is verifiable. We verify that the inference pass is correct. Cool, we're going to upload their model, and now everybody can use it, including, actually, if you have your own GPUs, you can turn them into this mode and join the network, or you can just run it on your own workloads. So, you have them sitting on your desk or in your data center, and now you can use them for your own workloads, but you're still paying Anthropic for using the model. That's an important part. It's not open source, but it's you actually paying back the developer for using it, while you cannot get actual physical access to the model weights.

[0:39:15] KB: Yes, that's fascinating. So, in some ways, if I were to replay your argument here, each hosting provider is building for peak usage, which is inefficient provisioning. Essentially, people are saying, "Who do I trust?" Well, if I'm Anthropic, maybe I only trust the big three. Those are the only people I'm going to use to host my model. And you're saying, "Okay, well, there's all of this spare capacity out in the world where the gap is trust and coordination, human coordination, building those contracts or what have you." So, if you can automate that layer, suddenly you have a much larger pool that can scale up and down.

[0:39:51] IP: Yes. It solves, I mean, the latency and even the electricity problem, right? Because you distribute the workload. Right now, it's effectively like Amazon needs to build a big cluster with a gigawatt electricity station on it, or you say, "Hey, we actually have a lot of smaller data centers with smaller power consumption around the world." So, we can just distribute across them.

[0:40:11] KB: That's fascinating.

[0:40:12] IP: For context, NVIDIA has run this program where they effectively gave allocations of GPUs to smaller data centers around the world. I mean, their strategy has been trying to counterweight some of the hyperscalers, who have a lot of the GPUs, by having smaller, distributed 10-20K clusters around the world. But those are underutilized, because if you're sitting in Silicon Valley, you would go to Amazon; you wouldn't go and hunt for a data center in Japan or somewhere in Norway.
[0:40:44] KB: So, that in some ways solves the GPU coordination issue, but it doesn't necessarily solve some of the things you brought up before, around sleeper agents and unknown biases, if we're distributing Anthropic models and OpenAI models and things like that. So, what about the model bootstrapping process?

[0:41:03] IP: Yes. That is harder. It's step by step. First, we need infrastructure where we distribute some models, and obviously the easiest ones are the open-source ones that already exist, like DeepSeek, Qwen, Llama, et cetera. But indeed, even though we call them open source, they're actually not open source. They're open-weight models. We have no idea what went into them. So, how do we actually do a truly open-source model? Well, we need to train it in this way where we know what inputs went in. But if you also release the weights, then you're not going to make any money. So, you can actually train the model inside the Secure Enclave, where the outcome is not known to anyone. The outcome is always encrypted and only usable inside the Secure Enclave. Now, you have a model that's not owned by anyone. It's not owned by any single company. You can effectively have token holders, a community, come together and say, "Hey, we're going to train this model. Here's a dataset. Here's a model training process. Let's collect, let's say, the amount of dollars required to do this. We're going to launch it, it's going to train, and now this model is going to be used inside this network as well, and the revenue is going to come back to the people who put in the work and the money to train it." And the token is effectively now a method to distribute this value back and forth, right? So, with a token, you can now fundraise. You can go and say, "Hey, we're going to be training an open-source model with encrypted weights, not open weights, encrypted weights, that is actually going to generate revenue. Now, you can invest and get a return, and maybe reinvest in the next model or cash out." Now, I skipped some hard parts, which are getting the right training data and getting the training process right. But this is the scaffolding of how to do this: actually create community-owned models, where the community decides what goes in, the training data, et cetera. They can inspect, they can decide, and then the training process happens inside this. Now, I'll caveat that the reason why this hasn't happened yet is because the compute tech is only catching up. This whole thing has only been really possible for about a year. In the broader community, we have this really great partner, Fal, who's been building a lot of this infrastructure for confidential compute. Only the Blackwells actually support cluster-level confidentiality, so that's not widely available yet. We're kind of growing with the actual availability of the compute and hardware. But the idea is to have this kind of system ready as soon as the hardware is available. Right now, you can do inference and fine-tuning on the H100s and H200s in this way, because you don't get cluster-level, you only get machine-level confidentiality.

[0:43:51] KB: It's a really interesting model, because it essentially inverts what open-weight models do today, right?
Instead of saying, "Hey, we have a set of training data, which we might tell you about, but you don't know exactly," and we have a training process as well, like, "Here's the software that's going to run, here's how we're doing reinforcement learning at the end or tuning, or all these different things." And then we publish the weights. You're saying, "Okay, let's take all that initial stuff, make that open, make that visible, make that public. But the outcome, we're going to hold on to that in an encrypted way so that we can actually, recoup some of the investment." [0:44:24] IP: Correct. [0:44:25] KB: Interesting. So, you mentioned the hardware is just getting there, all of these different pieces. Can you project out like, what does the timeline in your head look like for how this is going to play out? [0:44:39] IP: I mean, we started talking about this about, I would say, eight months ago, right? So, in the past eight months, the hardware started to catch up. We have built out the initial things, so we can run this inference now. There's some first versions of fine-tuning of the confidential way as well. So, fine tuning is the first version where you can take an open-source mode, like DeepSeek, and then fine tune it on a private data or in public data. And then the weights are encrypted now, but it's still monetizable. So, that's kind of the first version of in the step-by-step process. I think the proliferation of Blackwells will be required for this really to turn to the next step, which, given Jensen's projections, should be happening anytime now. So, I think like within next year, we're actually going to start seeing this really working. And then, year and a half, two years is when I think my hypothesis is the open source and kind of this like ability to coordinate a lot more people contributing data, contributing research expertise is able to actually outrun decentralized labs. They've done well, they've done in the right way. For example, we may need like AI researcher agent that sits, that is able to like able to get everybody's ideas, score them, maybe run some evaluations, et cetera, because you need to allocate compute on some of those things. So, I think the goal is probably within two years to get to the speed of innovation that happens in this user own way is faster than what's happening in closed source labs. But again, confidentiality you can use now. So, there's benefits from this now that people can already benefit. Again, I think any use case that touches medical, financial, and those highly sensitive government areas, is definitely can leverage this now. Then there's also just a lot of enterprises who are uncomfortable with giving all of their data to a company, to another company. So, this is useful for them pretty much immediately. [0:46:38] KB: Yes. As you highlight, if for example, Anthropic or OpenAI or someone like that wanted to be able to give guarantees and say, "Hey, we can't see your data, we literally cannot see it," they could also start running things in this way. [0:46:53] IP: True, exactly. Yes, and maybe for them, it doesn't make sense immediately, but for next level of companies that don't have as big reputation, this actually really makes sense to do right now. [0:47:05] KB: Yes, Awesome. Well, we're getting close to the end of our time. Is there anything we haven't talked about here that you think would be important to discuss before we wrap up? 
[0:47:13] IP: I mean, given this principle, I encourage people to really think through how they can contribute, right? Because in the end, it's going to be an open-source, community initiative. There's a financial incentive and a model to reward people, because I think one of the challenges of open source has historically been -

[0:47:29] KB: How do you support it?

[0:47:30] IP: Yes, unless you work for Google or Microsoft, which kind of pays the salary, right? It's a very thankless job. But I think the opportunity here is actually to create something that indeed can move quicker and has the wisdom of the crowd coming together, as well as being able to use some of the private data that maybe you don't want to actually touch. One of the ideas was, again, because you have verifiable compute, you can have a pipeline where, let's say, you take people's private data, but then you have a very specific cleaning process that everybody looked at, audited, and agreed on, like remove social security numbers, phone numbers, addresses, names, et cetera, which are not useful for training these models anyway. Everybody knows that they can contribute data and receive some reward, and that it will be cleaned in the right and expected way. Then that data is never seen by anyone anyway, and fed into the model at these pre-training steps, right? So, you can have these new ways of actually gathering even more training data, or for research, where, let's say, right now medical information is not able to be used, but you can run inference on it in a private way. So, there are just so many new use cases and opportunities. I just encourage everyone to think through where they can really leverage that, and reach out and connect with us and the team, to leverage what is already available now and then contribute to building this forward.

[0:48:56] KB: Okay. That seems like a good wrap.

[END]