EPISODE 1837

[INTRO]

[0:00:00] ANNOUNCER: Python is the dominant language for AI and data science applications, but it lacks the performance and low-level control needed to fully leverage GPU hardware. As a result, developers often rely on NVIDIA's CUDA framework, which adds complexity and fragments the development stack. Mojo is a new programming language designed to combine the simplicity of Python with the performance of C and the safety of Rust. It also aims to provide a vendor-independent approach to GPU programming. Mojo is being developed by Chris Lattner, a renowned systems engineer known for his seminal contributions to computer science, including LLVM, the Clang compiler, and the Swift programming language. Chris is the CEO and co-founder of Modular AI, the company behind Mojo. In this episode, he joins the show to discuss his engineering journey and his current work on AI infrastructure and the Mojo language.

Kevin Ball, or KBall, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc.

[EPISODE]

[0:01:30] KB: Chris, welcome to the show.

[0:01:32] CL: Thank you for having me. Excited to be here.

[0:01:34] KB: I must say, I'm so excited to hang out with you because you have such a history in an area that is fascinating to me, which is language design, but in particular, language design for the current age that we're in around AI and machine learning. So, excited to dig in. But let's first have you introduce yourself a little bit to our audience: who you are, your background, and what brought you here?

[0:01:55] CL: Sure. Well, so I have, I guess, two epochs to my professional career. In epoch one, I'm known for CPU programming stuff and developer tools. I'm known for building the LLVM compiler, and that underlies a lot of programming languages, including C++ and Rust and Swift and tons of different things, which is really, really cool. Through that journey, I worked at Apple and built a bunch of cool technologies, including the Swift programming language, which was a nights-and-weekends project that kind of scaled and went beyond. That was very exciting, and I did a whole bunch of stuff in GPU programming and other stuff like that at Apple. Around 2016, I got really interested in AI, and so that was before LLMs, that was before ChatGPT, that was back in kind of a different era, but I saw the potential of the technology. And I've been on kind of this hero's journey ever since then, trying to figure this stuff out, make it easy to use, make it so that everybody can actually have access to this, and it's not just kind of locked up in big companies. So, that brings me to Modular, where we're building Mojo and Max and a whole bunch of cool technologies, where we're trying to help fix AI compute.

[0:02:56] KB: Yes, let's maybe do a quick high-level of what Mojo and Max are before we dive into the guts. What were the motivations behind building this?

[0:03:04] CL: Yes, so before we say what they are, to your point, let me tell you why we were forced to build them. So, if you zoom out, AI, it's not new, right? I mean, depending on how you count it, it's decades old, but the modern deep learning revolution is within the last 10 years.
It's been amazing how that exploded onto the scene and all the different fits and starts and new things got added along the way. Well, along the way, it all got built on top of this technology from NVIDIA called CUDA. Now, CUDA is the thing that powers so much of deep learning. It is what enables NVIDIA to be the titan of the industry right now that is reigning supreme across the entire world. We, I think, rightly owe a debt of gratitude to NVIDIA for building CUDA. It was designed for a completely different world, a completely different use case, to enable physics and games and all kinds of different things that really had nothing to do with AI. But it was the right technology to catalyze AlexNet and a whole bunch of really early deep learning technologies that took off. Building on top of that, we got things like TensorFlow, and then we got PyTorch, and we got other technologies built on top of CUDA. And when you play it forward, AI was changing so much, like so many new research papers and so much innovation, so much stuff happening all the time, that we as a software industry just put more and more and more and more stuff on top of this CUDA and PyTorch and other technologies and just kept accumulating, kept building higher. Well, today, a lot of that foundation is not really actually stable. It's very rickety. A lot of things don't work. NVIDIA is an amazing company, but they kind of get too much of the lion's share of credit for what they're doing right now, and a lot of people are unhappy with that. So, what we set off to do at Modular back in the day is we said, "Okay, well, let's actually fix this. But fixing it can't be done by adding another layer of duct tape and baling wire on top of the current stack." Let's really fundamentally challenge the status quo. Let's do something much more difficult. Let's go build a replacement for CUDA. Let's go build a full-stack ecosystem where we can actually build AI. Let's design it for Gen AI, not with Gen AI as an afterthought. Let's make it portable. Let's make it scale. And let's really tackle a lot of these assumptions. Now, given my background, I understand fully well what it takes to make a language successful. This is not a three-month sprint. This means doing more than just building out a parser. This means a whole tools ecosystem and community and many other things. So, the only reason we said yes to this and decided to embark on this is because we think the stakes are high. We think that there's a whole new form of compute. People are struggling today, but as we look ahead into the future, I only see more hardware, more innovation, more AI, yes, but also more applications for this kind of compute. This is where I don't see anybody else out there that's really swinging as hard as we are and trying to actually make a difference here.

[0:05:52] KB: Yes. Let's maybe take a look at what that starts to look like. When I was trying to bring myself up to speed really quickly on what Mojo is, I looked at this and I was like, "Oh, this feels like a Python, Rust mashup." Is that a fair description? How would you describe the language Mojo?

[0:06:07] CL: Well, if you look at Mojo, you can see many different things depending on who you are and what perspective you're coming from. First, superficially, the syntax looks like Python. So, there's a lot of Python DNA that is very intentional. The entire AI world is really revolving around Python. Python is one of the most well-understood and well-loved languages out there.
And so, I'm very familiar with building languages, and we can argue about curly braces and different syntax things and stuff like this. But from my perspective, Python won, and it won for a lot of really good reasons. It's beautiful. It has a lot of advantages to it. I understand taste may vary, but we decided, let's make the surface-level syntax be Python family, right? Now, your point is also really good, that Rust, huge influence. So, Mojo has a whole borrow checker. It's not the same as Rust, but it learns a lot and then takes the next step forward. It has other type system features. It learns a lot from Swift. It learns a lot from many, many, many other communities. C++ even still has some good ideas. I mean, one could argue that C++ has too many ideas. And so, Mojo is very different in that respect. But really, I look at this as saying that our goal here is to bring the best ideas together in a novel way and then try to solve the problem in a way that only we can do with this combination of techniques. So, the problem we want to solve here is basically GPUs. How do you program a GPU? How do you make stuff go fast? How do you enable code to span GPUs and CPUs? How do you enable these crazy AI ASICs that people are talking about? And these problems are things that other programming languages aren't really trying to do in a coherent way. This is a big thing. So, you'll see many influences because I think that taking good ideas from wherever they are is really important. But many of the details end up being quite a bit different.

[0:07:54] KB: I like that framing as a raison d'être for the language of, like, this is about solving GPUs and making it seamless to bridge that CPU-GPU boundary. Can we maybe dive into some of the language features that you've chosen and look at them from that lens? Like, for example, the ownership model and borrow checker, right? That is very cool, very big part of Rust, also relatively novel, very different for someone coming from Python, where you've got a garbage collector and things like that. What's the driver for that? And how does that frame, or how does that really help you solve the GPU problem?

[0:08:28] CL: Yes. Well, so actually, it's funny, you asked two questions. You asked, what is an example killer feature? And then you asked about the ownership model, and they're actually totally unrelated. So, you need an ownership model to have a memory-safe systems programming language. So, like Rust, we have no garbage collector. We use static memory checking, and memory safety is really important for any modern language. And so yes, we have to have an ownership model, and we could talk about why Mojo's is better than Rust or things like that. But yes, we need to have that. But that's not the differentiating feature that lets us tackle hardware. That's table stakes these days. So, the more interesting things are when you look at things Rust doesn't do, right? Rust does not have a very powerful comptime model, and so Mojo has comptime, which is, I think, best known from Zig, for example. It allows you to write arbitrary code that runs at runtime and have the same code run at compile time. You can think of this as like C++ templates, or macros; there are many different systems in other languages that try to solve, how do I get code to run when the program is being built? Zig, most notably, but also Mojo, take that way farther forward than languages that use templates or macros, by unifying it with the host language.
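To make that "same code at compile time and at runtime" idea concrete, here is a minimal, hedged sketch in Mojo; the function and names are purely illustrative (not from the standard library), and exact syntax shifts between Mojo releases:

```mojo
fn factorial(n: Int) -> Int:
    # Ordinary Mojo code; nothing here is specific to compile time.
    var result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# An alias is evaluated while the program is being compiled, so the result
# is baked in as a constant, much like a C++ constexpr or a Zig comptime value.
alias FACT_10 = factorial(10)

def main():
    print(FACT_10)        # the precomputed compile-time constant
    print(factorial(10))  # the very same function, run at runtime
```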
Now, why is that important? Well, it turns out that when you're programming a GPU, GPUs are very complicated. Okay, fine. But they're also really weird. So, GPUs are all about performance. So, people care about cost, and they care about latency, and they care about these things when using AI, particularly with Gen AI and deep learning. But then it turns out that the memory hierarchy is really fragile. If you get it exactly right, you can get super high performance. If you get something slightly wrong, well, you fall off a cliff with performance. Techniques like auto-tuning become very important. And reconfigurability when you start merging AI operators together is really important. And being able to span across multiple different kinds of hardware is really important. And so, what you want is you want to be able to write code that's generic, but not just generic over float versus double. You want to be generic over how many threads are in a warp, and many, many, many other parameters that end up mattering a lot to efficiency on a device. But they don't affect the structure of the computation, and so this is something that Mojo is really, really excellent at. I can give you another example. We could talk compiler tech, but why don't you ask questions about this first?

[0:10:51] KB: I kind of want to dig into that. So, just to make sure that I'm understanding, one of the big things that you're diving into, following that lineage starting from C++ templates through comptime and things like that, is essentially: how can we build a lot of our genericism, our parameterization, into compile time, so that it's happening at compile time based on the architecture that you're targeting, rather than having to be dynamic at runtime and incur a cost at that point. Is that right?

[0:11:18] CL: Yes. So, let me give you an analogy just to make it more accessible, because I assume not everybody's already good at programming GPUs. So, I'm a compiler guy, right? In compilers, you often have parsers, and one common technology is called a parser generator. You use like a domain-specific language. There's Lex and Yacc and ANTLR and many of these different things. And what you do is you write the grammar for the thing you're trying to parse, and then you run a program that then gives you Rust code or C++ code or whatever it is that you're writing. So, that is run at the compiler's compile time. You're like building the code so that you can then link it into your application, and then you don't run it on the fly. Now, GPUs have similar kinds of problems. So, in a GPU you have what's called a Tensor Core. A Tensor Core is the thing with all the floating-point operations. So, this is the thing that's really, really important for performance, but now makers of GPUs keep changing them. Even NVIDIA, if you just zoom into their pretty important ecosystem, the Tensor Core on the A100, the Tensor Core on the H100, the Tensor Core on the B100, these are just three different generations of their hardware, and they are actually quite different. So, what we want to be able to do, and what the world needs, is to be able to write software that's abstracted from that hardware, so you can write things in terms of tiles. Okay, well, to do this, you need to do actually quite a lot of calculation about what index should be where, and which element goes into which slot of the Tensor Core, and exactly how do I lay this stuff out, and if I'm marching through memory, what order do I do this in, et cetera, et cetera.
All that is actually very static by the time the code runs. So, it's complicated; it takes real data structures. I mean, it's not just like a simple template where you can like add two integers together or something. You need to be able to actually have trees and talk about nested hierarchical structures. The cool thing about Mojo is that you can just write code, and you can use normal code. You can use a list, or an array, or a string, or like whatever data type that you want at comptime, while the compiler is running. So now, Mojo has a very clean division between the code that runs at runtime and the code that you can use at compile time, and when you're writing these algorithms, it's the same code, and that becomes very powerful.

[0:13:31] KB: This reminds me a lot of early in my career. I worked on a high-performance compiler and was working with folks in the scientific computing community. One of the reasons that, for example, people there were writing code in Fortran rather than C was because the memory model of Fortran allowed the compiler to manipulate how things were laid out much more powerfully than it could in C, because you didn't have the same kind of pointer access to wherever. So, you could optimize to, in that case, the CPU memory hierarchy and make sure you were getting all of your cache hits. Here, it sounds like what you're doing is at that same level. You have some form of constraint. And I do want to dig into what constraints you had to place on the memory model to be able to do this. But you have a form of constraint that allows people who write Mojo code to not worry about it, but have it then compile into things that are optimally laid out for the memory of these different GPU kernels.

[0:14:27] CL: Yes, that's exactly right. So, is the question, how does it work?

[0:14:29] KB: Well, yes, so the question is then what trade-offs, if any, did you need to make in terms of the programming language, like what a developer can do with this thing, in order to get yourself the set of guarantees that you need to be able to do that type of optimization?

[0:14:43] CL: As far as I know, there's no trade-off. There's no downside. It strictly makes the language more simple, more consistent, more powerful. There's no trade-off. It makes the compiler quite different from previous-generation languages, right? But if you look at - again, I'm fond of Mojo, but of course I'll say nice things about other people's systems. If you go look at Zig, Zig's a relatively simple language, but it has very powerful metaprogramming and generics capabilities. So, they made other decisions in other parts of the language - they decided not to have a borrow checker, because that was part of what they were going for - but powerful metaprogramming doesn't have to come with complexity. So, in a previous journey, I built the Swift programming language. Swift has many good things about it, and has some challenging things about it. One of the bad things about Swift is that it was created with many different language features that are non-composable and non-orthogonal. And so, it has many things that got added over time, and they don't quite fit together in the right way, and so you get more and more and more and more and more features. The benefit of having a powerful metaprogramming system from the beginning is that many of those things that became features in Swift instead become features in a library.
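As a rough illustration of the "generic over hardware parameters" idea from a few turns back - again a sketch with made-up names, not code from the Mojo standard library - here is a function parameterized on an element type and a vector width, both fixed at compile time so each specialization costs nothing at runtime:

```mojo
fn add_vectors[dtype: DType, width: Int](
    a: SIMD[dtype, width], b: SIMD[dtype, width]
) -> SIMD[dtype, width]:
    # dtype and width are compile-time parameters (square brackets), so the
    # compiler stamps out a specialized version of this function for each
    # combination that is actually used; nothing is decided at runtime.
    return a + b

def main():
    # The same source, specialized for two different "shapes" of hardware.
    var x = add_vectors[DType.float32, 8](
        SIMD[DType.float32, 8](1.0), SIMD[DType.float32, 8](2.0)
    )
    var y = add_vectors[DType.float64, 4](
        SIMD[DType.float64, 4](1.0), SIMD[DType.float64, 4](2.0)
    )
    print(x)
    print(y)
```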
With Mojo, what we've been able to do is make sure the language is much smaller, much more orthogonal, and much more consistent. And I think this is strictly better. I don't see any trade-off there. It doesn't come at a complexity cost, doesn't come at a compile-time cost. By the way, our compiler is way faster than Rust or C++, particularly for highly parametric code; C++ templates, for example, are really, really bad for compile time. And so, we can express very powerful things. I don't see a trade-off.

[0:16:20] KB: That is really interesting. I like that you're introducing this concept of composability and orthogonality, and I'm used to thinking about it in the context of non-language software architecture, right? How do I compose things? How does that play into language design? What was it that you needed to do in the language to achieve that?

[0:16:39] CL: Yes, well, so we're all learning. So, I've worked on some cool things in the past, but I still don't know everything. If I did, I would be very bored, because I love to learn things. One of the things is that, through an epoch of my career, I always looked at Python and said, "What is that thing?" I'm a systems person. I want stuff to go fast. So, what use does Python have? And I never really took it seriously, because it was just a scripting language for kids or something. When I dove into the AI world, I was forced to come to understand what this Python thing was really about. What I came to realize is that Python, despite its interpreter and its implementation details and things like this, has a really beautiful software library ecosystem that is, I think, maybe unmatched anywhere else in terms of the number of systems it can compose and the power that it gives to library developers. That really opened my eyes, and this flows directly into Mojo, by the way, because what I saw is the thing you want to do is you want to keep the language simple, make it so that people can learn it rapidly, ideally without having to retrain, which is why we just say like, "Okay, well, Python has this feature. Awesome. So do we." You don't have to relearn everything to start Mojo. If you know Python, you already know almost all of Mojo. But then really focus on giving library developers superpowers. We were just talking about Tensor Core programming. Well, how does that work? Well, Mojo, the compiler, doesn't know anything about a Tensor Core. What we've enabled is we've enabled GPU super experts to go build abstractions in the library that use all this fancy compile-time metaprogramming stuff. So, they're super, super, super efficient. And then you get full access to the hardware, you don't get any overhead, and the language is simple. So, to me, that's like a beautiful thing. Like if you can keep the language simple, if you can give library developers superpowers, and then you can make the whole system easy to learn, right? That's really what I see as our North Star right now.

[0:18:34] KB: Yes, that makes sense. What do you think it is that makes the difference between - Python is an interesting example, right? Because it's been adopted by such a wide range of different people. It's been adopted in the data science community, the machine learning community, communities that were not really like software engineering communities, and you kind of see that. And then it's also used for very serious software engineering.
So, what is it about the language that enables that sort of library superpowers, such that it can span those vastly different ways of approaching software?

[0:19:06] CL: Yes. Oh, so I can say good things and I can say less good things about Python. So, the good thing I'll say is that it's very easy to learn, and it is that universal connector language. It's almost like the duct tape for software. It has been able to span across many disciplines, and many disciplines become cross-disciplinary, particularly as technology evolves. So, AI has been really good for Python because it was there, and I think Python got a huge boost because of AI. I think that duct tape aspect, which I don't mean in a negative way, but being the universal superglue, maybe that's a different way to say it, I think is extremely powerful. Also, being easy to learn, being taught, there's many, many, many different things that Python has benefited from. Now, the challenge is that people start using languages naturally for things that they're not great at. And so, Python without types becomes a challenge when you scale your application sometimes, right? Then Python performance becomes a challenge when you need things to go fast. What has happened with Python on the flip side is that you get Python, Python, Python, until suddenly you're like, "Oh, it's slow." Now, I have to use C++ or Rust or some other go-fast language, and now I get actually kind of the worst of both worlds, which is I have some of Python and some of Rust. All of Rust can be a beautiful thing. All of Python can be a beautiful thing. But Rust and Python are very different things. So, what Mojo is designed for - Mojo, by the way, is a go-fast language today. That's really what its strength is. It's not a fast Python. It's faster than Rust, by the way. What we've really focused on is making it so that you can integrate Python and Mojo code directly and make it super easy to do that. Now, instead of saying I have to take my Python code and rewrite it in a completely different language, and then have a team that has completely different skills, with tabs and colons on one side and curly braces on the other side, we're saying, okay, these are actually very similar. Now, your go-fast language and your Python language are actually very similar. We found that that's actually very, very nice. And so, we're working on features right now. We're basically saying there's zero FFI. You can just use Mojo code in Python, and you can already just use arbitrary Python code in Mojo. So, these features make it really, really nice to say, "Okay, well, if you're in a Python world, having a go-fast language that's convenient and easy to use and right there is actually quite unusual."

[0:21:23] KB: Yes. The FFI is an interesting thing to dig into a little bit, because I think, to your point, one of the things Python did well is they said, okay, we have all these fast machine learning libraries and other things that have been written in C++. It's really easy to build a set of Python bindings over those, and then suddenly everybody in Python world can be using their notebooks and their interactive flow and still accessing the power of that. I mean, one thing in Mojo, it sounds like you don't need that for performance. But another thing that's going on there is like, you have all this existing infrastructure that exists.
What does the interop story look like for Mojo if, say, someone has an existing PyTorch or TensorFlow type of package and they don't want to have to worry about rebuilding their C++ in Mojo? What does that look like?

[0:22:08] CL: Well, so Mojo, I think you framed it really well there, which is you have two really different goals. Like on the one hand, in Mojo, you could directly rewrite the entire world and you could have everything pure Mojo, and that's very nice because you get the benefits of the type system, which includes traits and like all of the modern language features that you'd come to expect from a high-performance systems language, powerful metaprogramming, all these great things that come with a well-designed language. But the flip side is pragmatism: you don't want to rewrite the world, right? And so, Mojo also is very pragmatic, and so I think that's very, very, very important. Mojo can directly talk to Python, and so you can import an arbitrary Python package. Go to town; you get all the wonderful things from the data science ecosystem and many, many more. That just works. Now, the flip side of that is if you import a Python package, you get the Python interpreter, and you get the Python packaging ecosystem, and you get the full Python experience. So, it comes with that. You can also directly talk to C, C++, these kinds of things, and so you can directly import and call malloc or file I/O or whatever if you want to go do that. So, being able to talk directly to C code is super, super important for lots of very pragmatic reasons. Now, what we've seen in our community is that there are lots of people who like to build lots of cool stuff, and so we've got people who build UI libraries, and they wrap an existing UI toolkit; that makes tons of sense. But then we see other people that say, "Hey, wow, Mojo has other fancy features around like SIMD programming and things like this." So, they're able to get better performance using Mojo than they are using NumPy, by like 10x and things like this. Then suddenly it's very fun to say, "Hey, wow, if I write some for loops in Mojo, I get 10x better performance than calling NumPy. That's actually pretty fun." It's not my job to tell people what is the right way to do things. I think that what I'd love to see is I'd love to see a vibrant Mojo ecosystem evolve, but I don't want it to be kind of like religiously driven. I want it to be very pragmatic and very focused on outcomes and this kind of a thing.

[0:24:04] KB: So, something that you said there leads me in another direction that I was wondering around this. You developed Mojo originally to try to solve the sort of GPU problem and make it easy to program ML stuff. But you've also kind of highlighted it's able to access whatever. It can wrap other different things, and it's quite fast. And it sounds like it doesn't have some of the same compile-time tradeoffs that, for example, people trying to build in Rust are encountering, which I think it would be -

[0:24:30] CL: Also, C++. But I think Rust is more extreme, perhaps.

[0:24:34] KB: Well, and modern language features, without maybe needing that. So, I'm kind of curious, what niches are you seeing people using Mojo in beyond that sort of core ML space that you were originally looking at?

[0:24:48] CL: Yes. So, this is the fun thing about languages is that they can be used for anything. I mean, assuming they're designed to scale. We've seen people building AI frameworks.
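As a small, hedged sketch of the "just import an arbitrary Python package" flow described a moment ago - assuming NumPy is installed in whatever Python environment Mojo finds, and with the usual caveat that interop details evolve between releases:

```mojo
from python import Python

def main():
    # Importing a Python module pulls in the CPython interpreter and the
    # Python packaging ecosystem, exactly as described above.
    var np = Python.import_module("numpy")
    var arr = np.arange(12).reshape(3, 4)
    print(arr)
    print(arr.sum())
```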
So, at Modular we're very focused on AI inference, but we've seen people take that and do training with it, things like this. Like I said, we've seen lots of games and other things like that, and people playing with like graphics visualization type of things, which is, again, you want a go-fast language to do that. UI libraries, I think, are pretty niche, and I think that people are playing with this, but I don't think it's quite as serious. And a lot of people working on data structures and algorithms and lower-level components that systems programmers like to perfect. So, there's lots of different applications. We're really focused on making sure it's really great for GPUs. Because GPUs are the new frontier, right? It's the thing that's underserved. It's the huge problem. You were asking about what makes Mojo interesting. The second part, besides powerful compile-time metaprogramming, is some basic compiler nerdery. So, if you want, we can talk about that. I'll try to keep it high level, but we have a whole new compiler infrastructure that I've been building for many years now called MLIR. So, using a powerful new compiler backend is what enables us to have both really good compile times and things like this. The metaprogramming system really builds on this, but then also it allows us to talk to lots of hardware. That's something that's very different. And there aren't other languages that are widely used that actually do this. So, this is another reason we had to build Mojo.

[0:26:15] KB: Yes. So, I would like to nerd out on this, because I love to geek out on this stuff. And I think, once again, you have such a great background for diving into this. So maybe first, let's start with the very high level. I think a lot of stuff today is built on LLVM, which you obviously are very familiar with. Maybe give people who aren't aware the very high level of what that structure looks like, so that we can give them the context of what this new structure, MLIR, brings to the table, or how it shifts that world?

[0:26:44] CL: Yes, so let me go back in time. 25 years ago, I was a university student at the University of Illinois. At the time, gosh, 25 years ago, Java was the cool thing. So, virtual machines were the hot new technology, and they weren't new, but it was taking over the world, and everybody assumed that Java would be the thing that unites all of compute. Now, Java is a beautiful system, and the Java Virtual Machine is a beautiful system for a lot of different reasons, but it struggled because it was really designed for just-in-time compilation. So, it was really about, okay, you load an app off the Internet, you get it, and then you execute the code. For larger-scale applications, having to compile it all before you start running was actually a bottleneck. So, LLVM came onto the scene. And our initial idea, which lasted about three microseconds, by the way, was to make a better Java compiler thing that didn't have this just-in-time component that was as heavyweight as what Java did. Now, when we played it forward, the way I architected and built LLVM out was to make it a very generic and composable ecosystem. It's a modular design; the way that you can put together the pieces within the compiler is designed to be composable. What LLVM evolved into is it evolved into the universal connector for CPUs, basically. In its design, it has one, what's called an IR, the intermediate representation. If you look at it, it's well documented; you can go read the spec if you want.
If you look at it, it's like a compiler person's take on C. So, you've got integers and pointers and floats and structs, and you've got SIMD vectors, and so you have some of the basics there, but it's really C. The way that LLVM and then the family of languages around it evolved is they said, "Okay, well, if you take something like C++, or you take Rust, or you take Swift, or you take one of these other higher-level languages, it's somebody else's problem to figure out how to map a class down to basically C." And in the case of C++, okay, figure that out. That's what Clang does - and I built the Clang C++ compiler along with a big team - and it says, okay, well, there's vtables, and so we can lower a vtable into your code, and LLVM can represent that, and that was fine. Now, more modern languages like Swift and Rust and things like this really benefit from higher-level optimizations. So, you really want to be able to do de-virtualization. You really want to do monomorphization of templates and things like this. You don't want to do that on a syntax tree. So, LLVM is still an amazing thing; it's widely used for lots of different reasons. But as languages evolved, there became a need for higher-level IRs and higher-level representations of the program. Swift has its thing called SIL, Rust has its thing. Everybody started building their own thing. What this drove is a couple of exciting opportunities, because languages evolved and that was very powerful and, yay, go technology, but it also drove fragmentation. The cool thing about LLVM is that it united so much energy. Like the chip providers could say, "Hey, if I just add support for my chip to LLVM, then I get all the software. I get Linux, I get a web browser, I get like all this stuff just by adding a backend to LLVM. I get Swift, I get Rust, I get Julia, I get all these beautiful things that come just by adding LLVM support." That's cool. And what that did was it enabled a factorization of industry effort. So, people weren't re-implementing the same stuff 50 times over. They got high ROI, high impact for their work, and they got this massive software ecosystem. Now, as languages evolved, you got this fragmentation. The Swift people have their thing, the Rust people have their thing, the Julia people have their thing, et cetera. Well, guess what? AI people didn't notice. So, what happened is when AI came on the scene, they started building all their own stuff, and so their own intermediate representations. You got TensorFlow graphs, and then TensorFlow had TensorFlow Lite, and then you had TorchScript, and you had Glow, and then you had XLA, and then you had this explosion of compilers, ONNX. All these things got built, and as all these things got built, what do compiler people do? They go and build a compiler. Well, now what you're doing is you're fragmenting the talent ecosystem. And I love compiler engineers. I know many of them. They're lovely people, but there's not very many of them. So, if you take a scarce resource -

[0:30:56] KB: You might say you know most of them.

[0:30:58] CL: I probably know most of them, like over half, probably, of the people who identify as compiler nerds. You take a scarce resource and you divide it, and you get this really unfortunate problem where many of these technologies, they're made by really well-meaning, good people, and they're very talented, but there's like two people on every project. So, they're well-meaning. They're good. They're very talented.
But if you only have two people, there's only so much you can get done. They're trying to tackle these new spaces like heterogeneous compute, GPUs, crazy custom accelerators, like all this stuff. It became very untenable. So, MLIR was a project that I built, again, out of need, when I was at Google. At Google, my day job was I was responsible for getting the Google TPU to work, get the software ecosystem to scale, get it launched in cloud, get TensorFlow to talk to it, get it integrated into PyTorch, these very big, audacious problems. But I was also in charge of all the CPU and GPU performance. Google also had many unannounced ASICs that were both data center and edge and many other things. So, all of these things had different compilers that were all built for their one little niche. What I realized is that the LLVM architecture didn't make any sense. You can't say, "Hey, Google TPU," which has huge matrix multipliers as primitives - it's not C. That is not the right abstraction. And what I learned from Swift and from many other systems since then was, okay, well, there isn't one right answer. What you actually need is the ability to build domain-specific compilers and build them very efficiently, and we need to refactor the ecosystem. We need to get compiler engineers to talk to each other again. We need to get it so that we get compounding interest out of our investment. And so, what MLIR is, fundamentally, is a way of writing domain-specific compilers with very high leverage. So, you don't have to reimplement all the basic compiler stuff like a constant folding pass. You can instead really focus on your domain and get great leverage out of that. So, this is something that has kind of taken over our ecosystem. I mean, it's widely used. There's lots of AI things that use this, but also, your last show that I heard, on quantum computing - they're massively into MLIR, because it's a very important domain. I've used it for chip design, making it so you can actually synthesize Verilog and then synthesize hardware underneath that. That's very different. That's not C. These kinds of applications are really enabled by this.

[0:33:21] KB: Yes. I love that. So, just to flesh back out or replay this a little bit, what LLVM, and particularly the LLVM IR, did was kind of create a choke point for when things looked mostly C-like and were mostly compiling to target CPUs. Everyone could, if they had any language, build a little layer that compiled down to this C-like IR. And on the flip side, if they're building new hardware, they just need to make sure that there's a plug-in that knows how to compile from that IR to their chip's machine language, what have you. The breaking point was, "Oh, new hardware, heterogeneous computing, all these different things." That IR does not sufficiently express what we want to express. It doesn't allow people to do it. So, in some ways, if I understand it, MLIR is almost a higher-level IR that allows you to express many more concepts than could be expressed within C, and then also build sort of reusable abstractions at that level, optimizations, and then there's another layer for people to target. Now, two questions that I have coming into this. Actually, I'm going to start with the sort of downwards path. So, from MLIR, you have this intermediate representation. You're able to do whatever you're doing. What is below that? Does that like get lowered? Or is that able to then go straight to machine language?
What does that path look like?

[0:34:37] CL: So, it depends on the compiler you're building, but typically you end up with LLVM.

[0:34:42] KB: Okay. So, this is kind of another layer on top, essentially, which makes sense. I mean, compilers are probably the deepest abstractions known to man.

[0:34:50] CL: That's right. So, if you're building an AI compiler, which is where I'll focus, but there's also lots of other cool stuff you can do. But if you just look at AI compilers, typically you're starting with tensors. And tensors are gigantic multi-dimensional arrays, and then you need to lower it down into, I have a four by four matrix multiplication, things like this, or I have SIMD operations on a CPU, things like this. That mapping process is very complicated, but once you get down there, then LLVM is amazing. It's really good at doing register allocation and scheduling, and the core things in code generation. It's quite good at that, and so it's about, how do we solve these higher-level things?

[0:35:24] KB: Okay. That's awesome, and I love that. The next question is kind of at that MLIR layer. You highlighted that you're trying to make it useful, not just for GPU programming, but for kind of programming any number of these sort of accelerator-type heterogeneous computing problems. So, what is the abstraction that this MLIR operates at? Like what is it that makes that work?

[0:35:45] CL: So, MLIR does not - so this is where Mojo comes in, but let me express why this is. MLIR solves a compiler construction problem, and so it's really good at making it so compiler experts can build a new compiler quickly. But it does not solve design. It does not solve any specific abstraction problem that you might have. Let me just give you an example of this. MLIR has many, many, many different ways to map tensors onto hardware. If you go check out mlir.llvm.org, nice plug, you can go read about these things. None of them actually work very well. So, just because something can represent different levels of abstraction doesn't mean the design is solved for you. And by the way, the ML in MLIR doesn't stand for machine learning. The ML stands for multi-level. It's about progressive lowering. It's about representing things at the right abstraction level to do the optimizations you want to do. I sometimes joke that, as a compiler engineer, that which you can represent, you can transform. So, a lot of the power in compilers is getting things at the right level of abstraction, so you can do things with them. But you still have a burden of having to decide what those abstractions are. And MLIR really doesn't help you with that. What's been happening is the industry at large has been stabbing around in the dark, and there's a million different projects, and none of them have really gotten traction. This is why Mojo has to come into this world. Because, first of all, there aren't really languages doing this. Actually, I'm going to blow your mind in a second and tell you the other language that targets MLIR. But generally, people have been integrating these things and trying to get them to talk to PyTorch or TensorFlow or things like this. And so, that's been a huge challenge on some level. Does that make sense?

[0:37:26] KB: I think so. So, I guess then a couple of different things that I'd wonder is, so multi-level is cool. Are those levels well defined right now, or is that also flexible?

[0:37:36] CL: They're completely flexible.
So, MLIR has a concept of a dialect, and so you can define a dialect for your hardware, for LLVM itself, for your programming language, for quantum computing. It's domain-specific, and you get to own the domain. This is what I'm saying, where MLIR is very powerful and very awesome. But the power gets attributed back up to the person holding the hammer. Because now you can go -

[0:38:01] KB: It's a very powerful hammer, in some ways, right? It's saying, for example, I have a new piece of heterogeneous hardware, a new accelerator. I can build a domain within this that understands how to translate from some of the higher-level MLIR into the way that this is going to work optimally on my device, and now it can get lowered into LLVM IR and go.

[0:38:22] CL: Exactly. Now, here's the thing, right? So, Mojo exists because the entire industry was not doing it right.

[0:38:29] KB: You need a proof point. You need an example for people to see and follow.

[0:38:31] CL: For years, people have been successfully taking MLIR and integrating it into existing systems and upgrading things, and it's had huge impact. I'm very, very, very proud of MLIR, and it's got a massive ecosystem, and lots of different people use it. It's very, very exciting. But there is no language - there is no really well-defined programming language - that uses it. There's a lot of domain-specific languages or EDSLs or things like this, but there was nobody that was actually taking this and using it in a way that expresses the full power of generic heterogeneous hardware back to programmers. It was more about, like, trying to get PyTorch to go fast. And some of those sort of worked, but most of them did not, right? Now, I have to be careful, because Mojo is very uniquely designed for MLIR, but there's one brand-new, very exciting programming language that now directly goes to MLIR as well. This is a new LLVM project that is just graduating to the next level of maturity: the LLVM Fortran compiler.

[0:39:26] KB: Interesting, okay.

[0:39:29] CL: So now, you brought this up earlier, let me connect the dots here, right? Well, so in this case, it's kind of an accident that the folks working on Fortran wanted to build a new compiler for Fortran, because all the old ones were the wrong thing for a variety of reasons that I'm not the expert on. But now, to your point, Fortran has parallel arrays, and it has higher-level abstractions that have to be lowered. They said, hey, well, this MLIR thing's great because it allows me to represent my Fortran domain-specific constructs. By the way, modern Fortran has objects, and it has all kinds of crazy systems.

[0:39:59] KB: It's come a long way since I was working on it in the early 2000s, yes.

[0:40:02] CL: It's easy to make fun of Fortran because it's from the 1960s or fifties or something like that. But actually, Fortran has evolved into a very modern language with a massive community of its own. Maybe not as many web apps are built in Fortran. But if you look at this, it's the power of MLIR that allows people to tackle these problems and solve them in a great way. Now, what Mojo does - so coming back to what - now you get some idea of the wackiness of MLIR, and both the power but then also the curse that comes with having to make design decisions, right?
What Mojo says is, "Hey, well, the problem with a lot of these systems and compilers and these PyTorch thingies and like all the different chip people building stuff is that they are functionally taking power away from the programmer and locking it up inside the compiler." So, what Mojo does is it says, "Hey, you know what's important? Library design." Let's take tensor lowering. Let's take all this magic out of the magic compiler that a few people are building, and let's put it back in user space. Let's put it into the Mojo code. Let's enable people to tackle Tensor Cores or their hardware or whatever else it is, and let's give that power back to the programmer, and let's teach a new generation of people how to actually deal with all this cool stuff. The way that we do that, in my opinion, is through software engineering. It's one of these old things that is not magic, but it's actually a lot of hard work and design. It's about building libraries, building abstractions, building ecosystems that are layered the correct way, and putting that together in a way that's successful, so you don't have to be the hardware expert, you don't have to be the compiler engineer. I love compiler engineers, by the way, but it's like re-democratizing all this technology instead of trying to lock it up into libraries written in assembly or into compilers themselves. So, that's what makes Mojo very exciting. That's why having metaprogramming is so important to me. It's because now you can allow people to write just code that has the power of traditional compilers, and that is, I think, very profound. We're using it to go solve problems for AI and GPUs and things like this, but I think this is a very powerful thing that, as time plays out, because these things take time, I think the world will discover and be able to use in new ways.

[0:42:13] KB: This is fascinating. This is a domain that I like to bring up every now and then and geek out on, which is just like, I feel like DSLs in particular are massively underutilized. They're incredibly powerful. There are some interesting examples that I - so for a long time, I was very much in the web space. I saw Babel arise, and Babel is essentially a user-space compiler that allows you to write language features, DSLs, whatever, or do other different things that all compile to JavaScript. This was utilized for things like building out JSX for React and things like that. I looked at that, I'm like, "Oh, this is going to create a thousand flowers." It basically stopped with JSX. There's not very many people writing lots of domain-specific language extensions there. It's still incredibly useful tooling. It lets you do a bunch of different things, but I'm kind of curious how - like, it feels like language design, which is what we're talking about here, empowering people to build languages on top of languages. Another world that did a lot of this was the Ruby world, which was very into DSLs and built DSLs and things like this. That's a hard problem. There aren't very many people who are good at language design either. So, how do you make those flowers bloom?

[0:43:28] CL: So, when you talk about DSLs, let me also give another plug. I'm writing this series of blog posts called Democratizing AI Compute. So, I go through what went wrong with a whole bunch of different systems, including OpenCL and AI compilers and CUDA and many other things in the space. The one that's publishing this week is about Triton Lang and domain-specific languages that are embedded into Python.
What is a domain-specific language? Well, they're actually all around us, right? There are things like SQL or regular expressions, or even HTML is a domain-specific language, right? These are quite powerful. There's a subset called an embedded domain-specific language, which is where you don't invent syntax, but you hack the compiler, you hack the Python interpreter, you hack it so that it looks like Python, but it's not. The blog post lays out that these are both very powerful and very exciting, very nice, but they're also very cursed, because it looks like Python, but it's not a language. Now, you're beholden to some weird compiler, some weird system that is, again, using Python to hack it so they don't have to do all the work of building a language, but you don't get the same quality result. Mojo is completely different. Mojo - what we're doing is we're saying Mojo is a language. It has an LSP. It has a debugger. It has a code formatter. It has a compiler, obviously. It is a language. It is the full-fledged, hard thing to do. So, it's a different energy state than a DSL. But what it enables you to do is it enables you to build very powerful libraries. The canonical example of something that is a language and enables you to build powerful libraries is Python, right? And there's a lot of very powerful Python libraries. You can look at PyTorch or TensorFlow or things like this in the AI space, but gazillions of other ones, right? We're in that space, which is: enable people to build powerful libraries and then allow even more people to build on top of them. To me, that's really where the sweet spot is: I shouldn't have to build an EDSL, I shouldn't have to go hack a compiler, I should just be able to write a struct and use types. And then other people who don't want to know how it works can build on top of it. And this is the miracle of software engineering, which is basically collaboration. It's about ecosystems. It's about people working together. It's about the power of - again, having two people is amazing, but having two people build something that's used by 20 people, that's used by 200 people, that's used by 2 million people, right? I mean, this is where software leaps get made. My goal with Mojo, and also with Max together, is to make it so you have the ability to unite the ecosystem, unite all these people that have different levels of expertise, and get them to work together. Because I'll tell you, I don't know the latest Blackwell tensor core goofy layout nonsense, because it's not compatible with Hopper, and blah, blah, blah, blah, blah, like all that kind of stuff. Other people do. So, once they build the library, I can build on top of it. That's power, right? And I don't want to have to like switch between different systems every second Thursday because I have a slightly different use case.

[0:46:27] KB: So, let's follow that thread a little bit and talk about what does the Mojo community look like today? Who's developing it? What's governance like? What does the library community look like? What's going on there?

[0:46:39] CL: Yes, so Modular is driving Mojo, and so we're funding it, we're paying for it, we're building all this stuff out. We are open sourcing it over time, and so we have an open-source standard library, and the standard library is about half of Mojo, by the way. I didn't dive into it, but like int and float are not part of the compiler, they're part of the library.
Again, this is: build powerful, composable, orthogonal language features, so you can put all of it into the library. So yes, int is -

[0:47:05] KB: So, what is the language feature that int and float are built on top of?

[0:47:08] CL: They're built on top of - they're all structs. Like, int is a struct, struct Int. And then it has a dunder add method, just like you do in Python. And then you have the ability to use inline MLIR. So, you basically say, "Okay, well, it's turtles all the way down, zero-cost abstractions, turtles all the way down, until you bottom out at the low-level compiler guts that you're talking to."

[0:47:28] KB: Fascinating. So, you built int in the library using inline MLIR to define how the operations work.

[0:47:34] CL: That's exactly right. It's all open source. You can go check it out. It's super cool. So, our community - we have Discourse for an online forum. There's a bunch of discussion there. We have Discord, which is for the chat stuff, and so we have a bunch of folks there. I have no idea - it's like 20,000 people in the Discord, something like that. They're all kind of doing different things. We've made the commitment: we're open sourcing the entire Mojo implementation, but we're doing that in steps. And so, Mojo's still being developed. I have some battle scars from Swift, where we spent way too much time arguing with people online about things. The short version of the evolution of Swift is it was secret for four years, and then it got announced and launched, and then it was proprietary for a year, but public, and then it was open, open governance, open everything since 2015-ish, something like that, and I think that went too fast. With Swift, it was great to have the community. I love open source, by the way. I've written a few million lines of code that are open source and others that are not, but it's about the right time and the right place, and having a small, coherent team driving the core architecture, I think, is extremely important for the early phase of a project. So, we're opening things up over time, and we've made a public commitment - well, I think we'll exceed expectations there.

[0:48:49] KB: Got it. Well, and as you stated, you're keeping the core language, which is what you are driving with that dedicated team, small, and a huge amount is available for anyone who wants to extend it by writing libraries.

[0:49:01] CL: Yes, that's right. And Modular keeps open sourcing more stuff over time. The other thing I'll say is that Max, if you zoom out from Mojo - so Mojo allows you to program CPUs and GPUs and makes stuff go fast, and this is very powerful, and it integrates with C and Rust and all this ecosystem, right? Max is that next level up. It's the AI solution. And so, it enables you to write models. So, we have our own kind of - it's in the space of PyTorch, where you can now define your own models and actually build on top of this. Now, you don't have to know how a matrix multiplication works or something like that. You just say, "Give me one." And you get very fancy compiler techniques that do automatic code fusion and things like this, and so you get very high performance from Max. It's called a graph compiler, right? So, the way that works, and the way that's really novel, is that it builds on top of the power of Mojo. So, all this metaprogramming stuff, the ability to have the MLIR, like all of these very nerdy compilery things, enable us to build the next-level-up system that's way more powerful, and it's also an open box.
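To ground the "int is just a struct with a dunder add method" exchange a little earlier, here is a toy, hedged sketch of a library-defined value type; the real standard-library Int bottoms out in inline MLIR operations (omitted here), and decorator spellings shift between Mojo versions:

```mojo
@value
struct MyInt:
    # A library-defined value type: the compiler knows nothing special about
    # it, which is the same story as the real Int in Mojo's standard library.
    var value: Int

    fn __add__(self, other: MyInt) -> MyInt:
        # Operator overloading via a dunder method, just as in Python. The
        # real Int implements this with inline MLIR ops rather than
        # delegating to another Int the way this toy does.
        return MyInt(self.value + other.value)

def main():
    var a = MyInt(3)
    var b = MyInt(4)
    print((a + b).value)  # prints 7
```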
So, just now, we're documenting how to do this stuff, and so we're teaching people how to program GPUs. Well, this is a huge thing, because the code you write in Mojo and Max together - yes, you can use it to solve AI, which is awesome and very, very exciting for lots of different reasons. But we're really about enabling heterogeneous compute, enabling people to use the accelerators, enabling people to build and scale across hardware, and all this stuff is free. So, this is a really big deal, and while some of these technologies are young, like, they're very, very powerful. So, I love to see, to your point about community, like what people are doing with this and can do with this. I talked with people - last week, I was talking with a whole bunch of people that are talking about robotics and all the applications, and there's apparently this Python-Rust thingy, and moving chunks of that to Mojo would make it so much faster and better and solve all these problems. I'm like, "Wow, I know nothing about that. That sounds really cool." That's what I love to see.

[0:50:56] KB: Yes. I love that. Well, I just realized, I mean, I feel like we could keep going for hours, but we are getting close to the end of our time. Is there anything that we haven't talked about yet that you think would be really important to leave folks with?

[0:51:09] CL: Yes. I mean, the thing I'd love to say is that I get very frustrated with the industry right now, because if you look at the powers that be out there, you've got these massive companies doing AI, right? You've got these very well-funded AI research labs that are making the news all of the time, right? You have all of this investment in AI, and you have the hardware companies like NVIDIA that are saying like, "Okay, well, CUDA, CUDA, CUDA," right? What I see happening out in the world is I see all these people that are telling you, telling us all, that AI is too hard, give up, "only we can do it," right? Whether it be the big LLM company of the day that's saying, "Just use our endpoint," or it's the hardware company saying, "You have to buy all of our hardware and only from us," or the big tech company that's saying, "Hey, all this stuff is too complicated. Just use our cloud service or something." But it's all a lie. The stuff is complicated because the software is complicated, and because this was all cobbled together and it doesn't make any sense. But the technology is not that complicated. What I strongly believe is, as we democratize this stuff, as we make it actually easy, if we make it so that people have power back over the software, suddenly we can have another wave of innovation, and we can take back power from all these like overfunded, overly powerful ecosystem players, and we as a community can rise up, and we can achieve a lot of the stuff that they tell us is impossible. Because, I mean, I am empowered when I talk to new college grads, right? So, I'm an old dog at this point. New college grads know all of this stuff. They know how models work. They sometimes know how GPUs work. They know all this stuff. But the problem is that we have this fragmented talent ecosystem. And what I really believe is, if you get the people able to work together and collaborate and build stuff in the open together, like, the whole world will continue to change in a much more powerful and positive way than, "Okay, please just use our endpoint because you're not smart enough to know how AI works," which I don't believe.
So, this is a big part of the mission. Also, as a nerd, all the hardware out there is so exciting. It's all lacking software. So, we're trying to unlock that. This will be a big part of our storyline, particularly as we get into late this year. But I think that's a huge opportunity. The way I look at hardware today is that everyone is crying about NVIDIA GPUs and the prices and all this kind of stuff. But when I look ahead, 5 years, 10 years, when I look into the future, I know deep down in my soul that hardware is going to be even more weird than it is today. Physics - we're not in the age of Moore's law anymore - but innovation is not dead: new algorithms, new research, all this stuff is going to continue to happen. And so, what we need is the ability to scale into this, and this is what I think we're trying to do, and we're very excited about that.

[0:53:56] KB: I 100% agree. We are in an age of increasing weirdness and heterogeneity in the hardware world. It's also a golden age of hardware innovation. There's so much going on, because we're suddenly in a place where you just can't get enough compute. So, everybody's trying to innovate and change the boundaries, and we need programming languages that expose that to us as software developers.

[0:54:18] CL: Yes, and programming languages fundamentally unite communities, right? That's where, again, what I'd love to see is different people with different backgrounds, different perspectives, different use cases that can collaborate and solve problems together. This is software, right? This is what's powerful. So, if you're interested, please do check out our webpage. We do have a ton of stuff. We're open-sourcing things all the time. We have major new releases coming out regularly. There's all kinds of new capabilities. Please join our community. Read the blog. If you want to learn about the history of AI compute and why the software is so screwed up, trust me, I have decided that instead of telling every individual person, I should write it down and scale this a little bit. If you've wondered why OpenCL didn't win, or something like that, please - I'm happy to teach you about it.

[0:54:59] KB: Awesome.

[END]