EPISODE 1768 [INTRODUCTION] [0:00:00] ANNOUNCER: Vulkan is a low-level graphics API designed to provide developers with more direct control over the GPU, reducing overhead and enabling high performance in applications, like games, simulations, and visualizations. It addresses the inefficiencies of older APIs, like OpenGL and Direct3D, and helps solve issues with cross-platform compatibility. Tom Olson is a distinguished engineer at Arm, and Ralph Potter is the Lead Khronos Standards Engineer at Samsung. Tom and Ralph are also the outgoing and incoming chairs of the Vulkan Working Group. They join the podcast to talk about earlier graphics APIs, what motivated the creation of Vulkan, modern GPUs, and more. Joe Nash is a developer, educator, and award-winning community builder who has worked at companies including GitHub, Twilio, Unity, and PayPal. Joe got a start in software development by creating mods and running servers for Garry's Mod, and game development remains his favorite way to experience and explore new technologies and concepts. [INTERVIEW] [0:01:16] JN: Welcome to the show. Thank you so much for joining me today. How are you doing? [0:01:20] TO: Hi, Joe. Doing good. This is Tom Olson, by the way. [0:01:23] JN: Awesome. Perfect. How about you, Ralph? [0:01:26] RP: Hi, Joe. Yeah. I'm doing good. Thank you for having us both. [0:01:29] JN: Awesome. To kick us off, I introduced you both there as the chairs of the Vulkan Working Group. There's a bit more to both of your stories. Tom, do you want to kick us off by introducing yourself and how you came to be working on Vulkan? [0:01:41] TO: Sure. I work for Arm, which is known as a CPU company, but we do also make GPUs, the Mali GPU family. I've been a professional graphics standards committee chair for the past 18 years. It's a bit horrible. I started chairing the OpenGL ES standard, which some folks may have heard of as the mobile version of OpenGL. When the need arose for Vulkan in 2014 or so, I, for my sins, picked up the flag and ran with it and helped to get that effort started. I've been doing that ever since. I've reached the stage where it's time for me to get out of the way and let younger people take charge. Hence, Ralph. [0:02:23] JN: Perfect. Yeah. Speaking of those younger people, Ralph, how about you? [0:02:26] RP: Yeah. I work for Samsung, where I'm located in our GPU team, specifically Samsung Mobile, the part of the company that makes mobile phone handsets. I've been in the Vulkan Working Group for six or seven years now, five of them representing Samsung. Yeah, I don't have Tom's 18 years of experience of chairing, but I have a little bit, and it will be hard to replace Tom, but we will do our best. [0:02:54] JN: Of course. [0:02:54] TO: Can I point something out? [0:02:55] JN: Yeah, please do. [0:02:57] TO: Both Ralph and I have been working on standards since before our current employers. It's the culture of the group. There's a large number of people involved with creating Vulkan whose involvement persists across multiple employers. It gets in your blood. It's hard to put down, and it's a rare skill that companies need, so they will often hire you to keep doing what you're doing. [0:03:23] JN: Fascinating. Yeah, that leads me into, I guess, a topic that I think would be really interesting to explore, which is how Vulkan is developed and what Khronos is. I guess, first, to set the scene for folks who somehow aren't familiar with Vulkan, Tom, can you tell us briefly what Vulkan is? [0:03:36] TO: Sure.
Vulkan is the modern way to program GPUs. In the past, you've heard of APIs like DX9 and OpenGL. Those were graphics APIs. There was a big magic driver that turned that into GPU commands. With Vulkan, what we've done is remove - it's not really a graphics API. It's an API for controlling and programming a GPU. All GPUs today, you probably know, are highly programmable. They have multiple execution engines and you can write code for them, but getting it to run and run efficiently in parallel on the GPU is difficult. Vulkan's job is to expose that power. [0:04:22] JN: That's a really interesting distinction between, it's not a graphics API. It's an API for controlling a GPU. Very, very interesting. I guess, what are the - you mentioned, obviously, it's not easy to control that power. What are some of the advantages of this approach? What problems are you directly looking to solve versus the previous generation? [0:04:39] TO: In the previous generation, the basic problem is that a GPU's programming model is incredibly different from a CPU programming model, even a CPU cluster programming model. Massively, massively parallel, data parallel, typically, and increasingly flexible, too. The basic problem was that the old generation of APIs, OpenGL, etc., presented a very nice, convenient, but very CPU-like programming model, where you gave a command, the device did it, you gave another command, the device did it. GPUs don't work like that at all. You queue up massive numbers of commands, and you shove them in, the driver and the hardware take them all apart, execute everything. It's like a data flow paradigm. Everything runs as soon as it can, as long as the hardware understands the dependencies. The result is that with the old APIs, you couldn't get the efficiency you wanted, because you were working through this very thick abstraction, and this very sequential abstraction. You need an API that exposes the massive parallelism of the device. Vulkan does that. Specific problems we had in the old days: OpenGL had no real way to make use of multiple CPU cores, so that if you were trying to keep the GPU fed with commands - because the GPU is just - the GPU is a voracious consumer of data and commands, and you might need multiple threads in order to generate those commands fast enough. You couldn't do it. It wasn't in the programming model. Vulkan solves those problems. [0:06:21] JN: One of the goals of Vulkan, as I understand it, was supposed to be hugely cross-platform and support lots of platforms, which I guess is shown by both your backgrounds, between Arm and Samsung, obviously focusing on a whole world of devices. Ralph, can you talk about what platforms you're looking to support with Vulkan? I understand it's not - I think most people think of PCs and the consoles, right? I understand it's a whole world of things. [0:06:42] RP: Yeah, sure. Definitely it exists on PCs. All of the desktop GPU vendors have Vulkan drivers. It exists on some of the consoles, both handheld and dedicated ones. It is pretty fundamental to Android these days and becoming more important. The vast majority of mobile phones that you can buy today will support Vulkan. There are some outliers, but the vast majority will, at least in the Android space. You will also see it - it may not be so obvious, but it also exists on other devices as well, more embedded devices, appliance-type devices. You might find there is Vulkan in there, even though it's not obvious to you as a user.
The short answer is, if it's got a relatively modern GPU, there's a good chance that there is a Vulkan driver available somewhere. [0:07:41] JN: Right. I think I saw in a talk from SIGGRAPH that someone mentioned a Coke machine - that might have been you, actually - that a Coke machine is an example. [0:07:47] TO: That was. Yes. [0:07:48] JN: Yeah, yeah, yeah. Yeah, particularly cast programming environment. Cool. See, I think that really sets the foundation for what Vulkan is. I guess too, I wanted to talk a little bit about the history, which obviously, you mentioned a bit in your intro, Tom, about how it - around the start of it and transitioning from OpenGL and OpenGL ES to Vulkan. You mentioned that there was a need for Vulkan, and that's when you started working on it. What was that need? Can you give us the run up in the history to Vulkan? [0:08:11] TO: Sure. If you go - oh, yeah, it's shocking how long ago this was - go back to, say, 2012 to 2014, the dominant graphics APIs of the day were OpenGL and DX 11, which was the most modern version of DirectX on Windows. They both had this problem that they were lovely programming environments. It was very comfortable and easy to move into using them. They worked the way a CPU programmer would expect a graphics API to work. But people had enormous difficulty getting performance out of the devices. As a result, for example, on consoles, nobody used them at all. Well, they used DX a bit on Xbox, but mostly, they just threw them away because they could not get the performance. Around that time, you could say there was a revolution. The farmers with the pitchforks - developers - were looking for alternatives, and you had things emerge, like Mantle, which was an AMD proprietary API, since AMD had cornered the console market at the time. It had this property that it exposed the parallelism at the price of not being as nice a programming environment. It's much more painful to use and complex to think about, but it gave you the power. People were very excited about that. There was talk of moving toward it. Microsoft began moving in the same direction with an API called DX 12, which I should say, DX 12 is to DX 11 pretty much what Vulkan is to OpenGL. We divide the world down into modern GPU APIs, which is Vulkan, DX 12, Metal, and the old ones, which is OpenGL, OpenGL ES, DX9 to 11. These modern APIs were emerging. We felt that OpenGL was going to be left behind. We could see that it had problems. Well, I would say - I believe we were all working in the OpenGL space, so I was chairing OpenGL ES. We could see that there was no way we could evolve those APIs in a gradual way to meet the need to provide the efficiency that developers were just demanding. That led to kicking off the effort. We kicked it off in 2014, took us a year and a bit. We came out in early 2016 with Vulkan 1.0. Did that answer your question? [0:10:45] JN: It did. You also answered another question. Obviously, DirectX 12 and Metal - my timeline is a bit fuzzy, but my understanding is they're all at the same time, all the same generation. And so, I was going to ask how did that happen? Why was it all at the same time? But I think you've really filled that in. I guess, one thing you mentioned there is this step change in developer experience in terms of what the developers could expect the API to do for them and how much extra work they had to put in. How do you go about navigating that in terms of an API design philosophy?
You're trying to meet the needs of the users as graphics programmers; they come in, there was a demand for this level of API, but I imagine you still have a lot of people who are still expecting the affordances of the old API. How do you juggle that? [0:11:25] TO: Painfully. It's a constant, I won't say balancing act, but it's a constant debate. To be frank, in Vulkan as it is today, well, certainly in Vulkan 1.0, we created an API which gave developers what they said they wanted, but was frankly quite difficult to use. We made an intellectual commitment. By the way, when we started this effort, we had massive participation, particularly from game engine companies, Epic, Valve. Well, Valve first and foremost, they were real champions of Vulkan early. But Epic, Unity, all the majors were there. We had a big fight about, are we going to make concessions to ease of use? Are we going to say, performance is first, full stop? We pretty much said, no, performance is first. We will never sacrifice that. What's happened is that the hardware's gotten easier to use. Vulkan has gotten easier to use in parallel. Modern Vulkan is not nearly as gnarly and sharp-edged as Vulkan was early. Well, I think I'm rambling a bit here, but I'm trying to make sure I hit the various aspects of this. I would say, one way we deal with this problem is by tooling. A feature of Vulkan that we think is one of the best ideas ever, I wish I could remember who in the group had it. A feature of Vulkan, since it's dedicated to efficiency before all else, it does not check for errors. When you give a command to a Vulkan function, if the commands you give it are meaningless, the specification says, you get undefined behavior, possibly including program termination. You make one mistake and it's dead. Driver restarts. What do you do? Well, what we do is Vulkan has a defined interface to shim layers. We call it the layer system. When you create a Vulkan - you're a programmer, your program says, "I want to use Vulkan, give me a driver, please." You go through this negotiation. You can say, please install the validation layer on top of Vulkan. If you do that, you get the same interface, all the same functions. But when you call, if you pass garbage information into a Vulkan function, the validation layer checks it before it calls the underlying driver. It logs an error if you did something wrong. The validation layers are incredibly powerful and useful. The investment that's gone into them is millions and millions of dollars. It's very complex software. A lot of it paid for by Valve. A lot of other stuff written by members. But it's one of the most important things. We're trying to provide you with a safe and sane programming environment, but not at the price of slowing down the hardware. The idea is you develop with validation turned on. When you ship your code, you turn it off. Suddenly, everything runs much faster, because the driver is not checking any errors itself. Ralph, what have I forgotten? I rabbit-holed a bit there. [0:14:41] RP: No, I think all of that is correct. If I was going to give one quick piece of developer advice, I would say, if you're writing a Vulkan application and you are doing it without the validation layers enabled, you are doing it wrong. You will come to regret it. They are pretty fundamental to that. I also agree with Tom that this balance of usability and the challenge of using the API is a difficult problem.
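To make the layer mechanism described above concrete, here is a minimal sketch in C of requesting the Khronos validation layer when creating a Vulkan instance. It assumes the standard VK_LAYER_KHRONOS_validation layer from the Vulkan SDK is installed; in a shipping build you would leave the layer list empty, which is the develop-with-validation-on, ship-with-it-off model Tom describes.

```c
#include <vulkan/vulkan.h>
#include <stdio.h>

int main(void) {
    /* Request the Khronos validation layer during development;
       ship with enabledLayerCount = 0 so the driver does no checking. */
    const char *layers[] = { "VK_LAYER_KHRONOS_validation" };

    VkApplicationInfo app = {
        .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
        .pApplicationName = "layer-demo",
        .apiVersion = VK_API_VERSION_1_0,
    };

    VkInstanceCreateInfo ci = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        .pApplicationInfo = &app,
        .enabledLayerCount = 1,
        .ppEnabledLayerNames = layers,
    };

    VkInstance instance;
    if (vkCreateInstance(&ci, NULL, &instance) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateInstance failed - is the validation layer installed?\n");
        return 1;
    }

    /* ... enumerate devices, record and submit work ... */

    vkDestroyInstance(instance, NULL);
    return 0;
}
```

The same binary runs unchanged whether or not the layer is present in the layer list; the driver underneath never pays the cost of the checking.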
If you go back to our 2016 launch publicity, at the time we said, Vulkan is not the API for everybody. I think it has become less thorny to use as time has gone on. There is still no free lunch. It is still definitely harder to get started in Vulkan than it is to get started in OpenGL. There is a higher expectation. Tom said that this is an API to control a GPU. There is a higher expectation that you will understand how a GPU functions than maybe there was in OpenGL. I think, once you have that understanding and once you've got over the initial hurdles of how to get started, the fact that it is more predictable, there are fewer unexpected driver heroics going on - there is a place in which you can say it is more workable, but it requires a certain base level of understanding. Certainly, the barrier to entry is higher. That's undeniable and it's intrinsic to what we built. [0:16:19] JN: Really interesting point about if you understand how a GPU works, it's easier to use. I feel like, in recent years there's been a lot more general awareness of how GPUs work among base-level programmers because of general-purpose GPU compute, and of course ML and AI are accelerating people's usage of GPUs. Do you think that understanding is becoming more widespread in the developer community and making it easier for developers at large to use Vulkan, or? Yeah, I don't know. [0:16:45] RP: I think, well, it's hard to answer that question without comparing where we are today versus where we were with the old API. There is a lot of information out there nowadays. There's been a lot of presentations, a lot of talks, a lot of documentation from GPU vendors that will tell you nowadays how things work. In the OpenGL 3 days, I'm not sure how much of the same information was out there for the general public to consume in the first place. [0:17:15] JN: That makes sense and goes along with Tom's point that, because the graphics cards are easier to use, Vulkan is therefore easier to use. Absolutely. Throughout the explanation you mentioned Valve, and you mentioned members and sponsors, and we had the introductions from both of you about working for members of the group. I think it'd be great to talk now about what the Khronos Group is, how it's organized and how Vulkan is developed. Tom, would you like to kick off with, I guess, what Khronos is and what the working group is? [0:17:39] TO: Sure. Khronos is, well, it's an international consortium and standards body with the mission statement of connecting software to hardware. Generally, most of the products they create, though not all, are interfaces between some gnarly piece of hardware, like a video accelerator, a computer vision accelerator, a graphics accelerator, and applications. Their standards range in approach from being hard over in the direction of developer-friendly and relatively easy to use, to things like Vulkan, which are, here be dragons, but you'll get power if you use them. It's got about 120 members-ish today, I think. It includes Samsung, Arm, Intel, AMD - silicon companies. It includes game engine companies, Valve, Unity, Epic, several others. It includes software consultancies, LunarG, who create the SDK and the validation layers for us. There are other kinds of people involved. It's a wide variety. The thing that ties us together is that there is an IP agreement, and this is very important.
You don't want to read the legalese, but in a nutshell, we agree that if you have patents that are necessary to implement Vulkan, or any other of our ratified standards, you agree to license them to implementers of Vulkan at no cost, if they're necessary. Patents on techniques for implementing Vulkan, like particular circuits for this or that, you can own and you can enforce the patents. But something that the standard itself necessarily infringes, you have to license, or you can withdraw from. There's a way to withdraw yourself, but it's suicidal. It's very rare to do that. That's Khronos at a high level. There's a board, and there's all this infrastructure. Then there's the Vulkan Working Group, which has the members that I said. On a typical call, we have maybe 40 people, from maybe 20 companies participating. We do design work on new functionality, and we do this maintenance call, which we did this morning, where we go through and fix the corner cases and answer developer complaints. We have a presence on GitHub and anybody in the world can come in and say, "This doesn't seem to work the way I thought it did. Is the spec wrong or what?" And we will jump on that and answer that. Often, it results in spec clarifications. We do other things. We make a conformance test, so that if you're implementing Vulkan and the spec says that implementations must do this, we make sure they do. That's our biggest expense. We spend about a half million a year, more than that actually, writing those tests. We have software contractors who do that for us. Then there's the SDK and the tooling. There's a compiler that is used for - as I said, GPUs are programmable. You program them in special languages, and there's a compiler for that. Those are all things that we maintain as part of this effort. [0:20:57] JN: Okay, that opens up so many questions. I'm going to start with the conformance tests. You mentioned, there's an implementation, you will test it. That's a responsibility you take. When you say an implementation, what does that mean? I guess, my question that came from that is, if someone goes and starts a new Vulkan implementation, just like a random person, and then they say, cool - do you have to test it? Is that how that works? [0:21:17] TO: Here's how it works. Vulkan is a trademark of the Khronos Group. If you want to use that trademark, you have to have permission, and the Khronos Group's guidelines on the website say you can use it for conformant implementations. There are some weasel words about if it's in development, and it's not certified yet. Basically, when you've got your thing done - and sorry, let me back up and say, typically, a Vulkan implementation is a device driver. It comes from a GPU vendor, and you get it and install it on your machine. Because you have the right GPU, you install their driver. We have the ability, this is built into the infrastructure, that if you have two separate graphics cards in your machine and two GPUs from two different vendors, you can put in drivers for both of them and it'll work. Your application will have to choose when it's setting up, starting to use Vulkan, say, "Well, which device do I want to run on?" Generally, a Vulkan implementation is a device driver. There's a lovely software implementation out there called Lavapipe. If you are experimenting and learning and having trouble, or you just don't want to install a device driver, you can use Lavapipe. It's quite efficient, actually. Sorry, I got sidetracked. [0:22:34] JN: No, absolutely.
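To illustrate the "which device do I want to run on?" step, here is a minimal sketch in C of enumerating the physical devices the Vulkan loader can see; each one is typically backed by a vendor driver, or by a software implementation such as Lavapipe. The calls are the standard Vulkan enumeration functions, and the instance is assumed to have been created already.

```c
#include <vulkan/vulkan.h>
#include <stdio.h>
#include <stdlib.h>

/* List every physical device the loader exposes; the application
   then picks the one it wants to create a logical device on. */
void list_devices(VkInstance instance) {
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, NULL);

    VkPhysicalDevice *devices = malloc(count * sizeof(VkPhysicalDevice));
    vkEnumeratePhysicalDevices(instance, &count, devices);

    for (uint32_t i = 0; i < count; ++i) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(devices[i], &props);
        printf("%u: %s (supports Vulkan %u.%u)\n", i, props.deviceName,
               VK_VERSION_MAJOR(props.apiVersion),
               VK_VERSION_MINOR(props.apiVersion));
    }
    free(devices);
}
```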
That's perfect. Yeah, Lavapipe sounds really cool. And that ultimately answered my question. Sorry, I'm now going backwards through your answer to how Khronos works. When it comes to participating in the working group, you work at Arm, Ralph works at Samsung, what is the, I guess, arrangement there? Do the members give employees over to the working group full time? How does that work? [0:22:54] TO: No. Well, it's up to the member. Members of Khronos are typically companies. There are a small number of individual contributors approved by the board. Generally, to work in Khronos, your company joins, they pay a fee. It's a couple of tens of thousands a year. Then they have the right to participate. What that typically means is they tell certain of their employees, part of your job is to go to the meetings and contribute to making Vulkan better and making Vulkan work for the community. In my case, as chair, it's been a full-time job for me, but that's rare. For most of our members, they're working maybe 10% to 40%, 50% of their time devoted to working on Vulkan. The rest, they're doing things for their own companies. Participation involves, in the case of Vulkan, two 90-minute calls a week, new tech and old tech. Plus, we have subgroups. There's a separate group that deals with ray tracing, and they have their own meeting. There's a separate group that deals with machine learning, and they have their own meeting. There is a separate group for dealing with the programming language that is used to program the programmable parts of the GPU. We have a few others, there's a marketing committee, etc. You can get as involved as you want. You can't really be effective if you aren't spending at least 10% of your time on it, because you need to be known, you need to have traction, you need to understand what's going on, and there's a minimum cost to that. [0:24:29] JN: Yeah, that makes total sense. I guess, getting into the particulars then: you've got those meetings, but Vulkan, like any software project, has a cadence of releases and things that get added that have to come out of those meetings. Ralph, as the person who's now responsible for this, can you talk to us about the roadmaps and how they're constructed? There's a couple of terms that I think will be useful, because I know there's been some change in how the roadmap and versions have worked over the years, from core versions to roadmap profiles to milestones. Can you lay all that out for us and how it works? [0:24:55] RP: Yeah. The first thing to understand about Vulkan is that we have a core API, which is the default set of things that everybody implements - the mandatory requirements. On top of which, we have a notion of a thing that we call an extension, which is another package of functionality that GPU vendors, or implementers who feel that it's valuable, can implement - it's essentially an optional piece of functionality that they might decide their market and their customers see value in. The way that we used to do things is that we would release a core version on a pretty regular two-year cadence, which would roll up a certain amount of functionality that had been exposed in extensions, but extensions would flow out throughout the year on a pretty ad hoc basis. A few years back, we came to the realization that this was getting extremely difficult to handle as a software developer. We have 11 adopters, I want to say, off the top of my head, somewhere around that figure.
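Because extensions are optional, an application has to ask the driver what a given device actually implements before relying on one. Here is a minimal sketch in C, using the standard extension-enumeration call, with VK_KHR_swapchain purely as an example of a well-known extension name:

```c
#include <vulkan/vulkan.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Check whether a physical device advertises a given extension,
   e.g. has_extension(dev, VK_KHR_SWAPCHAIN_EXTENSION_NAME). */
bool has_extension(VkPhysicalDevice dev, const char *wanted) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(dev, NULL, &count, NULL);

    VkExtensionProperties *exts = malloc(count * sizeof(*exts));
    vkEnumerateDeviceExtensionProperties(dev, NULL, &count, exts);

    bool found = false;
    for (uint32_t i = 0; i < count; ++i) {
        if (strcmp(exts[i].extensionName, wanted) == 0) {
            found = true;
            break;
        }
    }
    free(exts);
    return found;
}
```

A roadmap profile is essentially a named bundle of such features and extensions that a class of devices promises to support by a given date, which is what saves developers from checking dozens of these flags one by one.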
We had all made different decisions about what we felt were valuable extensions. The extensions themselves contain optional sub-functionality in them, and identifying exactly what you could expect as a developer became very difficult. On top of which, we had always taken the view that the core API had to be capable of running on more or less everything. You referred earlier to Tom saying Vulkan runs on Coke machines - that was a constraint on the core API from 2016, when we launched Vulkan 1.0. We never raised the minimum specs that you needed to run the core on. Our only approach was to add more and more extensions. As a route to trying to bring some order to that, our process now is we define a thing that we refer to as a roadmap, which is again, a collection of extensions and features, but we say, for a particular subset of devices - we describe it as immersive graphics devices; you can think mid to high-end smartphones, desktop PCs, consoles - that everyone will ship devices that fit the requirements of a particular roadmap by a particular point in time, or approximately that point in time. We hope that brings some cohesion. Those have dates on them. We released Roadmap 2024 earlier in the year. I don't think it will be a huge surprise to anyone to say that there will be another one coming. We have now got into a model where we can plan the rough content of roadmaps many years out in advance, which is also important, because hardware roadmaps are amazingly long. If we need to have a conversation in the working group about, we would like us all to support feature X, if somebody doesn't have it in hardware, we're talking about probably five years to go from a hardware design to an implementation to something that shows up in a product. We have a roadmap for a couple of years out, and we have more tentative things for as far out as 2030. When we get that far out, they're nebulous. Maybe they won't arrive in practice, but there's a structure to it now. I think, my message would be, if you're a developer trying to figure out where we're going, the roadmap tells you directionally where we're going. The core API is supposed to tell you what you can rely on to exist on any device that has updated drivers, is how I would categorize it. [0:29:04] JN: That hardware view and that time that adds to it is, yeah, that's a really fun constraint for working in this world. Obviously, you've just told me there that you do plan out far enough in advance, which leaves me no choice but to ask what's next for the next milestone, which, I guess, is 2026 if it's every two years, right? [0:29:19] RP: There is a milestone plan for '26. I believe that we shared some of this at SIGGRAPH. [0:29:25] TO: Yeah, I had slides on it, but I can't remember. [0:29:30] RP: I am now trying to recall your slides, but I mean, things that I know are on there, we have some work on debugging improvements. I believe there's some work on compute improvements. I believe, we talked about ML work to come. [0:29:46] TO: We've got a couple of robustness features. Well, we have this expectation that WebGPU, which is the web graphics API, which is vaguely Vulkan-like, but much friendlier, and does a lot more work for you and therefore runs slower, but it is what it is. It'll be great for learners. Anyway, so they have very strict requirements for safety, because you're going to run code off the web and you have no idea what it is and you don't want it to blow up your machine.
A bunch of robustness features that will make it, we hope, possible to write an interpreter for WebGPU that is absolutely impossible to crash from the outside. We have that. We have a bunch of stuff related to getting compute parity with OpenCL, which is a lovely higher-level computing API, for GPUs primarily. It's not absolutely limited to GPUs. But you can also do compute in Vulkan, but it's not as nice, and it's not as orthogonal and regular and clean. There are some safety features that aren't present in Vulkan that are present in OpenCL. We're trying to have some uniformity there. Things like 64-bit addressing. You don't necessarily have it in Vulkan today, but in Roadmap 2026 we'll be basically saying, all of the interesting, highly programmable devices that sport lively open software markets are going to have 64-bit addressing in the GPU. These things are coming. The ability to cast pointers. Vulkan doesn't have it. Some state management improvements. State is the bane. Okay, now we're going to get too far into stuff you don't want to hear about, but GPUs have enormous amounts of state. You typically set up all this state and it defines a virtual machine. Then you shove data through it with a shovel, as fast as you can shovel, and it all just works. Managing that state is a nightmare, because you want to change it and start shoveling more data. But the old data isn't finished running through. Anyway, so we have state stuff in mind. ML stuff, as Ralph said. Obviously, okay, this goes to a meta point. What is Vulkan's job? In our view - we have this discussion with our board of directors, who persist in thinking of Vulkan as a graphics API - in our view, the mission of Vulkan is to do whatever people want to do with a GPU. Your desktop graphics card, for example, has a video decoder in it. They all do. Exposing video is part of Vulkan's job, and we do that. People use GPUs for machine learning. It's the dominant platform for machine learning. Therefore, it's in our wheelhouse to expose machine learning on GPUs. Anything people want to do with a GPU, we want to provide what you need to do it. I think, Ralph, you did cover the debug, for example. Ralph was one of the leaders of getting debug functionality. [0:33:01] JN: Yeah. I think you mentioned that you worked on some of the extensions prior to [inaudible 0:33:04]. [0:33:06] RP: Yeah. One of my first initiatives in arriving in the Vulkan Working Group as Samsung's representative was - this is not unique to Vulkan, but debugging, what happened to your GPU when it crashed, is a really thorny, painful problem for developers. Yeah. When I arrived, one of my first initiatives was the working group should really do something about this problem. It's not an easy problem to solve. We spent at least two years discussing exactly what we could do there. [0:33:44] JN: The classic maneuver of joining the committee to solve your own problem. I like it a lot. Perfect. [0:33:48] RP: That was essentially what went on there. [0:33:51] TO: We all do that. We all do that. [0:33:53] RP: I think, I'll also add one caveat to all of this discussion about roadmaps, which is to say, we're talking about the future here. Historically, we have been a little bit risk-averse about saying we're going to do a thing. Then essentially, we have historically only wanted to say, we're going to do a thing when we absolutely knew it was done and nothing could possibly go wrong. Roadmaps are new ground for us.
Speaking about what's on roadmaps that have not been announced is definitely new ground for us. There is a world in which companies start working on these things that we've said are on our roadmap, and somebody discovers there's a problem. Collaboratively, if we're asserting that everybody will support a thing, sometimes that means we need to figure things out. This is where we're trying to go. For things that are not in a published document, there is room for them to move around in time based on problems that people run into. Talking about the future is difficult. [0:34:56] JN: Yeah. Yeah, that totally makes sense. Thank you. Yeah, thank you very much. That's amazing. Yeah, great view of the content. Also, related to that, you mentioned your SIGGRAPH talk, Tom - you said something in that talk that I wanted to chat about, because I thought it was really interesting and it hits on another thing you just said about what is the role of Vulkan. To paraphrase, I think you said something like, it takes an ecosystem to raise an API, and you were talking about the ecosystem around Vulkan as a whole. The really interesting thing you said was that although the working group doesn't have authority to dictate how the ecosystem develops, it does have responsibility to ensure that it works, which seems like a very difficult hill to stand on and [inaudible 0:35:29]. Can you talk a little bit about this and how it influences your work? [0:35:33] TO: Sure. Well, I mean, this was a realization we came to slowly, because we created Vulkan 1.0 back in 2016, and people desperately wanted to use it, and we came out and said, "Here it is. We finally got it done." We gave it to them and they were like, "Well, now what do I do? I don't know how to learn this. The API is enormously complex. I don't have any tools that I can use. There are bugs in the implementations. Thank you, but it's not solving my problem." It was a gradual process for us to understand that we have to define our job broadly, as in, the job of the Vulkan Working Group is to create Vulkan and also make sure it's successful. We have to own all the problems that somebody else isn't owning for us. I mean, it's a tiny organization. We have a budget of about a million and a quarter per year, half of which we spend on conformance testing. Compared to some other standards bodies, we're tiny. Compared to, the way I like to say it, maybe I said this at SIGGRAPH, we are approximately one-third the size of the average McDonald's in terms of our annual budget. [0:36:47] JN: I don't recall that, but that's fantastic. [0:36:49] TO: Okay. In terms of our annual cash flow. But we have, fortunately, a lot of - well, I will say, we are leveraging efforts of many people outside. Valve is wonderful about funding a lot of work in the ecosystem that Khronos doesn't pay for. The total value going into the Vulkan ecosystem is many times what the working group's budget is, but still, we're small. Any time a developer is finding Vulkan not usable for some reason, even if we can't solve it ourselves, we feel a responsibility to listen to them seriously, understand the problem, give them the best answer we can, and hopefully, find or motivate a solution from some other part of the ecosystem if we can't do it ourselves. You asked, how does this affect your work day-to-day? We do a lot of tracking. Every time we have a face-to-face, which is three times a year, and one of them is virtual these days.
One of the things we always do is go through and survey, try to find every piece of feedback we can from the developer community - survey our members, survey our advisory panel (we have an advisory panel), and all of our GPU vendor members have developer relations teams that are constantly talking to developers and trying to help them use Vulkan on their implementation, but they hear things, and they hear what's not working. Job one, we just keep on top of it. Job two, if it's a problem, it becomes an issue in our issue tracker, and it comes up on the agenda, and lucky Ralph gets to deal with it. A lot of the chair's job, I will say, is rubbing the group's nose in problems that aren't progressing. I've been doing that for a long time, and Ralph is going to do it going forward. [0:38:37] JN: Got to keep things moving. Also, something you mentioned in that talk, and in your summary of Roadmap '26, was the OpenCL feature parity. You mentioned that Vulkan does offer compute. Obviously, that's an enormous topic at the moment. Can we talk a little bit about what facilities Vulkan offers for compute on the GPU? Tom, do you want to kick us off on that? [0:38:58] TO: Well, I'm old enough. I always start with history. Compute came into GPUs on the desktop back with, I think, DX 10 compute shaders, and OpenGL 3 - was it 4-point-something? I can't remember what OpenGL version introduced compute shaders - but it's been around for a long time. There's been a compute model. It's GPU-flavored in that GPUs are quirky and thorny, so special memory spaces, compute can only happen here, it can't interact with other things. But the shading languages are general purpose. They have a full population of float and integer types. In modern Vulkan, let's say, Vulkan with the extensions that bring it up to 1.3 and beyond, you have the ability to do something which is like having pointers. It's not quite exactly the same thing, but you can do fully general computing. On desktop hardware, you can do double precision. We have that. We have slowly and painfully worked ourselves to where we think the behavior of floating-point numbers is fully specified. There used to be a lot of quirks like, do you get not-a-number when you divide by zero, or do you get zero, or do you get - there was a lot of latitude in early Vulkan, and we've slowly nailed that down. You may have to enable certain extensions. What do we call it? If, for example, you decide, I really don't want round to nearest, I really want truncation. We have an extension, shader float controls, that will give you the hooks you need in the language to turn on and off different kinds of floating-point behavior. Ralph, do you have any thoughts? [0:40:42] RP: I mean, I think I would take it up a higher level and say, there are compute APIs, things like OpenCL and things like CUDA, that provide you very precise - first of all, they tend to be more general, closer-to-C programming models. There's things like pointers in there. They also provide you very precise guarantees about things like, what precision were my floating-point operations? Give me exactly how much error I can have in a square root extraction, that sort of thing. These are the sorts of things that you need if you're doing, for example, scientific computing. If you're doing a complex physics simulation, you need to know how your floating-point math is going to behave.
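As an illustration of the hooks Tom mentions, here is a minimal sketch in C of querying the float-controls properties Vulkan exposes (VkPhysicalDeviceFloatControlsProperties, which came from the shader float controls extension and is core in Vulkan 1.2) to see which rounding and NaN/Inf-preservation behaviors an implementation supports:

```c
#include <vulkan/vulkan.h>
#include <stdio.h>

/* Query which floating-point controls the implementation supports,
   e.g. round-to-nearest-even vs. round-toward-zero for 32-bit floats.
   Assumes the device supports Vulkan 1.2 or VK_KHR_shader_float_controls. */
void print_float_controls(VkPhysicalDevice dev) {
    VkPhysicalDeviceFloatControlsProperties fc = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES,
    };
    VkPhysicalDeviceProperties2 props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &fc,
    };
    vkGetPhysicalDeviceProperties2(dev, &props);

    printf("fp32 round-to-nearest-even: %s\n",
           fc.shaderRoundingModeRTEFloat32 ? "yes" : "no");
    printf("fp32 round-toward-zero:     %s\n",
           fc.shaderRoundingModeRTZFloat32 ? "yes" : "no");
    printf("fp32 preserve NaN/Inf:      %s\n",
           fc.shaderSignedZeroInfNanPreserveFloat32 ? "yes" : "no");
}
```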
Graphics historically has been very forgiving of being slightly more lax about that, because we're dealing with colors and perception and pixels. Historically, the graphics answer to how precise does a square root have to be was very different from the compute API answer to that same question. Once you start doing the same compute problems on Vulkan, then a lot of the same considerations come in and we start having to nail those things down, but also nail them down in a way where, if you're writing a compute app and it's critical, you're maybe willing to pay those costs, but we can't make all of graphics slower as a consequence. Those are the sorts of tradeoffs. That's the high-level take on where the difference is: they've come from different places and now the use cases are converging. So, some of those things have to come from the compute side. There's more things that have to be nailed down. [0:42:46] JN: You preempted my next question, which was going to be where this fits alongside OpenCL and CUDA. That's awesome. I guess, to round off this section, you mentioned earlier the programming language for Vulkan and the general topic of shaders. This ties in nicely, I guess, with the news this month that Microsoft will be supporting SPIR-V, which I believe is the language you're referring to, for HLSL, their shader language. Could you talk a little bit about what SPIR-V is and what its role is in Vulkan? [0:43:11] RP: Again, I guess I'll refer back to history. In OpenGL, we took in graphics shaders as a shading language, as human-readable source code. Everybody had a compiler that parsed that source code and translated it down to the native instructions of their GPU. It was built into the API that there would be a function call that you provided the source to and it would do the compilation. There are a couple of consequences to that. One is that your API only consumes one source language and people either have to code in that source language, or they have to have something that generates that source language. A further complication is that compilers are complicated, compilers have bugs in them, different vendors' compilers have different bugs in them, and that was a painful experience. Putting my former compiler engineer hat on, the typical process for compilers is they're taking a source language and they're translating it into some intermediate representation of the language, something the compiler understands that is not human readable, but still contains the structure of the code, and then they translate that down to the actual individual instructions, the hardware-level instructions. What SPIR-V is, is that we essentially said, we would standardize the intermediate representation. We would standardize a format that says, this is a representation of your program. It's not designed to be human readable. It's a binary representation, but that allows a multitude of front-end languages and front-end compilers to generate those intermediate representations. It gets drivers out of the business of parsing text and it lets driver engineers just concentrate on the problem of, how do I get from an intermediate representation of this program to my instruction set. It's been a very powerful thing. It's got us to a place where application developers can write their shaders in HLSL if they're coming from the DX world. They can still write them in GLSL if they're coming from that world. There are other compilers out there as well that also generate SPIR-V.
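To make the SPIR-V flow concrete, here is a minimal sketch in C: a front-end compiler of your choice produces the SPIR-V binary offline, and the application hands that binary to the driver as a shader module. The command lines in the comment are illustrative examples of common front ends, not the only options, and error handling is trimmed for brevity.

```c
#include <vulkan/vulkan.h>
#include <stdio.h>
#include <stdlib.h>

/* Offline, a front-end compiler produces the SPIR-V binary, e.g.:
 *   glslangValidator -V shader.comp -o shader.spv     (GLSL front end)
 *   dxc -T cs_6_0 -spirv shader.hlsl -Fo shader.spv   (HLSL front end)
 * The driver only ever sees the SPIR-V words, never the source text. */
VkShaderModule load_shader(VkDevice device, const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return VK_NULL_HANDLE;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);

    uint32_t *code = malloc(size);
    fread(code, 1, size, f);
    fclose(f);

    VkShaderModuleCreateInfo ci = {
        .sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
        .codeSize = (size_t)size,   /* size in bytes */
        .pCode = code,              /* SPIR-V words */
    };

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &ci, NULL, &module);
    free(code);
    return module;
}
```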
In that sense, it's been a very powerful choice. I would say, that is one of the early Vulkan 1.0 decisions that we made that was absolutely right. [0:45:51] JN: Awesome. Cool. Thank you for running through that. That's covered all of the Vulkan-specific topics I wanted to chat about today, and I'm conscious that we're running low on time. I do want to follow up on - I think, towards the beginning of this podcast, we made a couple of jokes about committee work and people who enjoy it and do it as a career. Folks who have heard this and they're like, "You know, actually, working on a committee for an open standard sounds like it's for me." Do you have tips for how you would get involved? Tom, let's start with you. [0:46:18] TO: A lot of it comes down to picking your employer carefully. As I said, most of us in the group work either for a GPU company, which supports Vulkan, or a company which makes use of Vulkan in some fashion. I did leave out, by the way, Google is a member, because Android depends on Vulkan, so they put a lot of effort into it. You need to work for a company that needs Vulkan to exist for some reason, either because they want to sell it, or because they want to buy it and use it. Then you work your way into it. I mean, another thing to say about Vulkan, which maybe we haven't touched on, is that we're heavily committed to open source. The specification is open source. All the tooling is open source. All the compilers that we use and the validation layers and all that stuff. We've had enormous benefit from people who read and comment on it. Proposing a new feature through the open-source interface is a tough sell, because if you're coming from outside and you don't work for a GPU vendor, there are tons of constraints that you're not aware of, and your chances of producing something that will actually work in hardware are near zero. I wouldn't encourage people to just come in and try to add features. But if you start working with the spec, understand the spec through looking at things that go by, you'll know enough to make a contribution. And by the way, we would be desperately grateful - we always are - for bug reports, etc. Not in other people's drivers, but in the spec itself. It really comes down to, it's difficult to contribute. Actually, let me back up. There are other places that we're very interested in having help with, which are not the spec. For example, part of our DevRel operation, we have a large and growing collection of sample code. Ralph, do you know? Can you join that group without working for a member? [0:48:25] RP: No. Because of NDA. [0:48:27] TO: Because that group develops examples for unpublished extensions. That's why it's NDA. [0:48:32] JN: That makes total sense. Getting into a company that's a member, I guess, is the starting point. Ralph, anything to add? [0:48:38] RP: I mean, I think Tom's point is largely correct. The short answer is the most likely route is to work for a company that is a member. Or, if your company is small but in the right space, for your company to join, and that gets you a seat at the table. [0:48:57] JN: Do your own LunarG. [0:48:58] RP: Yeah. Well, whether you do your own - definitely, if you're a GPU contractor-type company, there's space for those. We have game company members. In the grand scheme of things, the cost of joining Khronos is a lot less than the cost of your engineering time. That is probably the route.
If I had to say, how have most members who are regular participants of the working group got there? The most traditional route is to become a driver engineer at a hardware company and volunteer to do this stuff. You have to have a certain mindset to find standards work engaging. I love it. There are other driver engineers in my team who find it a lot of meticulous paperwork, and they would rather write code. It's something that you either learn to love, or you learn that you want to do something else. The traditional route is probably through driver teams in hardware companies. As Tom has mentioned, we have other members as well. There are game companies, there are platform vendors, there are people like LunarG and Mobica, and LIRA, who are software contractors. There are people from some of the open-source projects, albeit sponsored by companies working in that space. Yeah, there's a variety of routes in. [0:50:20] TO: I should mention, I said, you pay several tens of thousands of dollars for a company to become a Khronos member. We really do want the participation of small companies, small game developers, etc. There is what's called an associate membership, which is, rather than tens of thousands, thousands. It scales with company size, counted as number of employees. Those members don't get a vote in the committee, but they can do everything else. They get to participate. They can make non-NDA proposals for changes, etc. We do that. [0:50:58] JN: Wonderful. Awesome. Cool. Thank you both so much. This has been illuminating for me, and it's great to hear how everything works under the hood. I do believe that just leaves me to say - Tom, you mentioned at the beginning that you've got an upcoming retirement. Thank you so much for all your years of service to Vulkan. As someone very downstream, as a big enjoyer of video games, I've enjoyed the fruits of your labors for many years. Thank you very much. Congratulations on your election, Ralph, and good luck for the future. [0:51:23] RP: Thank you. [0:51:24] TO: Thanks. [END]