EPISODE 1825
[INTRODUCTION]
[0:00:01] ANNOUNCER: Grand Theft Auto 3 is a 2001 open-world action-adventure game developed by DMA Design and published by Rockstar Games, and it had a profound impact on both gaming and popular culture. Its success cemented video games as a dominant form of entertainment and storytelling, and paved the way for future blockbuster franchises. The game was also a technological milestone that redefined what was possible in open-world game design. It was one of the first 3D open-world games to offer seamless exploration, blending mission-based gameplay with a living, breathing city. The game was originally released on PlayStation 2 and PC, but never had an official Sega Dreamcast version. However, the homebrew community set out to port the game to the Dreamcast and recently released the port to much acclaim. Falco Girgis and Stef Kornilios Mitsis Poiitidis are developers on the GTA 3 Dreamcast port. They join the podcast to talk about the Dreamcast hardware and the heroic task of porting GTA 3 to the console.
Kevin Ball, or Kball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow Kball on Twitter or LinkedIn, or visit his website, kball.llc.
[INTERVIEW]
[0:01:36] KB: Hey guys, welcome to the show.
[0:01:39] FG: Hey, thanks for having us.
[0:01:41] SKMP: Thanks for having us.
[0:01:42] KB: Yeah, super excited to dig into this. Let's maybe start with each of you giving a little bit about your backgrounds and how you got into the GTA 3 and Dreamcast world.
[0:01:54] SKMP: I'm traditionally a Dreamcast emulator developer. That means I have implemented the Dreamcast in software. That started as a childhood project. Actually, this involvement has been ongoing for 20 years, with on and off periods.
After I kind of withdrew from that scene, because I decided that other people were now at the forefront and I didn't want to do the emulation side so much, I was like, "Okay, what can I do to make something run on the Dreamcast this time?" Play the other role. And that's how this happened for me.
[0:02:27] KB: Awesome. What about you, Falco?
[0:02:28] FG: All right, so I'll start off by saying I actually taught myself C and C++ at age 14 because I specifically wanted to make Dreamcast games. When I was a kid, this was before there was an iPhone market. This was before Xbox Live Arcade. There was no way for an indie developer to target a console, except for this Dreamcast thing, which had just recently been discontinued, right? And I found this scene of developers, like Stef and so many others, who were doing really cool stuff in the Dreamcast community. I was 14 and I made a forum post: "Hey man, how can I make Dreamcast games? Do I use Perl or whatever?" And they're like, "God, no. Use C and C++." So I used to walk to the library to get books on it. Fast forward way too many years, and now I help maintain KallistiOS, the SDK I used for Dreamcast development when I was a kid. As for Stef, I knew about him and his emulator work. Every time he was in Discord, I thought, "Wow, this guy is really, really impressive and knows pretty much everything ever about the Dreamcast." Then he started working on a GTA 3 port, and I'm just watching from the background while everyone's doubting this guy. And I knew that if anyone could ever pull this off, it would be him. Then I saw some other A-tier developers coming on board, Rockstar people and Dreamcast people. And I just knew, "Oh my God, I have to be a part of this." Even if it fails, what a privilege to be a part of something like that.
[0:03:59] KB: Nice. Let's maybe talk a little bit about GTA 3. What makes it such an undertaking to port to the Dreamcast?
[0:04:09] SKMP: Yeah, there are various aspects. First of all, we are starting from the PC version, the reverse-engineered one. The requirements are substantially higher, especially on memory. The models are also a little bit more complex, but it's mostly memory. Audio memory: the sound effects don't fit. Texture memory: the textures don't fit. System memory: the game loads too many models into it.
[0:04:33] FG: You want to tell him exactly how many megabytes we're working with?
[0:04:37] SKMP: We have 16 megabytes of main memory for everything, and then eight megabytes for video, but that also has to hold the transformed geometry. At the end of it, we have about two to two and a half megabytes free for textures, and a little bit less than two megabytes for audio. Those are very small specs, even by the PC standards of the time.
[0:04:57] KB: Yeah, I think for those of us in the modern world, that's mind-bogglingly small. But I'm kind of curious: what was this originally shipped at? How much of a downsize are we looking at here?
[0:05:09] SKMP: The PS2 version was the most compact version. It fits in 32 megabytes of RAM. The audio and video RAM sizes are similar; the Dreamcast and the PS2 are not directly comparable, but there are similarities. Still, that's twice the RAM we have, so we have to compress things further.
[0:05:27] FG: Yeah, the PS2 had 32 megs of system RAM and four megabytes of video memory, but that gets complicated, because on the Dreamcast you have to store other things in video memory. I'm sure Stef will talk about why it's not just, "Oh, you have eight megs of video memory, the PS2 had four, why is that a problem?" But yeah, I'm sure we'll get into that.
[0:05:48] KB: Okay, so memory is a big domain. What other aspects of the Dreamcast were different enough that they required a substantial porting effort?
[0:05:57] FG: I'll mention one thing I'm working on. Oh man, I'm still working on it, because I didn't want to screw up the alpha release by pushing it without enough testing. If you mess this up, it's really bad for the stability of the game. It's physics engine stuff. We can see in our codebase where they were using a vector unit on the PlayStation 2, a coprocessor that you program a little bit like a DSP, and they were using it to accelerate a lot of things like the collision checks. We don't have that on the Dreamcast; there is no vector coprocessor. But we do have some pretty cool instructions, like a very fast sine and cosine approximation, and an inner product that computes the dot product of two 4D vectors in one instruction, things like that, which let you start accelerating some of that math. The only problem, like I was saying, is that those are approximations. When the approximations are too approximate in something like a physics engine, things go very poorly. But luckily, this actually works. It seems to be a very stable physics engine, unless I do something really stupid, like forget to copy a sign because of some inverse square root trick. There's a really funny approximation trick here. A floating-point division on the Dreamcast, Stef, how many cycles?
[0:07:15] SKMP: I think it's 20-something.
[0:07:17] FG: I have to ask him, because I don't know how many. It's quicker to use... Are you familiar with the Quake 3 inverse square root approximation?
[0:07:26] KB: Why don't we spell it out a little bit, because not everybody will be.
[0:07:28] FG: Okay. Yeah, great. And Stef, correct me if I'm wrong, because I'm not 100% sure. The folklore is that there's a really crazy integer bit-shifting approximation for an inverse square root, and I don't even understand how someone could figure it out, but it's very fast. Taking a real inverse square root, or even a square root, is extremely slow. They figured that out.
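For readers who haven't seen the trick, here is a portable C++ sketch of the Quake III-style fast inverse square root, plus two uses that come up in this conversation: normalizing a vector by multiplying with 1/|v|, and replacing a division with a multiply when the divisor's sign does not matter. It is an illustration only, not the port's code; on the Dreamcast the approximation would come from the SH-4's FSRRA instruction rather than this bit trick, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Quake III-style fast inverse square root: returns roughly 1/sqrt(x).
// Reinterpret the float's bits as an integer, subtract half of them from a
// magic constant for a first guess, then refine with one Newton-Raphson step.
// std::memcpy does the bit reinterpretation without aliasing UB.
inline float fast_rsqrt(float x) {
    assert(x > 0.0f && "input must be positive");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);      // initial estimate
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson iteration
}

struct Vec3 { float x, y, z; };

// Normalize by multiplying each component by 1/|v| instead of dividing by |v|.
inline Vec3 normalize(Vec3 v) {
    const float inv_len = fast_rsqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x * inv_len, v.y * inv_len, v.z * inv_len};
}

// Division substitute: a * rsqrt(b * b) equals a / |b|.
// The sign of b is lost, so this is only valid where the caller does not
// need it, and b must be non-zero.
inline float div_by_magnitude(float a, float b) {
    return a * fast_rsqrt(b * b);
}
```

With one Newton step the relative error stays around a tenth of a percent, which is the kind of "approximate enough?" question Falco raises for physics code: `div_by_magnitude(6.0f, -3.0f)` comes out close to `2.0f`, not `-2.0f`.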
The inverse square root is fundamental for things like normalizing a vector, where you multiply by the inverse square root instead of dividing by the length. And the cool thing is, on our FPU on the Dreamcast, that's one instruction: FSRRA. What does it stand for, Stef?
[0:08:06] SKMP: I actually have no idea.
[0:08:09] FG: Something reciprocal square root. Anyway, it's faster for us. A division is painfully slow. If the sign does not matter, it's faster for us to multiply the number by one over the square root of the divisor times itself, if that makes any sense.
[0:08:24] KB: That's kind of hilarious.
[0:08:26] FG: Right. I'm sitting there in the physics code like, "Hey, maybe I can get rid of some of these divisions." It needs a lot more soak testing, but so far, as long as you don't actually need the sign, because you lose it with the squaring and the square root, it looks pretty good.
[0:08:45] KB: I think the level of optimization we're talking about here is kind of hilarious, but it's endemic to building for old hardware, right?
[0:08:53] FG: Right.
[0:08:53] KB: You're trying to take something that could not do this and squeeze every extra erg out of it.
[0:08:57] FG: Yes. Just wait until Stef talks about what he did for the T&L, the transform and lighting. I think it's one of the most glorious things I have ever seen. If I ever wanted to see code that made me think, "Yeah, this is why I'm on this team," it's what he did there. It was just really cool stuff.
[0:09:14] KB: Well, let's maybe start at a bit of a high level. When you're doing these types of transformations, where are you starting? Do you have original source code, or do you have just a binary and you're intercepting there? What does the process look like to get to, "Oh, I have a bit of the game running, but it's stuttering here, so I need to intervene"? What is the flow of development?
[0:09:35] SKMP: We had the full reverse-engineered source code for the game from the re3 project.
We also had the fully reverse-engineered code of the graphics engine it used, from the librw project. Without these two projects, it would have been a totally different scope; the Dreamcast port is a small addition to that previous work. After finding those projects and getting everything to compile together, getting to the menus was pretty straightforward. Then it was writing the new renderer, because the engine doesn't support the Dreamcast. The renderer is the other major chunk we had to write: the engine, of course, only supports OpenGL and Direct3D, and it has some bits for Xbox and PS2, but only some bits. We had to bring in support for all of the custom texture formats and the Dreamcast's graphics chip, implement the pipeline tools to convert and repack everything into the Dreamcast's optimal formats, and then actually write the transform and lighting pipelines and submit the vertices to the graphics chip. That was the major part of the work.
[0:10:41] FG: What's really cool is that this driver-level graphics work he's talking about is actually separate from the GTA 3 codebase, the reverse-engineered re3 codebase. It's part of something called RenderWare, a very famous middleware from that era. We actually made our own RenderWare back-end for the Dreamcast. That's kind of cool.
[0:11:05] KB: Yeah, actually, let's talk about that. So you're building these pieces specifically to adapt it to the Dreamcast. Are those reusable for other games?
[0:11:12] SKMP: In theory, yes. We don't have any other projects. I mean, there is one other project using the same engine, the reverse engineering of the GTA 3 follow-up... is it Vice City? What's the name?
[0:11:23] KB: Yeah, Vice City.
[0:11:24] SKMP: The Miami one, Vice City. So that's another project that uses the same rendering library, with some changes, but similar enough.
And there are a ton of other games that use this library, especially from the Dreamcast era. It was a very dominant library on PS2, Xbox, and GameCube. There is hope that more games will get Dreamcast versions built on it. Ideally, we would also contribute back upstream, so independent developers could use librw to develop their own applications, right? Their own games.
[0:11:56] FG: That was one of my motivations for wanting to be part of this team, too: "Okay, we have this team of alpha Dreamcast developers doing this crazy lighting, we're pushing a lot of polygons, and we've implemented this middleware layer." Can we help democratize the upper echelons of the Dreamcast hardware for the rest of the community with some of the work we've done for GTA 3? I think the answer is yes. I think this could benefit the Dreamcast indie scene as a whole.
[0:12:26] KB: Let's maybe dive into these different pieces a little bit. Tell me if I'm wrong here, but I think what I'm hearing is, one, there are big chunks that were missing for the Dreamcast that you were able to implement as a package, architected in a particular way so you can plug them in. And then there are other places where you're like, "Okay, this is part of the game code, but it's too slow or it's taking too much memory."
[0:12:48] FG: We are doing both. On one hand, we're cleanly implementing things on a driver boundary. On the other hand, we're going into the game code with a sledgehammer, too. Yes.
[0:13:00] KB: Yeah. For those driver pieces, what does the architecture look like? What were some of the interesting technical challenges in getting this to work? Maybe for someone like you, Stef, who's deeply familiar with the Dreamcast, it's just obvious. But for those of us who aren't, what does that whole piece look like?
Then we can come back and look at the places where you're bringing in the sledgehammer or the scalpel to tweak things.
[0:13:25] SKMP: Well, librw actually follows a typical pattern, similar to what a modern game engine like Unity would have. It has a scene graph of transformations that have child objects, although they call them atomics, not objects. At the lower level, you have to provide basic initialization functions, like querying the driver for the screen size, and things like that. It also has three rendering paths: one for immediate-mode 2D graphics, one for immediate-mode 3D graphics, and one for atomic rendering. You slowly have to go through and implement those parts. To get into slightly more detail, an atomic is made up of a mesh, and a mesh has an index buffer, which holds the indices into the geometry, and a vertex buffer. You look up the index, fetch the vertex, transform the vertex, and send it to the graphics chip. That's the basic flow, and you do it for every mesh of every atomic the game engine asks you to draw. The game engine itself is pretty straightforward, to be honest. You register the renderer with the game engine, and it calls your callbacks. As long as you register all of the correct plugins, you get all of the callbacks, and if you implement them correctly, your scene renders correctly, right? For the Dreamcast, there are some major problems. The Dreamcast does not render all types of geometry in the order you submit them: you have to sort the geometry into opaque, transparent, and alpha-tested. That creates ordering problems, because the game expects things to be rendered in the order it sends them, and we have to split them into lists.
[0:15:10] FG: And that's universal for any port to the Dreamcast. It's fundamental to our architecture.
We sort our geometry into lists: opaque, translucent, and a third list, the punch-through.
[0:15:21] SKMP: Punch-through.
[0:15:22] FG: Yeah.
[0:15:22] KB: Got it. Okay. So you said most game engines aren't going to expect that; it's a Dreamcast limitation. At what layer does that transformation happen? Is it transparent to the game engine, or do you have to go in and start making those interventions later?
[0:15:38] FG: Well, this is where Stef, at first, was doing something cool, where we were deferring everything, right? If the Dreamcast needs things in a certain order, then you have to capture the pipeline state of what you're trying to draw, the vertices, so that you can defer it until it needs to be drawn. In our first pass, Stef had some stateful C++11 lambdas that captured the whole pipeline state. It wasn't really what you would want for high-performance code, but it was like, "Man, that is a really useful application of a stateful lambda right there." He placed them into standard vectors, and then he could iterate over them later when he needed to.
[0:16:17] SKMP: It's also mostly transparent to the game engine. We did have to modify the game a little bit in the places where it expects certain depth ordering. For example, it rendered the fog at a predefined depth and expected this to be the minimum depth, but it did this in the wrong order, so we had to help it there. For the most part, though, the reordering works out of the box.
[0:16:42] KB: Okay, cool. So to make sure I understand: you implement a kind of middleware that captures these things as they come in, then you sort them the way the Dreamcast expects and send them on their way. Do you have a fixed buffer size, or is it per frame, or how does that...
[0:16:58] SKMP: It's just a vector right now.
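A minimal sketch of the deferral idea, assuming a simplified model in which each draw is captured as a closure and queued on one of three lists, then replayed list by list at the end of the frame. The names (`ListKind`, `DeferredLists`, `defer`, `flush`) are hypothetical rather than librw's or the port's API, and the replay order shown (opaque, punch-through, translucent) is an illustrative assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical list kinds mirroring the three Dreamcast lists discussed above.
enum class ListKind : std::size_t { Opaque = 0, PunchThrough = 1, Translucent = 2 };

class DeferredLists {
public:
    // Queue a draw. The closure should capture by value everything the draw
    // needs (pipeline state, vertex data pointers) so it can run later.
    void defer(ListKind kind, std::function<void()> draw) {
        lists_[static_cast<std::size_t>(kind)].push_back(std::move(draw));
    }

    // Replay all queued draws one list at a time, in enum order, then empty
    // the lists. clear() keeps each vector's capacity, so once the buffers
    // have grown to a typical frame's size they stop reallocating.
    void flush() {
        for (auto& list : lists_) {
            for (auto& draw : list) {
                draw();
            }
            list.clear();
        }
    }

    std::size_t pending(ListKind kind) const {
        return lists_[static_cast<std::size_t>(kind)].size();
    }

private:
    std::array<std::vector<std::function<void()>>, 3> lists_;
};
```

A closure queued on `Translucent` runs after one queued on `Opaque`, regardless of the order in which the game submitted them, which is the reordering the engine never sees.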
[0:16:59] FG: It's dynamic at first, but he does a clear and never a shrink_to_fit on the vector. Once it has stabilized, it's basically static.
[0:17:09] KB: Nice. Okay, cool. Any other big pieces at the driver layer?
[0:17:16] SKMP: Well, on the driver layer, like Falco mentioned, we had this capturing-lambda stuff, and it was wonderful from a language perspective.
[0:17:23] FG: Yeah, right. It was the ultimate modern... we got a lot of stuff. When I joined the team, I was the guy who went into the makefile and was like, "Whoops, C++23." It caused so many problems, because our host compilers couldn't support that on the tool side, you know? I had to back it down to C++20. And people asked, "What good is that going to do you on this project?" And the answer was, "Well, right there." It was pretty nice.
[0:17:50] SKMP: The thing is, I unfortunately had to move away from that to avoid copies, because we were spending a lot of time copying the lambdas around, due to several intricacies of lambdas. You can't have zero copies when you stick them into a vector. So now it uses contexts. For every atomic, we create an atomic context, and we attach other contexts to it, like a material effects context, say if it has reflections, things like that. Then we capture only the context number, which is enough for us to recover the information. From a functionality perspective, that's how the driver works. The other part is how we actually submit the geometry to the graphics chip. That is interesting. The normal way is that you index the buffer, fetch the vertex, transform the vertex, and then send it to the graphics chip. Then there is a smarter way, where you transform all of your vertices first. Usually you have more indices than vertices; that's why you index the buffers.
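The transform-first, then de-index path, including slicing a mesh into chunks of at most 128 unique vertices, can be sketched in portable C++ as follows. This is a simplified model with stated assumptions: a greedy triangle-by-triangle chunker (not necessarily how the port regenerates topology), a plain local array standing in for the locked half of the SH-4 cache, and a translation standing in for the real transform and lighting. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vertex { float x, y, z; };

constexpr std::size_t kChunkVerts = 128;  // max unique vertices per chunk

struct Chunk {
    std::vector<std::uint16_t> source;  // mesh vertex index for each local slot
    std::vector<std::uint8_t> local;    // triangle list of local indices (< 128)
};

// Greedily split an indexed triangle list so that no chunk references more
// than kChunkVerts unique vertices. Vertices shared across chunk borders are
// duplicated, which is the trade-off of slicing.
std::vector<Chunk> slice_mesh(const std::vector<std::uint16_t>& indices) {
    assert(indices.size() % 3 == 0);
    std::vector<Chunk> chunks(1);
    std::unordered_map<std::uint16_t, std::uint8_t> remap;  // mesh -> local
    for (std::size_t t = 0; t < indices.size(); t += 3) {
        std::size_t fresh = 0;  // vertices this triangle would add (upper bound)
        for (std::size_t k = 0; k < 3; ++k) {
            if (remap.count(indices[t + k]) == 0) ++fresh;
        }
        if (chunks.back().source.size() + fresh > kChunkVerts) {
            chunks.emplace_back();  // start a new chunk
            remap.clear();
        }
        Chunk& c = chunks.back();
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint16_t idx = indices[t + k];
            auto it = remap.find(idx);
            if (it == remap.end()) {
                it = remap.emplace(idx, static_cast<std::uint8_t>(c.source.size())).first;
                c.source.push_back(idx);
            }
            c.local.push_back(it->second);
        }
    }
    return chunks;
}

// Transform each unique vertex of a chunk exactly once into a small scratch
// buffer (stand-in for the cache-resident buffer), then emit de-indexed
// vertices in triangle order.
void submit_chunk(const Chunk& c, const std::vector<Vertex>& mesh,
                  Vertex offset, std::vector<Vertex>& out) {
    std::array<Vertex, kChunkVerts> scratch;
    for (std::size_t i = 0; i < c.source.size(); ++i) {
        const Vertex& v = mesh[c.source[i]];
        scratch[i] = {v.x + offset.x, v.y + offset.y, v.z + offset.z};
    }
    for (std::uint8_t li : c.local) {
        out.push_back(scratch[li]);
    }
}
```

For two triangles sharing an edge (indices `0,1,2, 2,1,3`), `slice_mesh` produces one chunk with four unique vertices, so only four transforms run even though six vertices are emitted.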
The smarter way is that you transform all of your vertices first and then de-index them, so you don't have to transform a vertex twice if it appears more than once. Then we took an approach where we sliced our buffers into chunks of 128 vertices. That was tricky, because we had to regenerate the topology of the mesh a little bit to cut it into chunks without generating too many extra vertices, too many duplicates. It's a trade-off everywhere. We handle this at 128 vertices, which can actually fit in the cache. We freeze half of the cache with some tricks, and then we transform into the cache. Then, instead of doing a memory copy when de-indexing, we send the cache lines directly to the graphics chip.
[0:19:38] FG: Yeah. I think that was the most mind-blowing thing. We have a mode where he splits the cache in half, and then we can manually control what goes into the second half of the cache. He pre-allocates it over the address where the GPU expects the vertices to go, and he does all the T&L there. Then it goes to the GPU straight out of the cache. It blew my mind.
[0:20:02] KB: Yeah, and you just blew my mind. If I'm understanding correctly, you're essentially splitting the cache into two logical sections.
[0:20:10] SKMP: Mm-hmm.
[0:20:10] KB: And using one of those to prepare all of your data, which you can then just flush straight to the GPU.
[0:20:15] SKMP: Yeah, yeah. In whatever order you want.
[0:20:16] FG: Yes, isn't that crazy? I've been doing Dreamcast stuff for a long time, and that was pretty crazy to me.
[0:20:24] KB: Yeah, this is getting down to a level of control over the hardware that most of us never touch.
[0:20:29] FG: Right.
[0:20:30] KB: Something you said a little while ago, Falco, made me want to dig in. What does the build chain and tooling environment look like for this?
[0:20:41] FG: Okay. That's where I come from. I work on what's called KallistiOS.
The Dreamcast community is kind of unique in that no one in our community uses the commercial tools, because, not to brag, our stuff is pretty good. Our open-source stuff has been around since 2001, right? It was around back when Sega was still doing Dreamcast stuff, so it's had a lot of time to mature. There's a lot in there, like IPv6 support and support for various hardware mods for the Dreamcast. My Dreamcast actually has 32 megs of RAM, so I don't know about this GTA 3 out-of-memory stuff. I'm just kidding. But yeah, we support a lot of the hardware modifications that have come around over the years. One thing we're pretty passionate about is keeping our toolchain up to date. The SuperH-4 is our architecture, and believe it or not, it's still maintained in GCC, because it's used in Casio calculators and used to be used in set-top boxes and routers. So it never really went away from the GCC toolchain. We're like, "Oh, GCC 14.2.0 came out the other day." I think it's 2.1 now. We had a midnight launch: "Oh, hey, you can already use some preview C++26 features on the Dreamcast. You want to do that?" Yeah, we nerd out pretty hard about our toolchain support. And the crazy thing is, we're not the only ones. The Nintendo 64 community is pretty much on the tip too, and so are the PlayStation Portable community and some guys I know in the Saturn scene. I don't know what it is. When we're working with retro hardware, we like not to be retro with the software.
[0:22:15] KB: That's awesome. And Stef, I think you mentioned you started out on the emulation side. Are you able to run your test suite, whatever you're doing with this, on emulated hardware, or how...
[0:22:27] FG: How about all of it?
[0:22:28] SKMP: There are several layers of inception in this. But yes, I maintain a fork of my old emulator.
It's no longer the best emulator for the Dreamcast, but it's one whose codebase I'm very confident in. I maintain it for this project. Every feature we use, I add to the emulator; these cache tricks, for example, I added to the emulator. We have also added a lot of validation. Every time we found a mistake in how we handle the hardware, I tried to add a check for it to the emulator, so that if we make the same mistake again, it will warn us.
[0:23:02] FG: And it was extremely useful. There were times we were stuck, and debugging on the Dreamcast is not so easy. A lot of people are like, "Oh, I printf-debug exclusively." We have GDB support, but it's not the best; it's lacking a lot of features. Stef's inception of his virtual Dreamcast really saved our butts many times.
[0:23:22] SKMP: It helps. We can also cross-compile for PC; the codebase originally worked on PC. We spliced things together, taking some parts of the emulator and splicing them with the rest of GTA. We can run in a hybrid mode where the CPU code actually runs natively, but the graphics run through the emulation. This means we can use tools like Valgrind or AddressSanitizer, and a couple of bugs have been found this way. We had a memory leak of a matrix, and there's no way we would have found where it came from without AddressSanitizer. The tooling developed for this project around the Dreamcast emulation, which I'm familiar with, has really helped us. Yes.
[0:24:05] FG: I think his emulation background has helped in every way. I started from indie. I haven't seen the big picture he has, having had to emulate all of the AAA titles, so he knows all their tricks. It's been fantastic just learning from him from that perspective.
[0:24:22] KB: Well, one of the things that stands out to me here is the hardware quirks, right?
Things that don't actually work as advertised. Or, I don't think I would ever have thought about going into the abstraction layer of the cache, splitting it, and taking separate control of the different pieces, right? I guess an interesting thread to go down is: what unexpected hardware quirks are there in the Dreamcast? How do you end up working around them? And how does debugging them work, in terms of the emulator or things like that?
[0:24:50] SKMP: Well, to be honest, most of the hardware quirks are handled at the KallistiOS level. There are some known hardware bugs; they are handled in the initialization layer, and everything kind of works out of the box. The one we ran into is this: apart from the cache, the Dreamcast has two buffers called store queues, where you can collect some data and then write it all together, as 32 bytes, to memory, the graphics chip, wherever you want to write 32 bytes.
[0:25:20] FG: It's like an ultra-fast 32-byte memcpy, or memset if you use it like that. There are two of them and they alternate: you write into one and flush it, and while it's flushing, you write into the other one. It's a really cool thing.
[0:25:34] SKMP: But you cannot read from it.
[0:25:37] FG: So I learned.
[0:25:40] KB: It sounds like something you learned through pain, perhaps?
[0:25:42] SKMP: You can also only write 32 or 64 bits at a time to them, four or eight bytes. You cannot write two bytes or one byte, for example. We have these quirks in the emulator, and we have them assert: if you do one of those things, it says, "Hey, you're doing something you shouldn't be doing." Another example is null pointers. A null pointer is usually a zero pointer, and on the Dreamcast that points into the BIOS. If you have a null pointer read, you're just reading some data from the BIOS. Nobody minds you doing that.
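To make the store-queue rules concrete, here is a hypothetical, emulator-style software model of them: two 32-byte write-only buffers that accept only 4- or 8-byte writes, reject reads, and are flushed to a destination in 32-byte bursts. It is a teaching sketch of the kind of checks Stef describes adding to his emulator, not the real hardware interface or a KallistiOS API; the class and method names are made up, and the alignment check reflects the SH-4's general requirement for naturally aligned accesses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical emulator-side model of the two SH-4 store queues.
class StoreQueueModel {
public:
    static constexpr std::size_t kQueueBytes = 32;

    // Write `size` bytes at byte `offset` of queue `q`. Only naturally aligned
    // 4- or 8-byte writes are legal; anything else is counted as a violation
    // (a real emulator might assert or log here instead).
    bool write(int q, std::size_t offset, const void* src, std::size_t size) {
        if (q < 0 || q > 1) return violation();
        if (size != 4 && size != 8) return violation();
        if (offset % size != 0 || offset + size > kQueueBytes) return violation();
        std::memcpy(queues_[static_cast<std::size_t>(q)].data() + offset, src, size);
        return true;
    }

    // The hardware does not support reading back, so the model only flags it.
    bool read(int /*q*/, std::size_t /*offset*/) { return violation(); }

    // Burst all 32 bytes of queue `q` to the destination in one go. In real use
    // you would fill the other queue while this one is being flushed.
    void flush(int q, std::vector<std::uint8_t>& dest) const {
        assert(q == 0 || q == 1);
        const auto& data = queues_[static_cast<std::size_t>(q)];
        dest.insert(dest.end(), data.begin(), data.end());
    }

    std::size_t violations() const { return violations_; }

private:
    bool violation() { ++violations_; return false; }

    std::array<std::array<std::uint8_t, kQueueBytes>, 2> queues_{};
    std::size_t violations_ = 0;
};
```

Filling queue 0 with eight 4-byte words and flushing it appends exactly 32 bytes, while a 2-byte write or any read is rejected and counted, which is the "you're doing something you shouldn't be doing" signal.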
So we also have emulator warnings for both reads and writes to that low address range, just to catch this edge case. We could actually do this in KOS, too; we could install a null pointer page, but we don't.
[0:26:30] KB: All right. I think we've talked a lot about the systems where you did wholesale replacement. Let's now get back to where you have to bring out the sledgehammer or the scalpel and make optimizations in the original game code. I've also heard a lot about transforming the formats of your textures and other things. What are the things that are no longer cleanly divided into "Okay, here's the subsystem," but that you still have to dive into to get this thing running on the Dreamcast?
[0:27:00] SKMP: Well, I can give some detail on the texture conversion tools and the repack process. For most of the other optimizations, Falco has worked more on the actual game code than me, to be honest. For the repack process, we copy-pasted the game's loaders and made a tool that loads the textures. Then we're using some community tools. I don't remember who wrote PVRTex or what...
[0:27:28] FG: I think that's TapamN, but I'm not positive.
[0:27:29] SKMP: Yes, TapamN is an awesome developer for the Dreamcast. They also tried the cache submission trick we're using, years before we did; I saw it in some forum post. But ours is the first use in a real application. The same goes for the audio files: we process them into a custom format. For the audio files and the image formats, I actually used GPT to write the unpacking and repacking tools, and what do you know, it worked on the first try. For the audio conversion itself, compressing the audio, we used some tools that come with KallistiOS. We just modified them to handle the peculiarities of our format and to downsample the audio if we have to. Things like that.
At that level, it was mostly plumbing work to get different tools to work together, and then the makefile, so that you can parallelize that work and have a nice experience for the developers.
[0:28:29] KB: Got it. And is that essentially built once ahead of time, packaged, shipped, and done? Or are there any dynamic aspects that need to be handled?
[0:28:40] SKMP: You can build a CD image directly from the makefile. If you ask it to build a CD image, it builds the repacking tools, builds the game itself and everything else it needs, links it all together, and then runs the repack tools. Then it takes the output and makes a CD image for you.
[0:28:55] KB: Nice.
[0:28:56] SKMP: Falco can talk more about the game code itself, I think?
[0:29:00] KB: Okay.
[0:29:00] FG: Well, the first thing I did, before I even jumped into that stuff, was... I don't know if you know, or if people know, the Dreamcast memory card, the Visual Memory Unit.
[0:29:10] KB: No.
[0:29:10] FG: In the controller, you see how there's a screen?
[0:29:13] KB: I can see it, but our listeners won't be able to. Let's describe it.
[0:29:16] FG: All right. We have a very interesting memory card that looks like a Game Boy, and it even has a screen. When you put it in the controller, a game can actually drive it as an external screen. Partly out of laziness, because I didn't want to mess up the UI, and partly because I thought it would be cool, the first thing I did was start displaying debug information, like performance information, on this little memory card. I didn't have to pollute the UI or figure out how it worked or anything like that. That's still ongoing for me: I have a background thread that I spawn.
It's a C++ standard thread that wakes up every, I believe, 200 milliseconds, checks how much main memory, video memory, and sound memory we have left, displays it on that little screen, and goes back to sleep. I've been doing that for a bunch of other stats, so that going forward we know where to focus our micro-optimizations. One of the most interesting things I had to work on was the math in the physics and collision code. One of the most fundamental operations in the engine is transforming a vector by a matrix, whether for vertices, collision meshes, renderable meshes, or anything like that. This was implemented through a C++ overloaded operator that took a matrix and a vector, and it was not inlined. The problem is the way matrices work on the Dreamcast: you load the matrix into a background register bank, and then you can swap the active bank so that it stays undisturbed. This overloaded operator had to reload the matrix to multiply it by one vertex, then it gets called again, reloads the matrix again, multiplies it by one vertex, and so on, when usually you have a loop of something like 50 vertices multiplied by one matrix. I basically had to break this out, to C-ify the C++ pattern. It wasn't to my liking; I want all the nice C++ stuff. But I had to make it take an input array of vertices, an output array, and a count, so that I could load the matrix one time, swap banks, multiply every vertex, and then swap back. Yes, it was pretty wasteful before it was optimized for the Dreamcast. But I would say that was more of a precision job than a sledgehammer job, because touching that code is pretty sensitive.
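The change Falco describes, from a per-vertex overloaded operator to a batched call that loads the matrix once, can be sketched in portable C++. On the SH-4, the "load" would put the matrix into the FPU's back register bank (XMTRX) for use by the FTRV instruction; here, copying it into a local is only a stand-in for that step, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major; computes M * v

// Core 4x4 matrix times 4-vector product.
inline Vec4 mul(const Mat4& M, const Vec4& v) {
    const auto row = [&](int i) {
        return M.m[i][0] * v.x + M.m[i][1] * v.y + M.m[i][2] * v.z + M.m[i][3] * v.w;
    };
    return {row(0), row(1), row(2), row(3)};
}

// Before: one vertex per call. On the SH-4, an out-of-line version of this
// has to (re)load the matrix into the back bank on every call.
inline Vec4 operator*(const Mat4& M, const Vec4& v) { return mul(M, v); }

// After: the "C-ified" batch API. Load the matrix once, transform `count`
// vertices, then return. Copying into `loaded` stands in for loading the back
// bank; on hardware, each loop iteration could be a single FTRV.
void transform_batch(const Mat4& M, const Vec4* in, Vec4* out, std::size_t count) {
    assert(count == 0 || (in != nullptr && out != nullptr));
    const Mat4 loaded = M;  // one load for the whole batch
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = mul(loaded, in[i]);
    }
}
```

Both paths compute the same result; the batch version only changes how often the matrix load happens, which is where the cost was on the Dreamcast.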
[0:32:02] KB: So to make sure that I'm understanding: essentially, there's conceptually a cache. You call it a register, a matrix register, but it's kind of a little cache layer that holds the matrix. Previously, every time you went into this code, it would spill your cache, and you'd have to reload it. What you said is, "Hey, let's pull together all the operations we're going to do on this matrix, so we can load the cache once, run through them, and not have to keep spilling memory in and out of there." [0:32:28] FG: Yes, yes. Now I don't have to keep loading and unloading that matrix into that cache area you're talking about, yes. [0:32:34] KB: That makes a ton of sense. How did you find that? [0:32:37] FG: Well, to be honest with you, I've worked on this math abstraction before for our OpenGL driver. I'm kind of used to the pattern of load once, multiply while you're in it, and then unload or load the next one. When I saw this pretty little loop invoking an overloaded operator, I was like, "I don't -" Because that's what I wish I could do, right? I'm like, "Man, if they're doing this efficiently, I need to know what the heck is going on here, because I can't." So, yes, that was kind of a red flag. [0:33:10] KB: You're like, "Man, I want to be able to do this. How are they making it work? Oh, it's not working." [0:33:15] FG: Yes, exactly. Exactly. [0:33:17] KB: Got it. Okay, that's cool. That's an interesting example. In that case, I guess you're starting with source code, so you just go and rework the source code. You're saying, instead of this, we're going to just inline this into this function. Go. [0:33:30] FG: Yes, exactly. [0:33:31] KB: Got it. [0:33:31] FG: They wouldn't want this upstreamed if this were still a maintained thing. We would have to make some prettier abstractions or something, but we did what we had to do in that code.
[0:33:40] KB: Well, and that's actually one of the things I was wondering: to what extent can you abstract these things? I did another interview recently with folks who were more in the game emulation space, starting from a binary rather than from source code. So they had to use different tricks, but they did a lot of, "Okay, let's swap in. We have this function call. We're going to replace this function and go down a different path." When you're working on these, like the hack we talked about for division, right? We're not going to do division. We're going to multiply by the inverse square root times itself, so that we can just do a multiply and an inverse square root, because those are fast. Are you able to do that at an abstraction layer where it just applies across all the physics code? Or are you going in, finding all the places where they're doing a divide, and having to update those? [0:34:27] FG: You want to go first, Stef? What would you say? I was going to say there are trade-offs, especially with that divide, actually, now in the code base. I have a C++ template that takes a non-type template parameter for whether it should do an actual divide or a BS divide, right? So if I break something, I can, oh, God, change that to true. Change the default on that one to true, and I can go down. That should not add extra overhead, because it should just be a template that's expanding into a couple of - I say should; I have not verified the code [inaudible 0:35:05], so maybe there's an extra move or something. I've got to look. But that's an example of where templates can do it. What would you say, Stef? I'm sure there are more runtime examples, stuff we couldn't really get away from, or that we did. [0:35:20] SKMP: Well, for me, this work is also very manual. Usually, I go in and manually change things. Put them behind a define or a template, something like that.
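A minimal sketch of the compile-time switch Falco mentions, assuming a function named `divide` (hypothetical) with a `bool` non-type template parameter. The fast path uses the identity a / b = a × (1/√b)², which only holds for b > 0. `fast_rsqrt` is a placeholder that computes the value exactly with `std::sqrt`; on the Dreamcast it would stand in for the hardware's fast approximate reciprocal square root, which this sketch does not model.

```cpp
#include <cassert>
#include <cmath>

// Placeholder for a fast hardware reciprocal square root (hypothetical
// name). Here it is computed exactly so the sketch runs on any machine.
inline float fast_rsqrt(float x) {
    return 1.0f / std::sqrt(x);
}

// Exact == true: a real divide. Exact == false: the "BS divide" trick,
// a / b == a * (1 / sqrt(b))^2, which is only valid for b > 0.
// Changing the default to true is the quick "did the trick break it?" test.
template <bool Exact = false>
inline float divide(float a, float b) {
    if constexpr (Exact) {
        return a / b;
    } else {
        assert(b > 0.0f && "fast divide needs a positive divisor");
        const float r = fast_rsqrt(b);
        return a * r * r;
    }
}
```

Because `Exact` is resolved at compile time with `if constexpr`, the unused branch is discarded, which is why flipping the default should not cost anything at runtime. As Falco says, though, checking the generated assembly is the only way to be sure.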
It really depends on whether you care about keeping the code maintainable. If you don't really care about maintainability, if we don't really care that the code is a little bit dirtier because it's not going to be worked on further, then we can just use the simplest approach that works. [0:35:46] KB: Then how are you finding these opportunities? You mentioned the DIY profiling tool showing up on the memory card, which I love. In some cases, it sounds like you're pattern matching based on things you've done before. But are you doing systematic profiling? Is it gameplay that's driving it, "Oh, it's getting sticky here"? How are you finding the places that need optimization? [0:36:11] SKMP: We have a profiler that was passed down from another developer to me, and I modified it, and now I'm passing it down to the next developer down the line. It looked similar to gprof when I got it. When [SWOT inaudible 0:36:26] gave it to me, it was compatible with gprof, so you could use the standard gprof tools to analyze the traces. After the modifications I made, it's no longer compatible with gprof, but I nicely asked ChatGPT for a tool that can analyze the reports, and it kindly made me one, as it does sometimes. We have that tool, and it's able to take in a full disassembly of your executable, like from objdump, and then it can annotate for you the lines where the profiler gets hits. On hot functions, you can see individual assembly opcodes, how long they take, and things like that. That helps when the work is concentrated in one function. But for cases like Falco said, when you have little cuts spread over the code, the profiler doesn't really help. It doesn't really have the - we only sample a thousand times per second. At 60 frames per second, that's, I don't know, roughly 16 or 17 samples per frame. That's not really enough context on what's happening in a frame, only statistically.
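To make the sampling discussion concrete, here is a small sketch of how a statistical profiler attributes samples: each sampled program counter is mapped to the function whose address range contains it. The `Symbol` struct and `attribute_samples` are hypothetical. This only produces per-function counts; the tool Stef describes goes further and annotates individual opcodes in an objdump disassembly. The constants spell out the arithmetic: at 1000 samples per second and 60 frames per second, each frame gets about 16.7 samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A function symbol, e.g. extracted from objdump or nm output (hypothetical format).
struct Symbol {
    std::uint32_t start;  // first address of the function
    std::uint32_t size;   // length in bytes
    std::string name;
};

// Samples per frame at the rates mentioned: 1000 Hz sampling, 60 fps.
constexpr double kSampleRateHz = 1000.0;
constexpr double kFramesPerSecond = 60.0;
constexpr double kSamplesPerFrame = kSampleRateHz / kFramesPerSecond;  // about 16.7

// Attribute each sampled program counter to the function containing it.
// `symbols` must be sorted by start address.
std::map<std::string, int> attribute_samples(const std::vector<Symbol>& symbols,
                                             const std::vector<std::uint32_t>& pcs) {
    assert(std::is_sorted(symbols.begin(), symbols.end(),
                          [](const Symbol& a, const Symbol& b) { return a.start < b.start; }));
    std::map<std::string, int> hits;
    for (std::uint32_t pc : pcs) {
        // First symbol that starts after pc; the candidate is the one before it.
        auto it = std::upper_bound(symbols.begin(), symbols.end(), pc,
                                   [](std::uint32_t addr, const Symbol& s) { return addr < s.start; });
        if (it == symbols.begin()) {
            ++hits["<unknown>"];  // pc is below every known function
            continue;
        }
        --it;
        if (pc < it->start + it->size) {
            ++hits[it->name];
        } else {
            ++hits["<unknown>"];  // pc falls in a gap between functions
        }
    }
    return hits;
}
```

With only about 17 samples per frame, a cost spread thinly over many functions barely registers in any single bucket, which is the limitation Stef points out.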
So you try to stay still, and you hope that the statistical profile will help you. [0:37:41] KB: When you're running the profiler, are you running it in your emulated environment? And how much overhead does it take to run the profiler? [0:37:46] FG: Oh, I knew that question was coming. Well, for the VMU one, I will say I kept that thing really lean, except I have a change I've been sitting on, because there's a shortcoming in the C++11 threading API where you cannot set the stack size. You can't say, I want to make a thread with this large of a stack, so we're blowing a few kilobytes that we should not be blowing every time we make standard threads, and the VMU thing sits on a standard thread. Other than that, the thing wakes up so infrequently. It's like 0.02% CPU overhead. I keep an eye on that one because I don't want people to say, "Oh, your tool's costing me FPS," or something. But what Stef was doing - what would you say for yours? [0:38:30] SKMP: I think the profiler cuts around five percent of the performance. [0:38:34] FG: Oh, that's nice. [0:38:35] SKMP: It's not terrible, considering everything. [0:38:38] FG: Considering [inaudible 0:38:38]. Yes. [0:38:40] SKMP: It can store a trace of about 10 seconds in memory right now before it writes it to a file. For this performance testing, it only makes sense to do it on real hardware, because on the emulator the timing is all kinds of skewed, so you don't really have things like cache simulation. We don't see cache misses. The Dreamcast has a weak cache and a very weak memory system, so the results aren't accurate if you don't model this. I should also mention the performance counters that are part of KOS. They can give you hints like how many instructions ran and how much time you were waiting for memory. You can use them on any scope or function, and then you can really micro-benchmark small functions and optimize them by hand. [0:39:23] KB: Awesome.
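The C++11 `std::thread` interface indeed has no stack-size parameter. One common workaround, sketched here under the assumption that the platform exposes POSIX threads (we have not confirmed how KallistiOS handles this), is to create the thread through `pthread_attr_setstacksize`. The loop mirrors the VMU monitor Falco described earlier: wake every 200 ms, sample, redraw, sleep. `StatsMonitor`, `sample_and_draw`, and `start_monitor` are hypothetical placeholders; a real monitor would query free RAM, VRAM, and sound RAM and update the VMU screen. The requested stack size must be at least `PTHREAD_STACK_MIN` on the host, so a desktop test needs a larger value than you would pick on a Dreamcast.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <pthread.h>
#include <thread>

// Hypothetical shared state for a VMU-style stats monitor.
struct StatsMonitor {
    std::atomic<bool> running{true};
    std::atomic<int> samples{0};
    std::chrono::milliseconds period{200};  // set before starting the thread
};

// Hypothetical stand-in for querying free RAM / VRAM / sound RAM and
// redrawing the VMU screen; here it only counts how often it ran.
static void sample_and_draw(StatsMonitor& mon) {
    mon.samples.fetch_add(1);
}

// Thread body: sample, then sleep for one period, until asked to stop.
static void* monitor_main(void* arg) {
    assert(arg != nullptr);
    auto& mon = *static_cast<StatsMonitor*>(arg);
    while (mon.running.load()) {
        sample_and_draw(mon);
        std::this_thread::sleep_for(mon.period);
    }
    return nullptr;
}

// std::thread has no stack-size knob, so go through pthread attributes.
// Returns 0 on success, otherwise the pthread error code.
int start_monitor(StatsMonitor& mon, pthread_t& tid, std::size_t stack_bytes) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) rc = pthread_create(&tid, &attr, monitor_main, &mon);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Stopping the monitor is a matter of clearing `running` and joining the thread; because it sleeps most of the time, its CPU cost stays tiny, in line with the overhead Falco quotes.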
Well, let's maybe step back a little bit at this point and give an overview. So, you shipped a first working version. I think I saw somebody running through it. I don't have Dreamcast hardware, though I might have to try your emulator, Stef, and run through it. But what would you say the status of the project is, and what are you guys excited about and working on now? [0:39:44] SKMP: Well, the status is that we thought the alpha was fully playable, but it turns out there were three bugs in it. All three of those are fixed, so now the game is truly fully playable. There are a lot of minor fixes and minor improvements going in. Also, there are Falco's physics optimizations. There are things like [inaudible 0:40:04], but we might be able to get things in. For example, things are correctly fogged, things like that. The game was mostly there. The next big thing: we're now running out of memory after some time, and it doesn't seem to be a memory leak, because if you sit still in the same region, even though the game is dynamic, it doesn't happen. It only happens when you're actively playing for an hour or so. It seems to be memory fragmentation: there are allocations all over the address space. Then you have a small allocation in the middle of a free region, and you can't make a big allocation because you have this small allocation in the middle. That's the next big hurdle. For my side, that would be the goal that makes this better: fully playable, including the fog fixes, some new features, better performance, and you can play for five hours without it crashing. That would be a nice goal. [0:41:03] KB: This is awesome. All right, you've accomplished what I think some people were calling an unaccomplishable port. Where do you want to take this next? Are you thinking, "Okay, we shipped GTA 3. We're on to a new game"? Are you thinking of expanding within the ecosystem?
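The fragmentation failure Stef describes can be reproduced with a toy first-fit allocator. `ToyHeap` is hypothetical and far simpler than any real malloc, and it is not how the game's allocator works, but it shows the key property: after freeing the blocks on either side of a small live allocation, the heap has plenty of free space in total, yet no single contiguous hole is large enough for a big request.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <optional>

// Toy first-fit allocator that only tracks offsets in a pretend address
// range (hypothetical, illustration only; no real memory is handed out).
class ToyHeap {
    struct Block {
        std::size_t offset;
        std::size_t size;
    };
    std::list<Block> free_;  // holes, kept sorted by offset

public:
    explicit ToyHeap(std::size_t size) { free_.push_back(Block{0, size}); }

    // First fit: carve the request out of the first hole that is big enough.
    std::optional<std::size_t> alloc(std::size_t n) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->size >= n) {
                const std::size_t off = it->offset;
                it->offset += n;
                it->size -= n;
                if (it->size == 0) free_.erase(it);
                return off;
            }
        }
        return std::nullopt;  // enough total space may exist, but not contiguously
    }

    // Give [off, off + n) back, keeping holes sorted and merging neighbours.
    void release(std::size_t off, std::size_t n) {
        assert(n > 0);
        auto it = free_.begin();
        while (it != free_.end() && it->offset < off) ++it;
        it = free_.insert(it, Block{off, n});
        auto next = std::next(it);
        if (next != free_.end() && it->offset + it->size == next->offset) {
            it->size += next->size;
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->offset + prev->size == it->offset) {
                prev->size += it->size;
                free_.erase(it);
            }
        }
    }

    std::size_t total_free() const {
        std::size_t sum = 0;
        for (const Block& b : free_) sum += b.size;
        return sum;
    }

    std::size_t largest_free() const {
        std::size_t best = 0;
        for (const Block& b : free_) best = std::max(best, b.size);
        return best;
    }
};
```

In a 100-unit heap holding a 45-unit block, a 10-unit block, and another 45-unit block, freeing the two large blocks leaves 90 units free but a largest hole of only 45, so a 60-unit request fails. Only once the small block in the middle is freed do the holes merge and the request succeed, matching Stef's description of a small allocation stranded in the middle of a free region.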
You also mentioned taking stuff back to the community. Where are you going post-beta? [0:41:26] FG: I'm going wherever they're going. I had so much fun, and it was such a pleasure to be around. As long as they'll have me around, I'll follow them. [0:41:34] SKMP: I guess there are a few different paths we can take, but it really depends on where the community wants to go, too, like Falco says. There are people trying to make some mods for the Dreamcast version, either to make it more Dreamcast-friendly or to bring in better models, better textures. [0:41:51] FG: Or to make it more Dreamcast-unfriendly. [0:41:54] SKMP: Yes, both things. [0:41:55] FG: We have people loading Xbox models and doing crazy stuff. It's really cool stuff, though. [0:42:02] SKMP: It works. So that's one direction on this GTA path. I mean, for a final release, this would have to be fully localized. There are some minor things in the menu; some strings are missing, things like that. Of course, it has to be fully playable, fully stable, with no visible glitches. I guess then the next challenge would be Vice City. [0:42:27] FG: Oh, you said that out loud. [0:42:30] KB: All right. No committing here, but timeline-wise, are you thinking Vice City is a 2026 thing? Or what does that look like? [0:42:39] SKMP: I have no idea. [0:42:42] FG: Yes. To be honest with you, I don't know. I wouldn't know if it's like starting over from another GTA. Surely, it's not. But I don't have enough experience doing GTAs to know, with as much of a head start as we have on the engine and stuff, how much further there is to go, if that makes any sense. [0:43:00] SKMP: One of the recent developers claimed that Vice City is essentially the same game, just with a different story, so that is hopeful. I guess it's one of those projects where, once you're a couple of weeks in, you know how possible it is. Most things would be in place.
So if it doesn't have higher memory requirements, for example, which it might, things will be fine. [0:43:26] KB: So it's one of those where, once you get in, you might be shipping within a week or a few weeks. Or you might be looking at a yearlong endeavor. [0:43:34] FG: Right, right. [0:43:35] SKMP: Exactly, exactly. [0:43:36] KB: Nice. One other thing you mentioned that I want to dive in on: community. What about people listening to this who are like, "Man, that's cool"? I love, Falco, your origin story here, right? You're like, "I just want to hack on this cool gaming platform. How do I do that?" How would you recommend people start exploring and getting involved in this space today? [0:43:58] FG: I would say start off with our wiki, dreamcast.wiki. It's the ultimate community-driven brain dump of everything going on, and it's the definitive resource. If you Google Dreamcast development, I think it should be number one on Google. It's the ultimate getting-started guide, and this is how everyone starts off. It tells you what you need software-wise and platform-wise. Of course, we let you do Windows, Mac, Linux. We don't - of course, we're going to support your platform. But anyway, it covers how to set up the SDKs. Then from there, pretty much inevitably, there's a link to our Discord server and Simulant. We also have one for DCA3, GTA 3. People wind up in our Discord server, and it's a lot of fun, and that's where all the coding really happens. But we're also on GitHub, KallistiOS. Yes, you can't miss us if you look for that or Dreamcast on GitHub. Stef, you want to talk more about our site for GTA 3? [0:44:57] SKMP: Yes. For GTA 3 specifically, we have some basic instructions, but you have to do the repack yourself, so you need to own the game. Either have a copy of it or buy a new copy. We do support the version of the game that Rockstar is selling online, so it's kind of fizzy.
You can download that and either install it on your Windows PC or install it with Wine. I have tried both; it works. From there, you download the Dreamcast SDK, run some commands, and it will bake everything for you. [0:45:28] FG: That is on dca3.net, by the way. [0:45:31] SKMP: Yes, dca3.net. The site is also browsable on the Dreamcast. [0:45:36] FG: Yes, that's the best part. [0:45:36] SKMP: We made sure of that. [0:45:38] FG: The guys who made that site, yes, made sure that it can be viewed with the Dreamcast web browser. That one's pretty cool. [0:45:43] KB: That's awesome. [0:45:45] FG: Yes. [0:45:46] SKMP: One more thing to note: for Windows, there is the DreamSDK installer that installs everything for you. It gives you a ready working environment where you can tinker. On Linux and macOS, people assume that you can follow a tutorial. [0:46:01] KB: Well, awesome. This has been super fun, guys. We're getting close to the end of our time here. Is there anything that we haven't talked about that you would like to leave folks with? [0:46:09] FG: Yes. Stef, I want to put you on the spot here and ask you something I've never asked you. When this started out, it was running on an emulator only. This emulator had double the RAM of a regular Dreamcast. When you started out, did you actually think that you would be able to make it run on a stock Dreamcast? Or were you like, "Let's see where it takes us, and maybe"? You know what I mean? How did that happen? [0:46:35] SKMP: I was, "Let's see where it takes us," because I had no idea how the code worked or whether I would be able to thin it down. But to be honest, it was so effortless to get it to render something initially. It only took a couple of days, so I was very hopeful this would be doable. Because if it takes two months to render something, then you're like, "Ah, maybe this is not so easy." [0:46:56] FG: Exactly. Yes, yes. Interesting. Yes. I always wondered.
I always wondered if you knew starting out that you would make it onto a stock Dreamcast or not. [0:47:05] SKMP: I mean, it would have been funny, even if it was only for a modded Dreamcast. [0:47:09] FG: Right, right. Couldn't quite make it. It would still be great, but yes, not quite the same. [0:47:14] KB: Well, thank you gentlemen. Super fun, and we will catch you another time. I will definitely be checking out getting this installed on my Mac at least. [0:47:23] FG: Thanks. [0:47:25] SKMP: Thank you, Kevin. [END]