EPISODE 1702

[INTRODUCTION]

[0:00:00] ANNOUNCER: Luma AI develops technologies at the forefront of AI and graphics. They created a text-to-3D tool that functions like Midjourney, but for generating 3D models. Another tool makes photorealistic environments by reconstructing any scene in 3D from just a few photos. Karan Ganesan is a software engineer, and Barkley Dai is the product and growth lead at Luma AI. They joined the show to talk about the origin of the company and the technologies it uses.

Gregor Vand is a security-focused technologist and is the founder and CTO of Mailpass. Previously, Gregor was a CTO across cybersecurity, cyber insurance, and general software engineering companies. He has been based in Asia-Pacific for almost a decade and can be found via his profile at vand.hk.

[EPISODE]

[0:01:01] GV: Hi, Barkley and Karan. Welcome to Software Engineering Daily.

[0:01:04] KG: Great to be here. Yes.

[0:01:06] GV: Yes. It's great to have you both here today from Luma AI. But yes, before we dive into what Luma AI is working on, let's hear about both of your backgrounds, and how you got into this fairly niche area of 3D AI. Maybe, Karan, do you want to start this off?

[0:01:24] KG: Sure. Hey, folks. I'm Karan. I'm currently based out of India. I've been working in the VR and AI space since 2016, building a lot of applications, prototypes, et cetera, trying to understand what the space is. I worked at two companies, and then ended up at a corporate, Flipkart, where I was building a lot of 3D and AR capabilities for brands and companies. Eventually, I came across Luma. The thing that we're working on is the foundational content for the future, whatever people call the metaverse. I naturally came across this, started learning about it, and here I am, working on it. So yes, that's a short intro about me.

[0:02:10] BD: Hi, everyone. My name is Barkley, and I'm currently the product and growth lead at Luma AI, doing more on the growth side, but also supporting our models on the research side. Previously, I worked as a product manager at TikTok, where I was the PM for TikTok effects. During that time, we were building a lot of these visual 3D effects, but also focusing a lot on generative AI ever since it became popular in 2022. I was working on a lot of these face changes that use AI to generate different visuals for people, transforming them, doing style transfers, putting them into different backgrounds. I was also looking into using AI to generate 3D, to reconstruct 3D. That's when I found out about Luma, and I thought it was a really interesting field that prompted me to think, "Oh, if there is something that's going to be the content for the future, this might be it. 3D might be the answer." That's why I left TikTok in 2023 and joined Luma, focusing more on the growth side and product side of things.

[0:03:12] GV: Nice. Let's maybe also just hear a little about what led to Luma AI being formed in the first place. I think it was in 2021.

[0:03:22] KG: Yes. Our founder, Amit, was working at Apple before starting Luma. He was working on all the research, like camera research for the [inaudible 0:03:32] that got released a couple of months back. He was at the forefront, trying to understand what new things could come up in rendering and building content for 3D, XR, VR, and those kinds of things.
That's when Amit came across this idea that he wanted to go out and build this sort of platform, or product, around this technology called neural radiance fields. The first NeRF paper came out in 2020. Within a year, he was talking to a lot of people, partners, investors, and whatnot, trying to figure out what we could do with this tech and what we could productize. That's when he came to the conclusion that a company should try to build on top of the tech and build products, and also build them cross-platform, going beyond the Apple ecosystem to Android, Windows, the web, and all these kinds of things. That's where Amit saw the opportunity that something could be done. That's how Luma was founded [inaudible 0:04:31]. Yes.

[0:04:32] GV: Awesome. Kind of bringing all those things together, getting the ideas from different places. I've used the platform, so I can visualize what happens if I jump in now and use it. But to set the scene for listeners who are probably completely new to this field, or maybe completely new to Luma, and I think a majority of our listeners may be almost completely new to this field, just to understand a bit more about the Luma products, can you explain what I would see if I were just to go into Luma today? What are the products? How do they, at a very high level, look and operate?

[0:05:09] BD: Yes. Luma AI, as a company, has two major product lines right now. One is what we call Imagine, which is being able to generate 3D from text. That's more on the generative side of things, aligning with some of these recent text-to-image tools and all of this AI generation stuff. I would say it's more like a 3D version of Midjourney, where you can generate anything from just a simple line of text with some prompting, and then you get a 3D model in most of the major 3D formats, to be able to export it and use it in games, in e-commerce, in a lot of different fields. That's on the Imagine side.

Our other product line, which is where Luma started, is actually the capture side, the product we currently call interactive scenes. It basically reconstructs anything in reality, an object or a scene, just from taking a series of photos. Under the hood, it's using either neural radiance fields, so NeRF, or Gaussian Splatting to construct these 3D interactive scenes from the images that have been taken. A lot of times, that means after you capture different aspects of that scene from different angles, we'll be able to reconstruct it as if you're really experiencing it, as if you're recreating that thing in this virtual space, just as it is in real life. That's what we call the capture product, which is more for people to reconstruct their reality, these objects from their daily lives, in our virtual app.

[0:06:53] GV: Got it. Okay. Yes. There are the two products effectively, as you say, the interactive scenes capture, and then Genie, which is the generative product, text to objects, effectively. Maybe let's just take the interactive scenes for a second. As you both mentioned, the key underlying tech here is neural radiance fields, or NeRFs for short. Again, I'm going to guess a majority of those listening maybe have zero knowledge of this, or maybe only a small understanding of what they even are at a high level. Can we just dive in a little bit to, what are NeRFs, basically?

[0:07:31] KG: Sure.
I've tried to explain this in a bunch of different ways, and I think I've found the right way to explain it, so I'm going to just say that. Today, we have things like JPG and PNG, which I'd call image formats. PNG is the highest quality, it's like [inaudible 0:07:48]. It's basically a grid of pixels, where each point has a color. Now, at the highest level, a NeRF is similar to that, but in a 3D way, where you have an x, y, z coordinate, but you also have one more thing, the angle at which you're looking. It's five parameters: the position, which is x, y, z, and also the theta and phi values, which basically denote where you're looking from in 3D space. Those five parameters are basically what a neural radiance field is. Whenever you're trying to see something, you're querying with these five parameters and getting an output for the angle you're looking from. That's how neural radiance fields work.

In terms of what Gaussian Splats are: neural radiance fields are GPU intensive, because we're querying a neural network to get these values based on those five parameters. Whereas with Gaussian Splats, with the research that happened in the last few years, it's an explicit representation. Where a neural radiance field is implicit, Gaussians are basically a bunch of spheres, or various shapes of various sizes. They are computed and just placed, so you don't have to query a network. They also have this color property where, when you look from a different angle, you see a different color. That's basically how Gaussian Splats work. It's a point cloud with different shapes in it, and you just query it by saying where you're looking from. That's how NeRFs and Gaussian Splats work at a top level.

[0:09:30] GV: Got it. I guess both of these concepts originated in academia, or was there some other kind of application they were brought in from?

[0:09:42] KG: Yes. It came from academia, from UC Berkeley. One of the early authors was Matt Tancik. He's also working with Luma right now. It came out of UC Berkeley, basically, the AI lab they have. That's where they worked on these things.

[0:09:59] GV: I see. I think you just mentioned, he's actually an advisor at Luma, or part of Luma as well?

[0:10:06] KG: He's part of Luma. He's one of the applied research leads here.

[0:10:08] GV: Awesome. Okay. Yes, it's nice. Quite a strong link there, to say the least. That was kind of the underpinnings of the capture product, the interactive scenes. Let's also now just move to Genie, which is generating objects from scratch. Again, for example, I went in there and typed in a mug with a car racing team on it, because I have a mug on my desk that has that. So, let's just see what it comes up with. If you can imagine as a listener, you just type in an object, and then I got a nice set of three mugs that I could do something with, and we'll maybe talk about that shortly. But yes, in terms of Genie as a technology, what is actually happening here to go from, I type in a sentence or just two words, to we come up with an object?

[0:11:05] KG: Some of the tech that we use for Genie correlates to what we did for Gaussian Splats. It's a similar underlying representation.
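To make the five-parameter description of NeRFs above concrete, here is a minimal sketch in Python of what querying a radiance field looks like: a function that maps a position (x, y, z) plus a viewing direction (theta, phi) to a color and a density, and a toy volume-rendering loop that composites those samples along a camera ray. The function bodies are illustrative stand-ins, not Luma's actual model or code.

import numpy as np

def radiance_field(xyz, view_dir):
    # Stand-in for a trained NeRF: maps (x, y, z) plus a viewing direction
    # (theta, phi) to an RGB color and a density sigma. In a real NeRF this
    # is a neural network fitted to the input photos.
    rgb = (np.sin(xyz) * 0.5 + 0.5) * (0.5 + 0.5 * np.cos(view_dir[0]))
    sigma = np.exp(-np.linalg.norm(xyz))              # fake density falloff
    return rgb, sigma

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    # Classic NeRF-style volume rendering: sample points along the ray,
    # query the field at each one, and alpha-composite front to back.
    ts = np.linspace(near, far, n_samples)
    delta = ts[1] - ts[0]
    theta = np.arccos(direction[2])                   # view direction as angles
    phi = np.arctan2(direction[1], direction[0])
    color, transmittance = np.zeros(3), 1.0
    for t in ts:
        rgb, sigma = radiance_field(origin + t * direction, np.array([theta, phi]))
        alpha = 1.0 - np.exp(-sigma * delta)          # opacity of this segment
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha                  # light left after this segment
    return color

pixel = render_ray(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))

Rendering an image means repeating that per-sample network query for every pixel's ray, which is why, as Karan notes, NeRFs are GPU intensive, and why Gaussian Splats, which skip the network query, are cheaper to render.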
But the key difference is, for the capture product, we have the images that the user took as a reference to make the 3D model, the 3D representation. For Genie, we don't have that reference. What we basically do is start from a black image, or a noise image, a set of frames like that. We basically say, try to come as close as possible to this particular mug with a racing car, or whatever the prompt is. It's an iterative process. You just start running it, and you have to keep running it until you see the output, basically. We use a lot of 3D models and images as references to work with this. That's how it works. Based on the data that we have trained on, we try to make something that's relevant to what the user said. That's basically how Genie works. As I mentioned, underneath it's still Gaussian Splats, but we convert them to traditional 3D formats when we show them to the user. That's how we get to the end result for the product.

[0:12:18] GV: Got it. So yes, my next question was going to be the relationship, I guess, between - I mean, I'm sure for some listeners out there this is such an obvious question, perhaps. But the relationship between NeRFs, and Gaussian Splats, and then this newer product, text to 3D objects. The relationship is mainly the Gaussian Splat piece, or NeRFs as well, or -

[0:12:41] KG: Mainly the Gaussian Splat representation. Just because they are less GPU intensive, whereas with NeRFs, you need a beefy GPU to even use them.

[0:12:50] GV: Yes. Yes. Okay, that makes sense. I guess if we look at the Genie product itself, two questions. What was the driving force to get from the interactive scenes product to Genie? I'm taking a guess here. Maybe it's to do with it being more accessible, for example, as you say, it's less GPU intensive, i.e., more accessible, again. But also, what are the applications that you see for the Genie outputs?

[0:13:22] BD: Yes. I think the Genie part, which is text to 3D, we actually started to do that even before Gaussian Splatting came into the picture. Being able to generate 3D objects has always been a goal and a mission for the company, even at a really early phase, when we didn't have interactive scenes in place. After that, we just saw this opportunity, this breakthrough that we could make in generating 3D objects from text. But I think the underlying reason is that creating a 3D object is really hard. Actually, using our capture product to create a 3D representation of something in real life is also quite labor intensive. For an average person to know how to create a good NeRF or Gaussian Splat requires them to take the phone and go around an object, circle it at least two or three times, to be able to capture every detail of it. On the other hand, for Genie, what people can do is just type in text, which basically takes less than 10 seconds, and then they will get something. It may not be realistic, it may not be exactly what they see, but they can then use that 3D object in subsequent pipelines, or in their games, in some of these e-commerce applications, in some of their 3D designs. Which is a much easier creation flow compared to our capture product. I think these always go hand in hand.
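For the loop Karan describes, starting from noise and repeatedly nudging a 3D representation until its renders match the prompt, a rough picture looks something like the sketch below. The components here are placeholders with hypothetical names; Karan only describes the pipeline at a high level, and this is not Luma's actual code.

import torch

class GaussianScene(torch.nn.Module):
    # A tiny "scene" of learnable Gaussians: positions, colors, opacities.
    def __init__(self, n=1000):
        super().__init__()
        self.positions = torch.nn.Parameter(torch.randn(n, 3) * 0.1)  # start as noise
        self.colors = torch.nn.Parameter(torch.rand(n, 3))
        self.opacities = torch.nn.Parameter(torch.zeros(n))

    def render(self, camera_angle):
        # Placeholder for a differentiable Gaussian-splat rasterizer. A real one
        # projects each Gaussian for the given camera and blends them per pixel.
        weights = torch.sigmoid(self.opacities).unsqueeze(1)
        flat_color = (weights * self.colors).mean(dim=0)               # (3,)
        return flat_color.view(3, 1, 1).expand(3, 64, 64)              # fake 64x64 image

def prompt_guidance_loss(image, prompt):
    # Placeholder for a pretrained text-to-image prior (e.g. score distillation)
    # that measures how well a rendered view matches the text prompt.
    target = torch.full_like(image, 0.6)                               # dummy target
    return torch.nn.functional.mse_loss(image, target)

scene = GaussianScene()
optimizer = torch.optim.Adam(scene.parameters(), lr=1e-2)
prompt = "a mug with a car racing team on it"

for step in range(500):                         # keep running until the output looks right
    camera_angle = torch.rand(1) * 6.28         # render from a random viewpoint each step
    image = scene.render(camera_angle)
    loss = prompt_guidance_loss(image, prompt)  # how far is this view from the prompt?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# The optimized Gaussians are then converted to traditional 3D formats for export.

In practice both placeholders are heavy, trained models, which is where the GPU cost and the engineering work go.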
If someone cares more about fidelity, about the actual real-life representation, and wants to capture their surroundings as part of their memories, as part of their collections, they will go to captures. But if they want to generate 3D objects quickly, without needing to worry too much about the details, and just want to create them in large quantities, those users usually go for our Genie product, the 3D generation.

[0:15:16] KG: Also, to add to that. Around a year before we launched Genie, we had this product called Imagine, which was a precursor to what Genie is. But the thing was, it took one hour to generate one 3D model, whereas Genie takes 15 seconds to make something. In that one-year span, the software architectures improved, new research came in, and the way we can use the compute and the GPUs also improved. That's where it made sense to start building something on a bigger scale with Genie. That's when we wanted to go all in on Genie, build it out, and ship it.

[0:15:53] GV: I think, personally, if I look at both products, the interactive scenes is the one that slightly captured my imagination a little bit more, where I see these representations of - I think you've got some examples out there. There's a plane in a hangar, and I was like, "Oh, this is cool. I like planes, so there we go." But at the same time, you require the user to input quite a lot from their side to make the output possible, I guess. Whereas, as you said, with Genie, all I need to do is think of something and put it into a few words. Then, suddenly, I have an object. I mean, is there any application, or is there any kind of plan on the generative side to go from 3D objects to creating scenes, for example? Is that where things could go, or would they go there?

[0:16:45] BD: Yes. I think that's definitely something that we plan for. I think you mentioned a really great part, which is, once you can generate something, then it's much easier. But what you can also do with that, something that cannot be achieved in capture, is you can create totally imaginary things. That is a crucial part for industries like gaming, where if you're an independent game developer and you want to create a whole scene of this world, where maybe all these imaginary creatures live that never existed before, you can actually use Genie to create that. Because it can create things that are not just in real life, but also completely imaginary. We've actually seen this in some of these sandbox games where it's easy to create user-generated content, really simple indie games like Roblox, where people have been using Genie to create the assets for their games. Actually, Roblox has this accessory library, and we found out that some of our users are using Genie to create all of these accessories in their repository, and then they're able to earn the virtual currency on Roblox. That's one of the early use cases where we've seen people exploring Genie. But as you said, eventually, we do want to create larger scenes. We eventually want to create something that is not only a 3D representation, but that can also interact and move through time. That's going to come a long way with the further development of these visual AI models.
But eventually, the hope is that it could potentially become a new, easier version of a game engine, to be able to create larger scenes, larger interactions, or even gameplay, for people to turn their imagination into reality. To make everything into kind of like a game, but not those kinds of professional games, just interactive experiences that they can play around with.

[0:18:46] GV: You've mentioned gaming a couple of times now. Do you think that's the core use case immediately? I think the Luma website also mentions things like e-commerce or VFX. These feel like industries that could benefit from Genie. Could you speak a bit to what the other industry applications are, I guess?

[0:19:06] BD: Yes, definitely. I think Genie is potentially more for the gaming market. But we also see people just doing it for fun. It's not that helpful for some of these more complex game development flows, but people create simple things, even putting their Genie generations into an AR experience and making it more of an experience. Not so much games, but a lighter version of that. That's something where, after we release the model, we see how users interact with it, learn, and then think about what this could lead us to. But there are also other industries, especially on the capture side. The capture side has actually been used by a lot of videographers who want to show their creations or their ideas in a different format. Imagine if you're shooting a commercial, you can actually show things in a 3D representation. This could have a lot of potential for, say, a shoe ad, to be able to demonstrate the product, or for videographers who are doing these showcase videos. Also, e-commerce has been one of the prominent use cases, or industries, that we've seen interacting with the capture product. In the future, I think, for Genie as well, if there is strong demand for a 3D interactive, immersive display, then we believe some of these VFX artists, even smaller e-commerce brands, or advertising companies could have the potential to use those products, depending on what style they want their eventual outcome to be. But they would be using these as a way to connect with 3D as a content format to showcase to people.

[0:20:55] GV: Yes. Okay, understood. Let's maybe go to more of the tech side and dive a little bit deeper. On the NeRF side, how do they handle things like specular highlights or reflections? That seems to be the difference between looking at something and knowing it's just a flat object, effectively, versus 3D and feeling like it's really part of the scene. How is that actually handled under the hood?

[0:21:25] KG: Yes. I was also fascinated to learn that it was possible to do these things at the start. But then, when I started looking deep into how these things work, one thing that I realized is, it just comes for free, just because we do these things in a 3D representation. Just to give some context, photogrammetry was the precursor to NeRFs and Gaussian Splats today. There, everything needs to be mapped to a material with a set of parameters, which have to be predicted based on how things look and behave.
With NeRFs or Gaussian Splats, one thing to keep in mind is that we should forget everything we know about the traditional 3D formats that have existed so far and come at it with a fresh mind, because everything here is a parameter, or a billion-parameter function, where you can change a few things and see the changes. In terms of how they work, since we are looking from different angles for the capture product, you basically know what the color is from different angles and across different points in space. Now, if you just put them together, you see them as reflective, or specular. That's the key difference between traditional 3D model materials and how we get speculars and reflections in NeRFs and Gaussian Splats. Yes. That's how it works.

[0:22:42] GV: Got it. Okay. Yes, I think that's a good way to explain it. Just that idea that we're actually working from almost a new primitive here. This is probably not something that people have worked with before, conceptually. I mean, when I saw the interactive scenes, my mind jumped a little bit straight to something like maps, your maps app of choice, effectively. At least in the last five years, maybe, I've been more aware of, if I go into the photo view and start zooming down, I start seeing kind of 3D. Is that where this could also go as well, where we could start to have that? But I think that leads into a second question, which is just the computational requirements of all this. Where are the limits today, and where do you see the limits being in the future? Obviously, it's a slightly open-ended question based on where GPUs are going, et cetera.

[0:23:35] KG: From what I see, today's Apple ecosystem, iPhones, are really good at rendering native Gaussian Splats at 60 FPS. That's what we have today in our iOS app. At the same time, we just launched our Android app a few weeks back, and we also built this native renderer in the Android app. One thing that we see is, yes, the Android ecosystem is fragmented. Each company, or each model, will have a different GPU. Generally, ARM Cortex-type GPUs are a little underpowered compared to iPhones. Probably it will catch up to the iPhone's level in a few years, with Qualcomm doing a lot of research, increasing their compute, and packaging it into smaller sizes. But yes, at the moment, from what I've seen, for us or any other company working in this sector, it renders really well at 60 FPS, usable frame rates, on iPhones today. That's the current state of things.

[0:24:30] GV: Yes. I guess that's if the consumer themselves are running it on their device. What about if, effectively, that's just a client through to, I don't know, Luma running on your side, or another company reserving a GPU instance? Does that make sense in this kind of context or application?

[0:24:49] KG: Yes. There are two things when you talk about this. One is training the NeRF, which today happens completely in the cloud. Today, we have a super optimized pipeline where, when you upload a video, or use one of the apps, or use the API, you basically get the trained radiance field network in around 20 minutes. That's one side of things. At the moment, as I said, we offer that API, or you can use one of the clients. We have the flexibility to go white label, or you can use one of our clients. The second part is the rendering.
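For the rendering side Karan turns to here, the published Gaussian Splatting approach is simple enough to sketch: each splat carries spherical-harmonic color coefficients, which is what gives the view-dependent, specular look discussed a moment ago, and a pixel is produced by sorting the splats covering it by depth and alpha-compositing them front to back. A toy CPU version follows; it uses the standard degree-0/1 spherical-harmonic constants and is not Luma's renderer.

import numpy as np

SH_C0, SH_C1 = 0.28209479177387814, 0.4886025119029199   # degree-0/1 SH constants

def splat_color(sh_coeffs, view_dir):
    # View-dependent color: the same splat returns a different color
    # depending on the direction you look at it from.
    x, y, z = view_dir
    basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
    return np.clip(basis @ sh_coeffs + 0.5, 0.0, 1.0)

def composite_pixel(splats, view_dir):
    # splats: list of (depth, opacity, sh_coeffs) covering this pixel.
    # Sort near-to-far and blend front to back, like the real rasterizer does.
    color, transmittance = np.zeros(3), 1.0
    for depth, opacity, sh_coeffs in sorted(splats, key=lambda s: s[0]):
        color += transmittance * opacity * splat_color(sh_coeffs, view_dir)
        transmittance *= 1.0 - opacity
        if transmittance < 1e-3:              # early exit once the pixel is opaque
            break
    return color

rng = np.random.default_rng(0)
splats = [(rng.uniform(1, 5), 0.5, rng.normal(scale=0.3, size=(4, 3))) for _ in range(8)]
print(composite_pixel(splats, np.array([0.0, 0.0, 1.0])))   # front view
print(composite_pixel(splats, np.array([1.0, 0.0, 0.0])))   # side view, shifted colors

Because this is just sorting and blending, with no network query per sample, it is light enough to run natively on a phone GPU at the frame rates Karan describes.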
That's what I think you wanted to talk about. We have two ways to render things as well. One is cloud-based. You can make the camera move around, or upload your own camera path, and get it rendered in the cloud. Or you can use the iPhone or Android app to do it on the device. That runs on the client itself. There is no server call, no server compute required there. That works really well on iPhones today, the [inaudible 0:25:50] capability is what we call it.

[0:25:52] GV: Got it. Okay. Yes, that makes sense. I mean, outside of computer graphics, if you want to use that generic phrase, are there any other applications that seem obvious here? Computer vision, robotics, any other domains that you're exploring, or that you know this could go to?

[0:26:11] KG: Yes. One thing that we have seen is people in the robotics industry showing interest in using these Gaussian Splats or NeRFs. The reason being, you can let a robot roam around and do unsupervised learning. That's one use case that we have come across in the past few years. So yes, robotics. I know some of my friends are also working in this sort of area, trying to use NeRFs basically for training data. It's like synthetic data, but from real-world locations, and you let robots train on that. So yes, that's one use case.

[0:26:48] GV: Well, I'm only going to hope there's one application for that, which is my robo vacuum cleaner, which is just way less smart than I hoped when it comes to understanding the room that it's in. So yes, if it could take advantage of this and start to properly visualize each room, that'd be fantastic. But I think we're probably a little while away from that for various reasons. Cost being one.

[0:27:08] BD: Yes. Actually, I think it's being used more for autonomous driving cars. I actually know that some of these autonomous vehicle companies are looking into NeRFs, but it's probably too costly to build right now for a vacuum robot.

[0:27:21] GV: Yes. I'd hope it was, to be honest. I hope that wasn't where people were going with this right now; there are probably way more useful applications of it.

[0:27:28] KG: Also, in the industrial sectors, digital twins are one other use case where NeRFs and Gaussian Splats are really useful. I started my career building prototypes, digital twins, and whatnot. Having an actual representation of the machines running in factories, and viewing it on a VR or AR headset, really gives the engineers who are working on big machines some context on what they're working with. That's one use case for NeRFs as well.

[0:28:00] GV: Yes. Lots of possibilities, which is very exciting. Let's move a little bit to the community around Luma AI. I think, Barkley, you touched on it a little bit already with Roblox, for example. To me, Genie especially seemed like a pretty obvious place where you're going to have a kind of creator community as a part of the product, almost. It could be very true also for the interactive scenes. Again, you've got to input your own things, and then if you want to share them, that's cool. But I guess, what is the tangibility of these objects, that people can then do something with?
Maybe it lends itself more to a creator community where I can take an object that someone else made and do something with it. What's been the approach that you guys have taken to community so far? I'm thinking especially of the game side. I play, for example, Cities: Skylines, which from day one had such a community creator focus. They were not going to release the first version of that game without the creator community being part of it, which I found super interesting, but it makes complete sense now. I almost think that's a game that could also benefit from this, if I could type in, "Oh, generate this exact building from a city, or from a point in time." Then that gets generated, and I can put it in my city, or something like that. But, going back to the question, what has been the approach of Luma AI to community, and what does community mean for Luma AI?

[0:29:25] BD: Yes, definitely. I think, quite surprisingly, capture has been the product that attracts most of the community's attention. For Genie, yes, it is cool that you can type in text and then be able to generate 3D objects. But when people share these on social media, the 3D object itself is not the center of the focus. It usually needs to be integrated into some more complex gaming experience or AR experience. That's sometimes really hard to showcase, since it's more like a game, something people have to interact with to experience. But capture, interactive scenes, just gets natural attention on social media once it's posted. So, an interesting fact is, when our product first launched, we didn't do any advertising. We were simply posting on our social media channels, and we just started to see this grow all over the world, people trying it out and marveling, "Oh, this is what you can do with just your phone." That's amazing.

One of the core use cases that our users found out themselves is that, by using the reshoot in NeRF, by recreating the scene that you captured, you can actually simulate a drone shot flying into that object, even if you just shoot it with your regular camera and walk around to capture that thing. A lot of people use this to make a drone version of themselves, or of their beloved cars or motorcycles, and then showcase them on social media, which then sparked this wave of people capturing these drone views that we just didn't anticipate in the beginning. It's not even in the United States, where we're based. It's a lot in countries like India and Indonesia, where people are really excited about this new technology. As we found out that these products get this natural attention on social media, we started to feature some of these really good captures on our profile page. So, if you go to the front page of the Luma AI app today, you can see a lot of these handpicked good captures where people created and captured different kinds of scenes or objects. We want to showcase those to inspire other people on what to create. But I will say, the majority of the growth comes from social media, where people are collaborating with us. They want to showcase a really good capture, they tag us on social media, and we just try to further amplify that. So yes, that's something that we're also hoping to do on the Genie side as well.
We have been seeing some of these more advanced creators using Genie to create these awesome experiences. But admittedly, the bar for taking a Genie creation and further creating something that's visually astounding to people is still much higher, which is why, right now, most of our social media attention has been focused on the capture product. Genie is more, I would say, a functional product that's useful for people within these vertical industries, like gaming, advertising, e-commerce, et cetera.

[0:32:43] GV: Yes. I guess with Genie, it really is effectively a prompt engineering economy in that sense. I remember, I think before ChatGPT came along, people started sharing images, I guess it was on Instagram. The thing is, they wouldn't say how they created them. They just said, "With AI." But I think people actually really believed that they had this skill somehow to create these things with AI. It was, I guess, a lot about the prompting, and it's very impressive. I cannot go into Midjourney and create these images at all. Do you see there being some kind of evolution of that within Genie, where you have these almost celebrity creators, effectively, people that really understand the Genie product and how to get the best out of it? Is that something on the roadmap, or not roadmap maybe, but is that what you're thinking about? I don't know.

[0:33:38] BD: Yes. I think we have not seen that with Genie. We have actually seen that with captures, where people who know how to do good captures just become the role models for other people to follow, to take a similar capture. For Genie, I think that's still because, currently, it can only create a 3D model, which is not a complete consumer experience for other people. If you put that Genie model on Instagram, you can come up with a really creative combination, for example, two animals merged together, like a lion dragon or something. That will be really interesting at first. But after users have seen what's possible to achieve in image land, with Midjourney, with all of these text-to-image tools, simply turning that into 3D is not that exciting for them. That's why, for our future plans, we're still thinking about what kinds of experiences we can enable people to create, that eventually turn these creations, these 3D models in Genie, into something that can be a holistic consumer experience format that can be enjoyed and interacted with by consumers.

[0:34:46] GV: Yes, maybe you've answered it slightly, but looking at the next six to 12 months, say, for Luma AI, from what you can share, with the evolution of what you call multimodality in AI, where do you see that going, and what specifically can you share from the Luma AI roadmap otherwise?

[0:35:06] BD: Yes. I wouldn't be able to share that much until our model is released. But we are working on very exciting new fronts that aim to eventually achieve visual creation. Basically, turn people's imagination into reality. Genie is the first step towards that, being able to think about something and then create that in 3D. But what could that eventual experience be, where people can create these experiences that are not exactly a well-defined game, but could be something between videos and games, where you can still create these scenes on a temporal sequence?
Imagine one of the simple extensions we could have: what if the 3D characters that you create out of Genie could move and talk? Then it would be something that happens not just on a spatial scale, but also on a temporal scale, and that allows some of these more interesting consumer experiences to happen. That's what we're working towards on the research side, towards what we call multimodality, or a general visual AGI. But yes, there will be very exciting things coming out in the next six to 12 months or so.

[0:36:21] GV: Awesome. Yes, that does sound exciting. I mean, at the time of recording, I don't think it's live yet, but we've got another episode with another company in a different area of this, AI gaming. I can see that the future for the next six-plus months is going to be very exciting in this space, where suddenly a whole ton of possibilities are opening up when we apply this kind of technology effectively to gaming, to virtual worlds. Think of what you want, and then make that into what feels like a real world. So, super exciting. Where can our listeners - for those that aren't familiar until this point, I'm sure they're pretty excited to go check it out. So, where can they go to learn about this tech themselves, and also, what's the best place to start with Luma?

[0:37:13] KG: Yes. For Luma, lumalabs.ai is the website. You can also search for Luma AI on the Play Store or the App Store. You can find it pretty easily. In terms of trying to learn about the tech, the interesting thing that I found in the past couple of months or weeks is that you can ask these really good LLMs to make really personalized answers for you. Say, explain this to me like I'm a five-year-old or a ten-year-old, and you will literally understand what these technologies are. Just go to Perplexity or ChatGPT and just ask, basically, and they'll give you the resources.

[0:37:44] GV: Awesome.

[0:37:45] BD: Yes. I would also add, for NeRF, the Berkeley lab where NeRF originated also has this open-source project called Nerfstudio, which, since it's all coming out of the same lab, we're kind of in this [inaudible 0:38:02] relationship with. They have a lot of these open-source solutions, the stack that's needed to explore NeRFs, and there are a lot of learning resources and papers written by the researchers who came up with this.

[0:38:18] GV: Awesome. Yes. It's always exciting when there is a very powerful open-source part of this that people can go and play around with as well. So, nice. I'll be checking that out. Well, thank you to both of you for coming on today. I've learned a lot. I'm sure our listeners have learned a lot. Yes, I mean, this is obviously still a pretty nascent space. I'm sure we'll be talking again in a year or so, when we get to see some of these exciting things that have been coming out. So yes, I look forward to following along.

[0:38:48] BD: Awesome. Thanks for having us.

[0:38:49] KG: Thanks for having us.

[END]