EPISODE 1627 [INTRODUCTION] [0:00:00] ANNOUNCER: This episode of Software Engineering Daily is part of our on-site coverage of AWS re:Invent 2023, which took place from November 27th through December 1st in Las Vegas. In today's interview, host Jordi Mon Companys speaks with Mike Miller, who is the Director of AWS AI Devices. This episode of Software Engineering Daily is hosted by Jordi Mon Companys. Check the show notes for more information on Jordi's work and where to find him. [INTERVIEW] [0:00:37] JMC: Hi, Mike. Welcome to Software Engineering Daily. [0:00:40] MM: Hey, Jordi. Thanks for having me. It's really great to be here. [0:00:43] JMC: Tell us a bit about yourself, because we were talking off camera, or before recording, about you joining the dark side, meaning the business side of things, but you actually did study something technical, or become fluent in something technical before joining the dark side, right? [0:01:00] MM: Sure. Yup. Absolutely. I got my degree in electrical engineering and computer science from MIT in Boston. Traveled my way across the country. Had a stint in Austin, Texas, where I worked at a couple software startups. I ran professional services organizations, software support organizations, marketing orgs, landed out here in California, had a short stint at Google, another startup, and then landed at Amazon about 11 years ago. [0:01:24] JMC: What were the domains, the products, the services, roughly, that those companies - did you specialize? Because you have definitely specialized in something at AWS, something that we will be talking about in a minute. Did your career follow a trajectory? Was it more or less what you desired with each one of these phases? [0:01:42] MM: I think, when I talk to young people about career plans, I often talk about like, your career plan's not just a straight line between point A and point B. It's like meandering around a playground, finding the stuff that's interesting to you and then spending a little time on that and then exploring something else. That was definitely my career trajectory. In Austin, it was enterprise software startups. In California, I worked at Google for a short time, but then I worked at a company called OnLive, which did the very first cloud video games. I got introduced to the video game industry and online video processing. Then I moved to Amazon, where for the first five years I ran product management for Fire TV and helped launch the line of Fire TVs. That was incredibly exciting, learned all about the content industry there, but also, got introduced to Amazon product management. Then about six and a half years ago, moved over to AWS, where I worked in their AI and ML organization. It's been a great variety, and that's something about me that I really love, learning new things all the time, and Amazon and AWS are great for that. [0:02:46] JMC: Nice. Exactly. You mentioned AI, right? We were going to talk about the current hot topic of all the software industry and even everywhere. Now with all the drama happening at OpenAI, we're recording this on the 21st of November, a week before re:Invent. My mother called me yesterday. She's a 77-year-old woman talking about what happened at OpenAI. She knows nothing about it, but everyone's talking about this. Regardless of the gossip, it's because it's relevant, and you've been quite deep in the weeds of AI at AWS for the last six years.
At the moment that OpenAI captured the headlines of most media outlets, and even my own mother's attention, what was AWS doing? Where were you on the AI journey of AWS, and did that announcement catch you by surprise? [0:03:36] MM: Yeah. Well, I guess, Jordi, I would start by saying at Amazon, we've been doing AI and ML for a very long time. If you think about the very earliest product recommendations on the amazon.com retail website, moving forward from there, you've got the robots in our fulfillment centers, you've got Prime Air, you've got Alexa. All of these products and things that Amazon has been working on have a very high degree of artificial intelligence and machine learning wrapped into them. When I joined AWS in 2017, that was the year we launched Amazon SageMaker as this hosted, end-to-end machine learning management platform, right? It was a recognition at that time that customers needed a simpler way to approach AI and ML and integrate that into their businesses. Over time, we've seen that expand into this three-layer stack. If a company has deep machine learning expertise, they can get in at the bottom level of the stack and access primitives, if you will. They can use SageMaker in the middle layer as this hosted end-to-end platform. If they don't want to touch machine learning, we've got these API callable services up at the top layer of the stack that customers can use to embed machine learning into their existing applications. This has been happening and maturing and growing for quite a long time at Amazon and AWS. [0:04:55] JMC: It seems like the jump in quality, at least the jump in a new direction and a new horizon might be LLMs, right? Correct me if I'm wrong, you're the expert here. [0:05:07] MM: Yeah, 100%, without a doubt. Transformer models, which form the basis of these capabilities, really started to come into their own a few years ago. Actually, I ran a product called DeepComposer, where we tried to teach people about transformer models through music and generating musical accompaniments and predicting the next musical notes based on melodies that you would play. We launched that in 2019. [0:05:30] JMC: But wait. Was it an educational product? [0:05:32] MM: Yes. [0:05:32] JMC: Was it from - Okay. If we can define LLMs, well, at least the output of an LLM, as the prediction of the next word that makes sense given the previous words, which would be the input. The prompt would be the request for information, and the information retrieval phase would be the LLM guessing, one word after the other, what sequence of words would make sense for the input that, again, is the prompt. In the case of the product that you were describing, there was a prompt, an input of music, a melody, and the output of the product was a suggestion of a continuation of it. How did that work? [0:06:09] MM: Yup. That's exactly right. We used MIDI. MIDI is an electronic representation of notes. We didn't actually deal with audio files. We dealt with MIDI notes. Based on MIDI notes, we made a prediction of the next notes. This product was called DeepComposer. It was actually the third of a set of deep products that we built. The first was DeepLens, then DeepRacer, then DeepComposer. These were all about giving developers hands-on access to this brand-new AI and ML technology and making learning fun and hands-on, because we know that fun motivates people, right?
It's not like reading a dry book, or research articles, which was like, at that point in time, that was what you had to do to learn about these technologies and techniques. What you wanted to do was put that into the hands of people and make it more exciting for them to see and learn. [0:06:56] JMC: Are those products, by the way, still available? Are you reshuffling that portfolio into something? Because we will talk about the newest and brightest things later, but what about those products? Are they still running? [0:07:07] MM: Yeah. DeepLens, we just took off the market. It had reached the end of its useful lifecycle. DeepRacer is still going strong. In fact, we see a lot of pickup from large enterprise customers of AWS using DeepRacer internally at their companies to generate excitement and motivation to learn about AI and ML. You've got even the world's biggest companies, like JP Morgan Chase; they have an entire racing league inside their company for users to use DeepRacer. What you do is you train these models in the cloud, you download them to a little car, and then the little robot car races around a track. It's a really fun concept and a lot of our customers and individuals have adopted this as a way to get hands-on and do some fun learning. DeepComposer is still around but, again, it was a bit early in terms of the right product fit; certainly, the online experience is interesting. [0:07:56] JMC: MIDI has a few limitations in itself, I would argue, too. I mean, not that I'm aware that that affects the end product, but in my experience, it has a few limitations. Anyway. Before we jump into PartyRock, the very recently launched product that brings us here, what I really wasn't expecting when I was doing research is that although you just described a very long-running approach to hands-on experience and delivery of ML products at AWS, which you've very succinctly summarized just now, I didn't know about AWS's integrated, vertical approach, completely owning the stack end-to-end, right? From the design, deployment and use of physical machines, so AWS silicon, to PartyRock, which would be, in my view, the surface level of this stack, and anything in between. Can you describe the rationale, or the reasons why AWS would rather go for this complete, integrated approach for AI in general? [0:08:58] MM: Yeah. Well, when we think about customers, right, there's going to be such a wide range of experience and needs and the depth that they want to go to, or that they're able to go to in terms of implementing AI and machine learning. We wanted to make sure, and AWS always does this. We think about innovating on behalf of our customers and providing the greatest choice at the best cost for our customers. Thinking about that, as we started to build out our AI and ML offerings, we realized that we needed something at each layer of this stack of expertise and of capabilities for our customers. I mentioned it before, but we can go into a little bit more detail. That bottom level of the stack is for the expert practitioners, the folks who want to get hands-on in a bare-metal sense, right? At this layer, not only are there the EC2 machine images that have all of the deep learning libraries and capabilities that you need, but we've actually gone a step further with custom silicon. There's Inferentia and Trainium chipsets.
These are chipsets that are designed very specifically to reduce the cost of machine learning functionality, whether it's the training part where you're processing petabytes of data into these models, or whether it's in the inference part where you're making the predictions. At the bottom level of the stack, we've got the silicon and the machine images that expert practitioners can leverage if they want to do this work themselves. A lot of customers though, they don't want to get their hands dirty with that. They want to stay at the mid-level. This is where Amazon SageMaker, which we released over five years ago now as this hosted end-to-end machine learning platform, is really great for a huge number of customers who maybe have some machine learning engineers and some data scientists on staff. They do want to get hands-on. They can take advantage of this rich set of hosted capabilities of SageMaker to actually build, train, deploy, monitor the models that they've got in production. Then you've got this layer on top, which is like, if you're a customer and you're building an app, but you don't have any machine learning expertise on your team and you just want to embed machine learning capabilities, like predictions, or fraud detection, or transcription, or things like that, we have API callable services that an app developer can just call the API, feed the data, get a prediction back and then integrate that into their app. Really, at every level of customer interest and expertise, we've built up an offering for those customers. If I take that and I apply it towards this foundation model space, this new generative AI that really has come to, really matured in the last year, it's just this crazy number of - a crazy amount of data that's really powered this, right? That's where Bedrock comes into play, which we can talk about. Bedrock is foundation models as a service. Again, we're trying to give customers the ultimate choice and the flexibility, because we don't think there's going to be one core foundation model to rule them all. We think customer choice is where it's at. [0:11:54] JMC: I would love to delve into the architectural bits of silicon, but that's probably for another episode. In fact, Bedrock is what mostly caught my attention, because what are the basic differences between the models out there? I mean, you don't need to be an expert, especially in those that are closed source. We might not know much about Claude, although AWS and Anthropic, the company that develops Claude, have struck a partnership very recently. But still, maybe this information is not shared. Yeah. What different approaches can someone willing to invest in developing their own foundation model take, and what capabilities can they leverage at the Bedrock level? Also, if you've already had experience with clients, what are the reasons? Because it feels to me like it's a huge investment in time, but it might be worth it. I'd like to know the reasons why someone, a company, would invest in building their own foundation model with Bedrock. [0:12:49] MM: Yeah. Bedrock, as we mentioned, is foundation models as a service. Customers can get access via APIs to a variety of first-party and third-party foundation models. Some of them are multimodal, some of them are text models, some of them are embeddings models. Each of these models can serve a different purpose just in terms of the type of model and then the size of the model.
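Getting at those models through the API can look roughly like the short sketch below. It assumes the boto3 bedrock-runtime client and an Anthropic Claude model ID; the request body fields follow that one model family's text-completion format and differ for other providers, so treat the names and values here as illustrative rather than as a reference.

    import json
    import boto3

    # Runtime client for invoking foundation models; a separate "bedrock" client
    # handles control-plane work such as listing or customizing models.
    bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Claude-style text-completion body (other models, e.g. Titan or Jurassic,
    # expect differently shaped bodies).
    body = json.dumps({
        "prompt": "\n\nHuman: Suggest three names for a bike-route app.\n\nAssistant:",
        "max_tokens_to_sample": 200,
        "temperature": 0.7,
    })

    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-v2",   # swap in another model ID to compare outputs
        contentType="application/json",
        accept="application/json",
        body=body,
    )
    print(json.loads(response["body"].read())["completion"])

Swapping the modelId, and adjusting the body to match, is essentially how an application would compare a text model against an embeddings or image model for the same job to be done.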
We've got, for instance, Jurassic from AI21 Labs, there's a mid-size model and an ultra-size model. These models perform differently based on how many tokens they've been trained on, or how many data points have gone into the training and what training sets have gone into them. Each foundation model is going to be a little bit different in terms of what it's best at. Like, multi-language, right? Maybe a model has been focused to understand five, or 10 different languages, and so it's better at maybe doing some translation type tasks. Maybe a model has been trained more in using conversational capabilities, like what's called reinforcement learning using human feedback, in order to be more conversational. The actual interaction patterns with that model might actually tend to perform better in a conversational manner. You've got a lot of different data points and configurations that can be used based on what your task is. As a customer, when I use Bedrock, I can make a choice. I can say, I want to make API calls and make requests to model A, or model B, or model C based on the particular job to be done. What's the outcome that I'm looking for, right? That's what makes Bedrock really interesting is that depending upon the outcome that you want, or the job that you have that's to be done, you can actually find a model that's very much optimized for what you want. Whether it's the modality, like I want to generate images, or I want to generate text, or I want to generate embeddings that I can use for search, or the sizes of the model and the conversationality. [0:14:43] JMC: I guess, and now I was thinking before about who would invest the time and the money to build something upon this SaaS offering of yours, and enterprises came to mind, which are usually very wary of sharing the data. If you're going to train these models with your own data, do they do some ephemeral inference when it comes to inferencing? How does that work in terms of data sovereignty and then, where the - what happens with the training data that the owner of the data offers to the model? [0:15:11] MM: It's a great question. As an enterprise customer, this should be top of mind for you is how secure is this? Do I have the ability to use my own data to improve the performance of the model? There's actually a couple different ways to do this that we talk about with customers. There's everything from the hardcore end, which is retraining, or building a new foundation model, which we don't recommend, unless you're a super deep, large enterprise and you've got rich experience in machine learning. Because there's a couple different easier ways to adapt an existing foundation model to your needs. The couple key ways to do this, one is called fine tuning and one is called RAG. Fine tuning actually allows you to enhance the model using your own data and actually do some additional training tasks to create a customized version of that foundation model for you. Now, Bedrock supports fine tuning based on the particular model that you're interacting with. What happens is if you do fine tuning of Bedrock, that fine tune model is yours. It's secure. It's only available to you, because it's got your data. It's basically, nobody outside your VPC can access it. It's highly secure and that data remains yours. That's the training. That's a little bit more expensive, requires a little bit more expertise and data, right? You've got to use data for that fine tuning. 
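The fine-tuning path Mike describes could look roughly like the following from Python. This is a hedged sketch: it assumes the boto3 bedrock control-plane client exposes a model customization job API, and the model identifier, S3 paths, IAM role, and hyperparameters are placeholders, not recommendations.

    import boto3

    bedrock = boto3.client("bedrock")  # control-plane client, not bedrock-runtime

    # Fine-tuning takes labeled prompt/completion data from your S3 bucket and
    # produces a private, customized copy of the base model in your own account.
    bedrock.create_model_customization_job(
        jobName="support-tone-finetune-001",
        customModelName="my-support-tone-model",
        customizationType="FINE_TUNING",
        roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder
        baseModelIdentifier="amazon.titan-text-express-v1",                 # placeholder model ID
        trainingDataConfig={"s3Uri": "s3://my-bucket/finetune/train.jsonl"},
        outputDataConfig={"s3Uri": "s3://my-bucket/finetune/output/"},
        hyperParameters={"epochCount": "2", "learningRate": "0.00001"},
    )

Once the job finishes, the resulting custom model is invoked like any other Bedrock model, but only from the account that owns it, which is the data-isolation point Mike makes above.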
Then there's this process called RAG, which stands for retrieval augmented generation. RAG is this way that's almost - I like to think about it as post-processing. You're not actually touching the foundation model. You're enhancing its responses, because what you're doing is, based on the prompt, you're looking up data in another system, and then you're providing that data in the prompt, or as part of the context that you pass to the foundation model, so that it can align its responses and give you more accurate responses based on the data that's available. RAG is a lot easier conceptually, because you're not actually touching the foundation model, you're just using the foundation model as provided, but you're enhancing the output and making it more accurate by using your own data as part of this RAG process. [0:17:16] JMC: One thing that I liked about foundation models provided by AWS, which I'm not sure are part of Bedrock, but probably are. I'm talking about the one specifically running behind CodeWhisperer, that is trained on public data, obviously. One thing that I really liked and found unique to CodeWhisperer (maybe there are other companies like AWS and other products like CodeWhisperer offering it, but not to my knowledge) is that whatever they offer, in this case code suggestions, they will link back to the code snippet that, I'm hesitant to say, generated the output, because it's not true. The output is generated by an inference process that was triggered by a prompt. When there's an incredible similarity between the output and, potentially, snippets of training data, which again come from public data, not from clients, they will establish some "provenance," which I find very neat. Not only very neat, but also good open-source citizenship. Let's call it stewardship, if you wish. What are your thoughts on CodeWhisperer, by the way? [0:18:22] MM: First of all, CodeWhisperer, amazing. You're absolutely right. There's foundation models that back it up. CodeWhisperer is awesome. From that provenance perspective, we will also check whether the code that's generated has existing licenses associated with those code snippets, and surface that data, which is super critical when you're building enterprise software products. I think what's interesting is you can take that model of citations, or where did we get the data to justify this answer, and surface that to the user. I think that's something that a lot of the providers of applications that are built on Bedrock are doing these days, because it earns trust with a customer, right? Especially somebody who's using gen AI. We all know there's a lot of hallucination that can be involved. You want to see the provenance, like you said, I love that word, or the citations for the data that's being generated and where it's coming from. The more that you can do that, the more you can earn trust with users and have them become more familiar with what's happening. I'd love to tie this into - I know we're going to get to PartyRock, but earning the trust of customers and getting them to have some intuition about how these things work was one of the key reasons that we did this new software, this new product called PartyRock. [0:19:35] JMC: Yeah, how so? Because you seem to be quite obsessed, in a good way, with educational delivery of AI. I find PartyRock, it's been out now for how long, two weeks? [0:19:46] MM: Yeah, a week. [0:19:47] JMC: More maybe? A week, yeah.
[0:19:48] MM: No, one week. Less than a week. [0:19:51] JMC: This is a playground, right? It tells you how playful this thing is, already from the get-go, which is fun and funny, as does the layout. The design of it is sprinkled with flashy colors and you can modify most of the items in the layout, and so forth in the UI. Tell us about - [0:20:07] MM: Yeah, where it came from. [0:20:08] JMC: Exactly. What were you guys up to when you launched this? Then, we will go into what people are building with it. What is the long-term vision and how does it fit into the end-to-end stack, the integrated approach that we just described over and over? [0:20:21] MM: Absolutely. Well, let me give you the back story. [0:20:23] JMC: What would you think of it? Yeah. [0:20:25] MM: Yeah, let me give you the back story. Generative AI exploded early this year. I mean, we were building Bedrock and we already had foundation models that we've been working on for a while. A lot of our developers across the company started to wonder like, "Hey, how can I apply this to my job, or to my particular service that I'm building in AWS?" One team started to build their own little miniature playground, but they did it in a unique way. It wasn't just a text box with an output. It was thinking about it as widgets that you could connect together. They started to build a few demos and then make this thing available. Then they thought, "Hey, wouldn't it be cool if like, as I built these prompts and connected them, I could show them to my co-workers?" What they did was they made the URL of these applications that you were building shareable. This started to spread like wildfire around our company. Folks sharing URLs. "Hey, have you seen this playground thing? Check this out." I mean, myself, I saw it. Somebody sent me a Slack message. They're like, "Hey, have you seen this thing?" I started playing with it, the light bulb went off like, "Hey, wait. This is actually an awesome, fun thing. It gets you using prompt engineering and the foundation models and choices and thinking about all of the things you have to think about when you want to build an application using generative AI." We said, "Hey, look. We've done this three times already in the past, finding these cool hands-on ways to bring this new technology to customers." We knew that this is - generative AI is a step function change in terms of the capabilities. We've got to figure out how do we get this into the hands of more people faster. We took that little germ of an idea and we productized it. It's reflected, and you called it out, right? The UI is very fun and playful. The name PartyRock, it's an Amazon Bedrock playground. We wanted to make it clear that this thing is about play. It's about low risk. You can experiment. You're not necessarily in a production environment, touching stuff, right? This is all for your entertainment. We see people building apps, everything from recipe generators based on what ingredients are in my pantry, to bike route suggesters. "Hey, I'm in this city, or the weather is like this. Where should I be biking?" People have actually been building prompt engineering tutorials inside of PartyRock to teach people about prompt engineering using prompt engineering. It's been a really fascinating road. We're really excited about the adoption so far, of people finding it fun, finding great ways to get into prompt engineering and starting to build that intuition for how this stuff works.
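The widget idea Mike describes, where one prompt's output becomes part of the next prompt, is essentially chaining model calls. A minimal sketch of that pattern against the Bedrock runtime API follows; the model ID, prompts, and the recipe example are illustrative, and this is not how PartyRock itself is implemented.

    import json
    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")

    def complete(prompt, max_tokens=300):
        """Send a single text prompt to a Bedrock model and return its completion
        (Claude-style request body, used here purely as an example)."""
        body = json.dumps({
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": max_tokens,
        })
        resp = bedrock_runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
        return json.loads(resp["body"].read())["completion"]

    # "Widget" 1: generate a list of ingredients.
    ingredients = complete("List five ingredients you might find in a typical pantry.")
    # "Widget" 2: its prompt is built from widget 1's output.
    recipe = complete(f"Suggest a simple recipe that uses only these ingredients:\n{ingredients}")
    print(recipe)

PartyRock surfaces this as connected widgets in a visual UI rather than code, but the underlying idea of one widget's output feeding the next widget's prompt is the same.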
[0:22:58] JMC: To me, it also solves some collective creativity problem, in a way. The last few years have seen a rise in cloud IDEs, so the ability to have your IDE, your compute instantiation, running somewhere different than your laptop, and being able to share those environments. Not only code using VMs on AWS, for example, but also being able to share that with other co-workers, and see with them how the application is running, or what the source code of your application is doing. That is limited to those that are able to program, that know about development and stuff like that. This, when you described it, and what I've been playing around with and what I've seen people creating, seems like that for the no-code developer, for the low-code developer, right? In a way, it's like, many people do not know how to code and yet would like to collaborate on the creation of applications. I feel it's a bit ambitious to call the outcome of PartyRock an application, but I see what you mean in any case. I see, again, the same need to create and share applications built with PartyRock, and I see a parallel with the rise of cloud IDEs and shared environments among developers. Do you agree with that? [0:24:15] MM: Yeah, I think that's a great observation. From the PartyRock perspective, we felt that it's not just about sharing what you did so others could see it, but this ability to take inspiration from others and make it super easy to build off of what somebody else did, which is how we got to introducing the remix functionality. Just with a single button, you can, in engineering speak, fork the code and basically make a copy of that code in your account and then build off of it. We wanted to make it so that you could find inspiration in what other people were doing and then learn how to tweak it in this low code, no code way. Now, AWS has been doing low code, no code for a while, right? We've got SageMaker Canvas. There's other really interesting products focused on building production apps using low code, no code capabilities. You're absolutely right. PartyRock, we're not trying to displace any of those. This is very much a for-fun, entertainment-purposes thing. Again, I go back to what we wanted to do: we wanted to democratize the accessibility of generative AI to the widest number of people. That's also where you see how we thought about the access, right? You just need a social login to get access to PartyRock. We've got a rich temporary free tier of access that people can use. We want to make it really easy for folks to get in and use this thing without needing to be a software developer. [0:25:40] JMC: Is there then a holistic, global approach to AI in general at AWS? Does this fit some vision that Werner, or someone at the company, has that is being delivered and probably announced at the next re:Invent? What is the general vision for the next six months to one year? [0:25:59] MM: Yeah. Well, we've already talked about this multiple times: how do we reach millions and millions of users and educate them on AI and ML? We were even thinking about this and implementing steps to do this even before the generative AI craze, because we knew that AI and machine learning is this seminal moment in our culture. For us to do better for the planet, the more people and the more diverse backgrounds, the more different perspectives we have from individuals who know about machine learning and artificial intelligence, the better off we're going to be.
For instance, last year, we launched an AI educator program that was focused on community colleges, where we provided resources and training for professors at community colleges to increase awareness and expertise around AI and ML. We launched an AI and ML scholarship program. Yes, all about democratization. [0:26:50] JMC: This actually prompts a question. In your own experience, but also with your clients and with the users of all these programs that AWS very generously released, what is the biggest friction in learning prompt engineering, or prompting LLMs? What are the biggest problems, the most frequent ones and most common ones that you have come across? [0:27:10] MM: I mean, prompt engineering itself is fairly straightforward. I think it's in a lot of the nuance, right? It's a bit of a mix of art and science, which is a little cliché, but it's true, right? For anybody who's tried to write a prompt to generate something that they want to get out, there's a little bit of an art to it. I think with PartyRock, that's what we were after: providing this low-friction, fun environment to experiment and see. I think there's also some really interesting stuff with prompt engineering that you can do that's related to chaining and taking the generative AI text that's generated and using that as input for more generation. That's where PartyRock comes in, with chaining widgets and also introducing the multimodal capability, so you can generate an image and then you can even have that image passed as input into another widget. It allows people to make these connections and the light bulbs go off of, "Oh, I see how these things can fit together. I see how the results of my prompting, and maybe even the results of some deeper configurations, top P, or temperature, impact the output from the LLM." We gently introduce those concepts to users through PartyRock. [0:28:16] JMC: Yeah, exactly. I think it's generally true, the way I'm going to phrase it, that the LLMs do not have too much state, do not have a lot of memory, don't know what the previous input, or inputs previous to that one, were, so the ability to chain those is a good one. Also, people are not so aware, or not very aware, that generating images, speech to text, or text to speech, or text to text, all those things really leverage quite different models. Do you need - [0:28:46] MM: That's right. [0:28:46] JMC: - a really strong technical back end? I mean, a literal back end, so something running behind the scenes to connect those things in a seamless way, in the multimodal way that you were describing? Yeah, can you describe in elaborate fashion the last two features that you mentioned, temperature and top P? [0:29:05] MM: Yeah, absolutely. These are two parameters that you can provide with LLM prompts. The best way to think of them is that they control the creativity of the output of the model for that prompt. I can give you a detail about temperature, for instance. You explained at the beginning the way some of these large language models work is they, based on the previous text, they guess what the next tokens are, the next words, or groups of words, right? If you think about the probability, the model has a probability for each candidate. Let's say you've got this phrase, okay, what's next? Well, we've got this probability distribution. This one's 30%, 20%, 10%.
Now, if the model always picked the most probable next word, what would happen is you get really dry output and you might even get into loops, where the model output just says the same thing over and over and over again, because you keep hitting that same high percentage of output, right? Temperature allows you to add a little bit of entropy, or add a little bit of randomness, into how far down the model should go in the next token selection. It's not always picking the most probable next, but maybe one time, it picks the second down, one time it picks the fifth down, one time it picks the most probable. You can use that temperature to influence some of what looks like the creativity of the model, but it's actually just dynamically changing the choice of the next token, right? If you dive deeper down into that probability stack, to something that's like, oh, maybe this is only 5% compared to 20%, you're going to get a more interesting, dynamic output. That's why we give that control to users, so they can get some intuition about how these configurations can actually impact the output. Hopefully, that explanation made sense. [0:30:39] JMC: It does, actually. Yes, you're right. It gives the LLM a bit of freedom in terms of selecting the next one. Or it says to the model, don't follow the underlying principle that would force you to choose the highest-probability word next, but actually, become a bit less strict and select maybe the second most probable word, or the third one, as you said. That will make the course of the next selection and the next one and the next one and the next one certainly different, at least different than the most probable path. Yeah, that's quite fun. Yeah, yeah, yeah. I mean, again, the product is literally fresh out of the oven, it was released a week ago, but what is next for PartyRock? This looks like a wild experiment that people are having a lot of fun with. Does it fall into the context of something bigger? Is it going somewhere? Again, what is the context for this? [0:31:33] MM: Well, I'm certainly unable to tell you what our roadmap, or product plan is. But I can certainly say, at Amazon and AWS, we're always listening to our customers. We love getting feedback. We love thinking about like, how can we solve the next problem for them? I think it's pretty clear in terms of the capabilities and the functionality, there's a lot more that we can do with the product, everything from being better about the sharing aspect and highlighting apps that other people share, because today we just ask people, "Hey, tag them with a hashtag, or share them on your social network." There's no central place for users to go. I can imagine there are some improvements we can do around there for discovering these things. There's other capabilities that we can add to our widgets, we can add more widgets just to make these things that you build more app-like, right? You can maybe have selection dropdowns, or things like that. I think those are a couple of ways that we're thinking about this. Certainly, we're an Amazon Bedrock playground. As Bedrock releases new capabilities, new foundation models, enhances their offering, we will strive to roll those things into PartyRock and present them in a fun, quirky way for users to learn about and take advantage of. You can imagine that as Bedrock improves, those things will show up in PartyRock eventually.
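The temperature mechanism described above can be sketched in a few lines of plain Python; the candidate tokens and probabilities below are made up purely for illustration.

    import math
    import random

    def sample_next_token(candidates, temperature=1.0):
        """Pick the next token from a {token: probability} map, rescaled by temperature.
        A very low temperature approaches greedy decoding (always the most likely
        token); a higher temperature flattens the distribution, so less likely
        tokens get picked more often."""
        tokens, probs = zip(*candidates.items())
        logits = [math.log(p) / temperature for p in probs]
        peak = max(logits)
        weights = [math.exp(lg - peak) for lg in logits]  # random.choices normalizes these
        return random.choices(tokens, weights=weights, k=1)[0]

    # Made-up next-token probabilities for the phrase "The weather today is ..."
    candidates = {"sunny": 0.30, "cloudy": 0.20, "mild": 0.10, "apocalyptic": 0.02}
    for t in (0.2, 1.0, 2.0):
        print(t, [sample_next_token(candidates, t) for _ in range(8)])

Top P works on the same distribution from a different angle: rather than reshaping the probabilities, it trims the candidate list to the smallest set whose cumulative probability exceeds P before sampling.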
[0:32:48] JMC: Now that you mentioned Bedrock again, which is, I think, the most fascinating piece of the complete end-to-end stack. I mean, as I mentioned before, we wouldn't delve into the custom silicon that AWS is probably designing, and I promise to keep that promise. But now that you mentioned Bedrock, what is Titan LLM? Is this one of the first-party models that you mentioned at the beginning? It has been developed by AWS. Can you elaborate on what Titan LLM is? [0:33:15] MM: Sure. Yeah. We announced Titan sometime back. It's currently in preview. It is a first-party foundation model. Amazon built and trained a foundation model. We talked about the text version. It's an LLM that handles prompts and text output, just like other LLMs. A lot of our customers have access to this in preview. I don't know when it's going to be generally available, but it's one of the first-party models that's inside of the Bedrock offering. [0:33:43] JMC: Do you have any details about the tokens, the size, the typical questions that everyone is asking? Like with cars, how many liters fit into that engine, whatever. Do we have the data for that? [0:33:55] MM: I don't have those numbers off the top of my head, and I wouldn't want to speculate. I want to make sure we give you accurate data. Jordi, I can follow up and you can post that for your listeners. Absolutely. [0:34:02] JMC: I will. I will. Then, what are your expectations about re:Invent? It's taking place next week. What kind of announcements? I know you can't give us any scoop about that, but what is the nature, the feeling at AWS, of the things that will be announced over there? What direction would you like the company to follow, in your own words? If you were Werner, if you were Werner Vogels yourself, where would you take AWS? [0:34:27] MM: If I were Werner. Oh, wonderful question. I mean, re:Invent is such a special event for everybody at AWS. There's so much work that goes into it. But we love the fact that it brings all of our customers together. It's our best chance, or one of our awesome chances, to really sit down and talk to customers in workshops and in chalk talks and keynotes and really generate this dialogue with them to bounce the ideas off of, find out what they're interested in, what problems they're solving, and for customers to learn from other customers. There's a lot of sessions where our own customers are on stage talking about, "Hey, I used AWS in this way and I solved these problems." Really for me, it's, I mean, yes, the announcements and the new stuff are always super exciting and customers get pumped about that. For me, it's about us learning from our customers and customers learning from each other as part of this giant conference. I think that's really the most special part for me, because obviously, I don't know all the details about what we're announcing, or where we're taking it, but there's always lots of really great stuff that's announced and really spectacular innovation that's happening. I have no doubt, we're going to do the same thing again this year. [0:35:33] JMC: Yeah, the announcements take all the headlines. That's true. You're absolutely right. One pays for a re:Invent ticket, or any other event that is similar to that, because of the use case cross-pollination, yeah? You're there to hear from, or spy on, competitors using AWS, right? Or just to listen to providers, listen to your own clients, because everyone's using AWS. You'll find a whole ecosystem there. Your whole supply chain, in a way, a digital supply chain in this case, at AWS re:Invent, because it's a massive event.
Yeah, the cross-pollination of ideas is, I think, the main reason why one would purchase a ticket for that, because it's fascinating. It's absolutely bubbling in the air. [0:36:15] MM: What's great is, regardless of your industry, your company size, where you live on the tech stack, I mean, there's representation from everyone there, which is what's super interesting. You always find a peer that you can learn from. [0:36:29] JMC: Then for yourself, what are you going to work on next? Mike Miller, Director of AWS AI Devices, what is it you're working on in the next few months to a year that the audience would like to know? [0:36:40] MM: Lots of exciting stuff, primarily around generative AI and related topics, just given the experience and where things are happening. My role is really to help our individual teams just innovate faster and think about cross-pollinating across the company. How do we connect people and connect ideas and think about what can we do with generative AI to innovate on behalf of our customers? That's really where my focus is. [0:37:05] JMC: Do you see any particular vertical pulling from the gen AI capabilities that AWS is offering right now, especially strongly? Is there any particular vertical that you're surprised by that is using these capabilities in an incremental fashion, where it's like, I didn't expect this from the financial sector, for example? [0:37:27] MM: I think we'll have an opportunity to see some of that at re:Invent. I think, to be honest with you, what's surprising is the breadth. It's actually like, everyone is thinking about it and wondering how it can be used to improve their business. I think that's where Bedrock and these offerings really come into play, because it's a cloud-based foundation models as a service offering that you can fine-tune, or customize with RAG, and it's not one model to rule them all, right? We've got a wide variety of models from first-party and third-party providers that do different things. We've got multimodal, we've got text, we've got embeddings, all of those things are going to be represented in Bedrock, which is going to be super interesting for our customers. [0:38:11] JMC: Well, Mike, thanks so much for joining us today. [END]