EPISODE 1624 [EPISODE]

[0:00:00] ANNOUNCER: This episode of Software Engineering Daily is part of our onsite coverage of AWS re:Invent 2023, which took place from November 27th through December 1st in Las Vegas. In today's interview, host Jordi Mon Companys speaks with Ankur Mehrotra, who is the director and GM of Amazon SageMaker. This episode of Software Engineering Daily is hosted by Jordi Mon Companys. Check the show notes for more information on Jordi's work and where to find him.

[INTERVIEW]

[0:00:37] JMC: Hi, Ankur, welcome to Software Engineering Daily.

[0:00:40] AM: Hi, Jordi. Thanks for having me.

[0:00:42] JMC: So, we are here at AWS re:Invent 2023. We are containerized. We are literally inside a container, which is quite fun. Is this your first re:Invent?

[0:00:54] AM: It's not. This is probably my, I would say, fourth or fifth re:Invent? Yes. But it's always super exciting to be here, to meet with customers, and to celebrate all the launches. So, yes, it's awesome.

[0:01:10] JMC: So, what is your charter at AWS?

[0:01:11] AM: Yes. So, I am the general manager for an AWS service called Amazon SageMaker. SageMaker is our service for end-to-end machine learning development. I've been at Amazon for about 15 and a half years, and I've worked across both the consumer side of the company, amazon.com, as well as AWS. Before managing the Amazon SageMaker service, I was managing a few of our AI services, including AI services for personalization and forecasting, edge machine learning, health AI, and some of our programs such as DeepRacer. So, it's been quite a journey.

[0:01:53] JMC: So, your experience is actually perfect for this question, because I know that this question is going to be, not necessarily wrong, but limited. Adam, the CEO of AWS, said on Tuesday, in the opening keynote, that AWS has been investing in AI for ages now. I think he said something like 20 or 25 years. Me not being an expert in AI, in that field, I've been aware of it and exposed to it, but my experience comes from DevOps, software engineering, application delivery, and so forth. I would trust that the statement the CEO made was true, because I know for a fact that SageMaker is an AI or ML product that has been there forever, at least in AI terms, and the age of AI has just started. But I think your experience will tell me, yes, Jordi, of course, SageMaker has been there for six or seven years, but AWS has been doing more before that. Right?

[0:02:45] AM: Yes. So, it's interesting. It is true that Amazon has been doing machine learning for quite some time now, over two decades. In fact, one of the earliest machine learning-based features launched on amazon.com, about 25 years ago, was this feature called "customers who bought this also bought that." That was just the beginning. Since then, Amazon has been investing in machine learning-based solutions and solving different problems using machine learning. Through that process, we've learned a lot, whether it's the undifferentiated heavy lifting involved in taking data and building models, or the operational challenges with deploying and running models at scale. Because of those learnings, we were able to, over a period of time, really develop the technology to reduce the burden on developers who want to do machine learning. And what we found was that a lot of our AWS customers were also facing the same challenges. So, they asked us, "Hey, can you help us solve some of these problems?"
That's what led to us building Amazon SageMaker.

[0:03:56] JMC: I don't know if Jeff Bezos ever defined it in such a way, but the flywheel effect has such a strong component of dogfooding that it's absolutely brilliant that you guys dogfooded this yourselves. So, you had your internal clients requesting a lot of data and training operations, and at the same time, the market, your clients, were demanding the same. So, it was the perfect mixture for you guys to eventually deliver SageMaker.

[0:04:24] AM: Absolutely. We felt the pain over the years as we were doing machine learning at Amazon, and that led us to solve problems for ourselves. And then we thought, well, why not solve them for others as well, for AWS customers. So, it's been a great journey. Today, Amazon SageMaker is used by tens of thousands of AWS customers, and we're also super excited about the capabilities that we continue to launch.

[0:04:52] JMC: So, before we move on to what you announced, because there's a lot to unpack about SageMaker at this AWS re:Invent. In fact, I would argue it was like 10% of Adam's keynote on Tuesday at least, and he talked about everything, literally everything that's going on in AWS, which is more than a podcast episode can cover, which is a lot. But first, at breakfast Monday morning, I sat with someone I didn't know, and he told me he has been using SageMaker for years now, I think three or four, for the data pipelines that his company is building and feeding with computer vision data. So, before we move on to the latest and greatest of SageMaker, give us a lay of the land of the problems, capabilities, and use cases SageMaker has been solving for before the Gen AI revolution.

[0:05:42] AM: Right. So, if you look at the machine learning workflow or lifecycle, there are really several steps involved in it. Right from taking the data, and preparing it or transforming it into a format which you can use to build machine learning models. Then comes the phase where you want to build a machine learning model; there's a lot of experimentation and prototyping involved there, until you have something that you really like. Then, taking that model and deploying it so that you can do inference and use the model in an application. Then, once you've deployed it, there are processes that you want to automate, right? Because you may want to update the model with more data, or you may want to automatically deploy any updated model. So, MLOps in that process becomes really important. And recently, ML governance has also become very important. As machine learning is being used to solve more and more mission-critical problems, how you make sure that the model is doing what it's supposed to do has also become important. So, there are many steps in the machine learning lifecycle, and what we realized was that each of these steps requires tools that are purpose-built for that step or for that task. That's what we've done with SageMaker. We've worked backwards from the problems or the challenges our customers face at every step of the machine learning journey, and then built tools to solve those problems, and built them in a way where they all work with each other. One of the things that our customers love about SageMaker is that the whole is greater than the sum of the parts. It's well integrated together. One piece of feedback that we got from our customers is that a lot of times, when they're using bespoke tools, these tools don't talk to each other, and our customers have to spend a lot of time trying to connect these tools so that they can work together. So, that has been a key problem we've solved with SageMaker, and that has really resonated with our customers.
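To make that integration concrete, here is a minimal sketch of the build-then-deploy flow with the SageMaker Python SDK, assuming a PyTorch training job; the training script, S3 paths, and IAM role below are hypothetical placeholders rather than anything discussed in the episode.

```python
# A minimal sketch of training a model and deploying it to a managed endpoint
# with the SageMaker Python SDK. All names and paths are hypothetical.
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical IAM role

# Build: run a training script on managed, ephemeral training infrastructure.
estimator = PyTorch(
    entry_point="train.py",          # hypothetical training script
    role=role,
    framework_version="2.0.1",
    py_version="py310",
    instance_count=1,
    instance_type="ml.g5.xlarge",
)
estimator.fit({"train": "s3://my-bucket/train-data"})  # hypothetical S3 path

# Deploy: host the trained model behind a managed HTTPS endpoint for inference.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)
print(predictor.predict({"inputs": "example payload"}))
```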
[0:07:47] JMC: Yes. I mean, one of the slides that Adam shared was just logos of companies using SageMaker, and it was apparent from just a glance that there were not only huge companies. That, for me, is not that relevant. But there were representatives of many different verticals. The variety of data types that SageMaker is digesting and turning into models is just amazing: healthcare, banking, finance, insurance, and so forth. So, it is definitely used in a wide variety of verticals, if not every single vertical out there. So, that was the classical bit: some history and a description of how SageMaker works. What has changed with the introduction of LLMs? That's the question. What has changed in this description of the landscape in which SageMaker operates?

[0:08:34] AM: Yes, that's a great question. So, what's interesting is that there are some things that remain relevant, even with the Gen AI revolution, or whatever you may call it. For example, given the size and complexity of these models, scalability, performance, and also the security aspects of how you do machine learning are more important than ever before. What's interesting is that with SageMaker, those are some of the things that we have been investing in since day one. So, when Gen AI models started becoming popular, from a capability perspective, we were ready in many ways. At the same time, there are also new things that customers need with Gen AI models, and new kinds of tools to manage the end-to-end LLM workflow, if you will. For example, these are models that already exist, and customers want to use them as is, deploy them, and then use them in a scalable way. Or customize them, fine-tune them, augment them, and then use them. So, one of the questions that our customers have been asking us is, "Hey, how do I decide which model to use? There are so many." For that, we thought about, "Okay, well, how do we solve this problem?" And one of the launches, which we'll talk about in a moment, helps solve that model evaluation and selection problem.

[0:10:01] JMC: So yes, before we jump into that, which is a very nifty feature that you guys have launched, and we will talk about it. Is it fair to say, then, that the models themselves, large language models, are more data-intensive and more difficult to operationalize than the more traditional ML models that we described a minute ago? Is that a fair statement, do you reckon? Are they orders of magnitude bigger in those terms than the ones before? And is that in itself a huge challenge, for which, well, AWS was ready or not?

[0:10:33] AM: Yes, that's correct. These models are built with billions, or tens of billions, or even hundreds of billions of parameters, and they're much larger in size. They're also much more compute-intensive. So yes, both in terms of scale and complexity and resource needs, they're more demanding.

[0:10:51] JMC: Exactly. That said, again, I'm not an expert, like I said at the beginning, and the problem that I'm about to describe I wasn't aware of.
But it turns out that during the training process, the training process leverages an incredible number of accelerators. Whether they're GPUs or not is irrelevant, or maybe it's not, you tell me. But let's say that they're all GPUs. If one of those falters, the whole process might be messed up. I didn't know that. I thought there would be built-in resiliency in those processes. And you guys have provided a solution in one of the announcements. Please describe in better detail what the problem looks like, the one that I just butchered in my description, and what you announced to solve it.

[0:11:33] AM: Yes, absolutely. So, first of all, model training is not new. But with Gen AI models, given the scale at which they are trained, there are certain challenges that only appear at that scale. For instance, Gen AI models, given the size of the data used to train them, and the models themselves, don't fit on a single GPU or a single accelerator, right? Typically, you need to use a cluster of accelerators or accelerated instances to train them. And some of these models use hundreds or thousands of accelerators.

[0:12:09] JMC: Because the scale of this is just unfathomable. It's incredible.

[0:12:13] AM: Yes, it's crazy. So, the first challenge that our customers run into is, "Hey, how do I efficiently distribute the data and the model across this cluster of accelerators, so that I'm able to accelerate my model training and also improve utilization of all the resources that are available in the cluster?" That becomes a challenge: how do you partition the data and the model and distribute them? How do you make these accelerators talk to each other?

[0:12:47] JMC: Coordinate, yes.

[0:12:48] AM: So, think of it also as a traffic management problem. If you can optimize traffic management within the cluster, you can actually optimize model training, or make it go faster. So, that is one key challenge. Then also, these are models that continue to be trained, or need to be trained, over a period of weeks or even months. To train these models, you often have to pause, inspect, optimize your code, and then start from where you left off. So, you also have to continue to save your progress as you go. Version control, maybe, in a way, yes. We call it checkpointing.

[0:13:26] JMC: Yes. Snapshots, checkpoints.
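As a concrete illustration of the two mechanisms just described, distribution across a cluster and checkpointing, here is a minimal sketch of a SageMaker training job. The script name, bucket paths, and sizing choices are hypothetical, and the `torch_distributed` launcher shown is a simpler stand-in for the fuller configuration SageMaker's model parallel library takes.

```python
# A sketch of a distributed, checkpointed SageMaker training job.
# Script names and S3 paths are hypothetical placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_llm.py",      # hypothetical training script
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    framework_version="2.0.1",
    py_version="py310",
    instance_count=16,                # a cluster of accelerated instances
    instance_type="ml.p4d.24xlarge",
    # Launch the script across all instances/GPUs with torchrun. SageMaker's
    # model parallel library (smdistributed) handles model partitioning and
    # takes additional configuration not shown here.
    distribution={"torch_distributed": {"enabled": True}},
    # Continually save progress to S3 so a long run can resume where it
    # left off after a pause or a failure.
    checkpoint_s3_uri="s3://my-bucket/llm-checkpoints",  # hypothetical bucket
    checkpoint_local_path="/opt/ml/checkpoints",
)
estimator.fit({"train": "s3://my-bucket/tokenized-data"})
```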
[0:13:29] AM: Then, because of the size of the cluster that's used to train these models, invariably, some infrastructure issue or, let's say, an accelerator failure can really disrupt your entire training process. Then, instead of actual research and development, you have to spend time troubleshooting infrastructure issues, fixing them, and then finding ways to restart everything, right? You end up losing a lot of time and resources. And right now, time to market is everything with these models. So, if you're losing time, that can have a big impact on your business. We looked at these problems, and we've tried to work backwards from them to launch Amazon SageMaker HyperPod, which we just announced this week. So, happy to tell you more about that.

[0:14:18] JMC: Yes, go ahead. So, HyperPod is trying to solve this problem that you just described. Right?

[0:14:22] AM: Yes.

[0:14:22] JMC: So, how does it work?

[0:14:24] AM: Yes. So, first of all, HyperPod makes it easy for you to set up this cluster of accelerated instances or accelerators, and we've integrated SageMaker's distributed training libraries in there, which make it easy for you to distribute, or partition, data and model, and parallelize model training efficiently across your training cluster. That helps you speed up model training and improve the utilization of the resources in your cluster. It helps you continually save your model progress, so, checkpointing as you go. It gives you tools such as Amazon SageMaker Profiler and TensorBoard, so you can debug model performance as you go. You can look at, "Hey, where am I wasting GPU cycles? Where can I optimize my code?" So, you have all of that information at hand to improve model training. Then also, SageMaker HyperPod automatically monitors the health of the cluster all the time. It can automatically detect if there is an infrastructure issue. Let's say an accelerator failed; it's able to automatically determine why it failed. What is the type of the error? Can this be fixed by just restarting the instance?

[0:15:42] JMC: Which is very typical.

[0:15:43] AM: Fairly typical. Or does this faulty node actually need to be replaced? Whatever remediation action needs to be taken, it does that automatically. And then it also automatically resumes the model training process. So, in some sense, you get a self-healing training cluster, and this provides almost a zero-touch training experience for training these models.
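For a sense of what setting up such a cluster looks like, here is a sketch based on the SageMaker CreateCluster API that backs HyperPod; the cluster name, instance counts, role, and lifecycle-script location are hypothetical, and the health monitoring and auto-resume behavior described above is handled by the service rather than by this call.

```python
# A sketch of provisioning a SageMaker HyperPod cluster with boto3.
# Names, counts, and S3 paths are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

response = sm.create_cluster(
    ClusterName="llm-training-cluster",      # hypothetical cluster name
    InstanceGroups=[
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.p4d.24xlarge",
            "InstanceCount": 16,
            # Lifecycle scripts run when instances come up, e.g. to install
            # the scheduler and the training environment.
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/MyClusterRole",
        }
    ],
)
print(response["ClusterArn"])  # the cluster is then ready for training jobs
```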
[0:16:07] JMC: I was interviewing John Willis in this same studio three hours ago, and when I asked him about the future of LLMs and so forth, he refused to say anything, because he said he was not a wizard, that he doesn't predict the future that much. Now, he said that in the past, he had come across this wave of MLOps that we were describing before, and now AI Ops, if you wish, that you just described. He said, "Look, Hadoop, for example, is still, and was definitely in the past, a fantastic project, or product if you wish. But even a crack SRE team, the best team at the best bank out there, would have an incredible amount of work, if not an innumerable number of problems, to set it up, to keep it running, to expand it when needed, to contract it when not needed, to make it idle, and so forth." So, the fact that SageMaker is able to operationalize this in an automated fashion, like you just described, is just incredible. I can see how that is going to delight many of your customers.

[0:17:06] AM: Absolutely. Look, there are many things about AI that we cannot predict right now, given how fast things are changing, right? But there are things that we know will continue to be true. For example, we know that the Gen AI models that exist right now are the worst they'll ever be. They're only going to get better. We know that none of our customers is going to come to us and say, "Well, I wish it was harder to train these models," or, "I wish it cost more to do my work on SageMaker." So, there are dimensions along which we know that we need to continue to improve, and we continue to innovate on those. At the same time, we're always listening to our customers and looking at where the space is going, where the innovation is, and we're proactively building features to support the latest and greatest in the space.

[0:18:04] JMC: Talking about models, one thing that I have realized, and I'm not in your position, listening to customers, but it's quite obvious from a user perspective, an individual user, and I'm sure the corporations and all the AWS clients are asking the same: it is confusing right now what model to pick. Not necessarily confusing, but it's not easy. So, I think that you've also announced a new feature of SageMaker that will help users of SageMaker pick the right model for them, right?

[0:18:33] AM: Yes. So, we've announced Foundation Model Evaluations as part of SageMaker Clarify. Our customers told us that the choice of models, like you mentioned, is increasing. Also, when you change these models by, let's say, fine-tuning them, the model's characteristics can change. So, how do customers figure out what's the best model to use, when, and for which use case, right?

[0:19:01] JMC: That is difficult.

[0:19:02] AM: It is difficult, and some of the public benchmarks that are available for some of the pre-trained models are sometimes too academic in nature. They're not always representative of your use case, right? Of the data that you may put into the model. So, we worked backwards from this problem that our customers were facing, and we've launched this new capability. The way this works is that you can pick any model, either one of the pre-built models discoverable within SageMaker, or any other model, and you can select the dimensions along which you want to evaluate these models. These can be things like accuracy, toxicity, bias, semantic robustness, et cetera. Then, you can either use built-in public prompt datasets to evaluate these models, or you can provide your own prompt dataset. For example, if you think that the application that will use these models is going to provide unique input that is not captured by public datasets, then you can provide your own prompt datasets. This capability then automatically runs evaluations with those prompt datasets, for those dimensions, for these models, and generates a comprehensive model evaluation report that you can refer to, which will show you how the model performed on each of those dimensions. It'll show you the prompts for which the score was lowest. That can help you select the model that you want to use. In fact, there are also things that are hard to evaluate automatically. For example, if you're building a chatbot, you may want the chatbot to be more friendly, or less friendly. You may care about the tone, or some other creative aspects of the model. These things are hard to evaluate automatically. So, this capability also now supports human evaluations, which are powered by SageMaker Ground Truth, an existing SageMaker capability for human-in-the-loop machine learning. You can provide instructions for how the output of the model should be evaluated, and you can either provide your own workforce that can use SageMaker Ground Truth tools to provide feedback and create this evaluation report, or you can use a managed workforce option that SageMaker Ground Truth provides, where you don't even have to bring people to provide feedback. SageMaker handles that automatically.
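To sketch the bring-your-own-prompt-dataset flow just described, here is a hypothetical example. Only `invoke_endpoint` is a real SageMaker runtime call; the endpoint name, the dataset format, and the `score_response` helper are illustrative stand-ins for what the Clarify FM evaluation capability automates across dimensions like accuracy, toxicity, bias, and semantic robustness.

```python
# A hypothetical sketch of evaluating a deployed model against your own
# prompt dataset. score_response() and all names are illustrative.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# A custom prompt dataset representing your application's real inputs.
prompts = [
    {"prompt": "Summarize our refund policy in one sentence."},
    {"prompt": "Draft a friendly reply to a delayed-shipment complaint."},
]

def score_response(text: str) -> dict:
    """Illustrative placeholder: the managed capability computes dimension
    scores (toxicity, accuracy, robustness, ...) for you."""
    return {"toxicity": 0.0, "length": len(text)}

report = []
for row in prompts:
    resp = runtime.invoke_endpoint(
        EndpointName="candidate-model-endpoint",  # hypothetical endpoint
        ContentType="application/json",
        Body=json.dumps({"inputs": row["prompt"]}),
    )
    output = resp["Body"].read().decode("utf-8")
    report.append({"prompt": row["prompt"], **score_response(output)})

# Per-prompt scores; the managed report aggregates and ranks these,
# highlighting the lowest-scoring prompts.
print(json.dumps(report, indent=2))
```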
[0:21:36] JMC: Okay. Wow. I didn't expect that last bit. So then, clients that want to start incorporating LLMs into their own applications or services, or the ones that have their own and want to improve the operations side of it. I mean, with SageMaker and these two features that were just announced here, those that have trouble figuring out what the correct model is can do so in the fashion that you've just described. Those that have problems scaling and keeping a reliable infrastructure during the training, the snapshotting, and the evaluation, each of those intermediate bits, have a solution with HyperPod. What else have you announced? I know there's a link in the show notes to a press release with a myriad of announcements, in a very AWS fashion, an overwhelmingly good fashion. You've announced way more things. But there was another thing that caught my attention that we spoke about before the interview. What was it?

[0:22:31] AM: Yes. So, we've talked about training. We've talked about model selection. Now, when it comes to model deployment, we looked at some of the unique challenges our customers are facing with Gen AI models. What we found was that our customers today take a model, spin up an instance, deploy the model on that instance, and set up auto-scaling rules for those instances. But oftentimes, there are more accelerators available on the instance than a model actually requires. So, you may deploy a model that, let's say, needs four accelerators, but some of our instances have eight accelerators on them. That can lead to some of the resources, or accelerators, being wasted. Now, we've launched a new capability where you can keep adding models to one inference endpoint, and SageMaker automatically arranges, or allocates, accelerators to multiple types of models on the same instance. It's kind of like it's playing Tetris with these models in the background.

[0:23:37] JMC: To fill all the gaps.

[0:23:39] AM: Fill all the gaps, and you can keep doing that with more and more models. What SageMaker will do, based on the traffic that each model is receiving, is dynamically load and unload models. It can evict a model that's not being used right now, and it can also auto-scale at a per-model level. So, let's say there are three models on an instance, models A, B, and C, and model A is getting more traffic or inference requests. It will spin up another instance automatically, but will only start loading model A, rather than wasting resources by also loading models that don't need to scale up. We found this is helping our customers reduce the cost of deploying these foundation models by 50% on average. That's a huge cost saving.

[0:24:28] JMC: Especially if we consider that training models and deploying them takes a lot of the attention, but the usage of models, the running, the inference, is actually the outcome of a model that you, or your clients, are interested in, right? And that is a constant demand for resources in the backend. It's a really resource-consuming effort, so optimizing it must be everyone's obsession.

[0:24:54] AM: Absolutely. And cost matters here, and this is what this capability provides.
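The Tetris-style packing described above maps to SageMaker's inference components, where each model declares the resources it needs on a shared endpoint. Here is a sketch based on the CreateInferenceComponent API; the endpoint, model names, and resource numbers are hypothetical.

```python
# A sketch of packing a model onto a shared endpoint as an inference
# component. Names and resource figures are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

# Each model becomes a component declaring how many accelerators and how
# much memory it needs; SageMaker places components across the endpoint's
# instances and can scale each component independently.
sm.create_inference_component(
    InferenceComponentName="model-a",            # hypothetical component
    EndpointName="shared-llm-endpoint",          # hypothetical endpoint
    VariantName="AllTraffic",
    Specification={
        "ModelName": "model-a",                  # a registered SageMaker model
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 4,
            "MinMemoryRequiredInMb": 24576,
        },
    },
    RuntimeConfig={"CopyCount": 1},              # copies scale per model
)
```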
In addition to cost, another problem this solves is latency, which is also important, right? Because if you're building an application that is interactive in nature, you want it to be snappy. You want the inference requests to have low latency. What happens is that with predictive models earlier, the latency for inference was more or less predictable. For making a prediction, roughly, the inference latency would be in a particular range. With Gen AI models, it really depends on the kind of task you're giving to the model. So, for example, if you ask it a question that requires only a single-line answer, latency will be fairly low. It just spits out an answer, and it's done. But if you are asking it to write a blog or a poem, then it can take much longer to generate a response. So, the traditional way of randomly routing inference requests to different instances can lead to unpredictable latencies, because you may end up routing an inference request to an instance that's processing another request that's going to take longer. As part of these inference capabilities, we've also launched a new smart, load-aware routing capability, where SageMaker automatically monitors the load on each of the instances behind the inference API, or the endpoint, and automatically routes each request to an instance that is either idle or going to be free pretty quickly, to generate a response. On average, that reduces inference latency by 20%.
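That load-aware behavior corresponds to the routing strategy option on an endpoint's production variant. Here is a sketch, assuming the `LEAST_OUTSTANDING_REQUESTS` routing strategy in CreateEndpointConfig; the config, model, and instance choices are hypothetical.

```python
# A sketch of enabling least-outstanding-requests routing on an endpoint
# config, instead of the default random routing. Names are hypothetical.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="llm-endpoint-config",    # hypothetical config name
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-llm-model",         # hypothetical model
            "InstanceType": "ml.g5.12xlarge",
            "InitialInstanceCount": 2,
            # Route each request to the least-busy instance, which smooths
            # out latency when generation lengths vary widely.
            "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
        }
    ],
)
```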
[0:26:35] JMC: Wow, that's fascinating. This next question is a bit beyond your remit, your charter, at AWS's SageMaker. The beauty of SageMaker is that it abstracts you from picking and knitting together a solution, not only a custom solution, but also from within the myriad of services that AWS offers, right? SageMaker is probably a combination of those. So this would be my question: of all the different announcements made here at AWS re:Invent 2023, hardware announcements, software announcements, foundation models, which ones is SageMaker leveraging the most? Which ones is SageMaker wrapping? I know SageMaker is a product in itself, but which of the services announced here is it taking the most advantage of?

[0:27:27] AM: Yes. So, let's talk about the hardware announcements. We've announced new hardware options, which will be available next year. This includes new NVIDIA GPU options as well as Trainium 2.

[0:27:42] JMC: Trainium, let's remind everyone, is built by AWS.

[0:27:45] AM: Yes, it's AWS-built. At AWS, we've designed Trainium for machine learning, and it's optimized for machine learning.

[0:27:52] JMC: That's a bold bet.

[0:27:54] AM: Yes. And we announced that we will be launching Trainium 2 next year. With SageMaker, we really believe in providing a choice of hardware options, of compute instance options, to our customers. As these hardware options become available on AWS, you can expect that they'll also become available as part of SageMaker, for SageMaker customers as well.

[0:28:19] JMC: Okay. So, what is next for SageMaker? I mean, you've announced a lot, and these features need to be used; they've just been launched. I know it's difficult to make predictions, and we spoke about that a minute ago. Werner said it this morning, and everyone says it: it's clear to everyone that we're in the infancy of this revolution, the Gen AI revolution. But what can you tell us, as the GM of SageMaker, that is going to happen in the next six months for SageMaker in particular?

[0:28:47] AM: Well, there are many other things we're working on. For example, one thing that we didn't get to talk about today is SageMaker Studio, which is our single pane of glass for doing machine learning. We launched, just yesterday, a completely redesigned version of SageMaker Studio that is much faster and has new IDE options. We're also focusing on developer efficiency. In addition to these improvements, we are continuing to work on enabling more Gen AI-powered development experiences. SageMaker Studio now comes with CodeWhisperer built in, which our customers and developers can use to auto-generate code and get code recommendations with natural language. We've got JupyterLab as an IDE option within SageMaker Studio, and now we have Jupyter AI integrated in, which provides a chat interface through which you can connect to a Gen AI model to execute code-related tasks, such as explaining code or debugging code, et cetera. I think you can expect that we'll continue to enhance the developer experience and improve developer efficiency through Gen AI-powered development experiences.

[0:30:04] JMC: From the conversations that you've had with clients, here on site and in general lately, are there any new use cases SageMaker has been applied to that have surprised you, or that you didn't see coming? Any new verticals that are onboarding the Gen AI experience with SageMaker?

[0:30:24] AM: Yes, absolutely. So, with Gen AI models, we've all seen how broadly applicable they are. They can solve all kinds of different problems and can be used for different tasks. But we're also starting to see more vertical-specific models being created, and many of these are being created using SageMaker. This started with Bloomberg creating the BloombergGPT model, which is optimized for financial services and was trained on SageMaker. We have other healthcare AI startups who are creating healthcare-specific large language models on SageMaker as well. In fact, Hippocratic AI is an AI startup building a healthcare LLM, and they've actually been building it on SageMaker HyperPod. They've been a great customer. We're starting to see more task-specific as well as vertical-specific Gen AI models being built, and we're super excited to see all the other kinds of problems our customers will solve with Gen AI.

[0:31:27] JMC: Yes. Again, going back to the ability to pick the right model: when that barrier is pushed down, what we've got is a plenitude of options from which SageMaker will eventually help us pick the best choice, and then fulfill the specific tasks that the vertical in which our company operates requires to provide a service. So, it's absolutely brilliant. To be honest, I think SageMaker is one of the killer products that AWS has, and AWS has plenty of products out there. The fact is that it's almost an end-to-end solution. I mean, I don't know if SageMaker actually helps collect data. That would probably be the only area of the ML lifecycle in which I don't know if SageMaker actually does anything, but the rest is completely covered.

[0:32:21] AM: Yes. Well, we do have SageMaker Ground Truth, which helps customers label and annotate data. We also have data preparation tools, such as SageMaker Data Wrangler, for the data prep stage of the machine learning lifecycle. So, we do have some data-specific tools as well.

[0:32:38] JMC: So it is a complete end-to-end solution, and it's absolutely brilliant. I'm really happy that it's prepared for this scale. You were already prepared, of course, as you mentioned. It's in the DNA of AWS, by need and by the philosophy of the company, to provide solutions for the largest customers in the world. So, I'm really glad that you joined us for today's episode.
If anyone wants to know more about SageMaker, where should they look for more information?

[0:33:06] AM: Yes. So, for our latest releases and launches, we have a press release that we've published, and I encourage folks to look at that. Also, feel free to just go to the AWS website and browse to the SageMaker page to learn more about our features.

[0:33:22] JMC: Ankur, thanks so much for being with us. All the best with this product.

[END]