EPISODE 1808 [INTRODUCTION] [0:00:00] ANNOUNCER: Compute optimization in a cloud environment is a common challenge because of the need to balance performance, cost, and resource availability. The growing use of GPUs for workloads, including AI, is also increasing the complexity and importance of optimization given the relatively high cost of GPU cloud computation. Jerzy Grzywinski is a Senior Director of Software Engineering and leads FinOps at Capital One. Brent Segner is a distinguished engineer at Capital One and is focused on performance engineering and cloud cost optimization. Jerzy and Brent joined the show with Sean Falconer to talk about methods to measure compute efficiency, horizontal versus vertical scaling, how to think about adopting new instance types, the effect of different languages on compute efficiency, and much more. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [0:01:04] SF: Brent, Jerzy, welcome to the show. [0:01:07] JG: Yeah, thanks for having us. [0:01:08] BS: Thank you. [0:01:09] SF: Yeah, awesome. Thanks for being here. I'm really excited to get into this topic. I think there's going to be a lot of practical, actionable advice for the listeners in regards to cloud optimization. Since there's two of you, just to help the audience out a little bit with whose voice is whose, can we have you introduce yourself? Who are you? What do you do? Brent, let's start with you. [0:01:29] BS: Good afternoon. My name is Brent Segner. I'm a distinguished engineer in the cloud evolution team over at Capital One. I've been with Capital One now for just over two years, focused on all aspects of cloud cost optimization and performance engineering, but I've probably had about two decades in this field. [0:01:46] SF: Awesome. Jerzy, same question to you. [0:01:48] JG: Yeah, I'm Jerzy. I'm a bit of a long-timer at Capital One, going on 16 years. I've done many different engineering roles from backend to frontend development. Currently, I lead FinOps for Capital One, where Capital One's been all-in on the cloud for several years now. And as our footprint has grown in the cloud, we've really wanted to focus in on architecture, standards, engagements, and then tooling for how we help our developers be more efficient and effective in the cloud in order to maximize the value. I spend a lot of time with Brent and other members of my team to build tools and go on that journey for the company. [0:02:27] SF: Awesome. What started your interest, I guess, in FinOps, and what's kept you interested enough to stay at the same company for 16 years? [0:02:37] JG: Yeah. For me, what I really like about my journey at Capital One is I've had, I don't know, 10 different jobs without leaving the four walls of Capital One, which has allowed me to explore different areas of the tech stack and different areas of interest, either from a technology perspective or a product problem statement perspective. That's really kind of what's kept me inside these walls, and the culture of the company and the fast-moving evolution of how we use the technology has been a fun ride. And really, for getting to FinOps, I was just telling someone the story that I still remember sitting at the dinner table with my father back in my high school days, deciding what I would go to college for. I always knew I probably would be an engineer, but I really wanted to go into business because of my interest in finance.
I kind of just chuckled it off and said, "I know I can be an engineer and do finance later. I don't think I can go into finance and be an engineer." And many years later, sure enough, technology brought me to something that has a finance problem statement, but also a technology problem statement. That's really what keeps me energized in the current role. [0:03:41] SF: Awesome. Brent, did you have anything to add about your sort of journey into FinOps? [0:03:44] BS: Well, I was gonna say, ironically, I actually took the exact opposite path that Jerzy did. Just over two decades ago, I started out in finance. Quickly discovered that I had a knack for, or an interest in, technology. And that kind of brought me to the journey to where I am today, specifically my interest in Capital One. Capital One was intriguing to me, just given a financial institution of its size actually going all-in on the cloud. You don't see that happen very often. And then since starting here, what's really kind of stood out is just the fact that they're continually at the tip of the spear as far as technology, just trying to push the existing boundaries. [0:04:22] SF: Yeah, I think even as an outsider, when you look at companies, especially in the sort of financial services space, that are these earlier adopters of technologies, Capital One is one that stands out to me that I've seen over and over again in my career. And you mentioned the cloud migration. I think your cloud migration kind of wrapped up around 2020. What was sort of the original motivation to move to the cloud? And what were perhaps some of the key learnings from that? [0:04:49] JG: Yeah, absolutely. I'll take this one as Brent was not quite here with us during that period. Capital One really started on a transformation journey, a tech transformation journey. And actually, not too long after I joined, 2010, '11, '12, we really wanted to pivot from off-the-shelf tools and capabilities to becoming a technology company that builds the capabilities, is able to innovate, and really drives impact for our customers through innovative solutions. As the company started going through that journey, it became evident that in order for you to start building the best tools, you need to have the best talent. In order for you to have the best talent, you need to have an environment that attracts the top talent. And the cloud was clearly eating up the world for where the talent wanted to go. Startups were mostly leveraging the cloud. Capital One decided, that's the direction we want to go. We want to be nimble and fast. We want to attract the best talent. And why does that talent want to work in the cloud? Because it gives access to being able to work on technical problem statements versus the paper-pushing process that sometimes occurs when you have to have boundaries between the folks that can develop versus the teams that can provide infrastructure and support that infrastructure. As Capital One went through that arc of tech transformation, we went from Waterfall to Agile, we went from on-prem to the cloud, all in the name of building great capabilities for our customers with great talent and that nimbleness to get faster time-to-market, flexibility, et cetera, that the cloud really offers. [0:06:25] SF: In terms of faster time-to-market, what kind of impact did this have on the speed of execution that the engineering team was capable of versus maybe what was there before? [0:06:36] JG: Yeah.
Before we started on the journey, I was working in an infrastructure team that really was facing off with the application teams to understand what they needed from the hardware perspective. I helped architect what servers, databases, network, et cetera were needed. And then I also had a team that helped execute on that. And the time-to-market for someone to say, "I need a server," to when they were actually able to deploy code on that server, let alone have that server be able to talk to anything else, was measured in weeks and months, and definitely not hours and minutes. For me, as I went from supporting that role prior to the cloud migration to then shifting over to full stack development and being the application team member asking for infrastructure, the cloud obviously unlocks servers in seconds. Network is based on, again, API calls and configurations and code. There was real speed from taking away a lot of the process of just making something happen. Now, of course, through that journey, there were a lot of additions to that. And FinOps, I think, is one space where most developers didn't think about how much they were spending on servers during data center days. Our goal is to get developers to think about what their spend is. Because now they do have access to provision and build things in a way that can be efficient or can be very clunky. And really, if you're designing things in a way that actually doesn't think about effectiveness and efficiency as a measure of success, the cloud can become very expensive. And it can also bring a lot of challenges that a developer may not be happy to be spending time on. The cloud certainly unlocks so much value. But with freedom, that also brings a lot of responsibility back to the developer, and what we hear from our developers today is, "Oh, I have to worry about not only feature development, but I also have to worry about the other things." Our goal is how do we make sure they do worry about it, but it only takes up so much time of their day? How much can we automate on their behalf? How much can we build into the enterprise tools and standards, defaults and things like that? Developers can be engineering in an efficient way and think about it a little bit, because there's no way to just completely take that away, but we also allow them to, again, have that time-to-market, which brought us to the cloud in the first place, right? [0:08:58] SF: Yeah. And I'm curious, I think a lot of people, when they first start thinking about moving to the cloud, they think about it purely like, "Oh, this is going to save us money." But it doesn't necessarily save you money, especially if you just simply - you're lifting and shifting something that maybe wasn't designed to operate in the cloud. And then suddenly you're just putting on a gigantic single server and paying a lot of money to run that thing. Or you might run into these cases where you think like, "Oh, well, this is not performing at the level that we need. We'll add more and more memory." But maybe that won't actually solve the problem because it turns out that the thing that you're running is like single-threaded and it's not gonna take advantage of the fact that you have more memory available anyway. When you're in this place where you're trying to help people or engineering teams within an organization optimize both from a cost perspective and also from a performance perspective, how do you, I guess, develop trust with those teams so that they don't see you as like the office of no?
Like, "Why won't Jerzy let me scale this XL3 EC2 instance or something like that? He keeps saying that this isn't the right thing to do." [0:10:01] JG: Yeah. I would give my perspective. And I know, Brent, you have a few good examples here. For me, what's been helpful is it was in the shoes of the developer to build tools. Actually, we have teams in my space that build tools and are in the function of owning an application and delivering value. I think most importantly, we need to work backwards from how do we make the developer successful? Knowing they have full control. At the end of the day, my goal is how do I bring value to the developer, either through tools, automation, et cetera, or through engagement and information in order for that developer to be successful? And I feel like if we bring the relevant information to the user and provide the context for why we think we're right, and also be humble when we are wrong and take those learnings back to our tools, we build that trust. I think it's also very important to call out when we are wrong or when we know we have some shortcomings. At the end of the day, we're building tools that provide recommendations to 2000-plus apps. Each of those apps is slightly architected differently with different requirements, et cetera. There are certainly edge cases. We try to be humble in our recommendation to build that trust. We do try to celebrate the wins, of course. We try to limit a stick approach of how much we penalize, but we want to be direct and open about where we think opportunities are. We don't want to slow down just because we feel like the person might feel like we're delivering a hard message of their overspending or inefficient. But really, we're trying to be direct about the opportunities, but then bring solutions to the table. And I think Brent has some really great examples that I think he really has contributed some big wins that maybe, Brent, you can chime in with. [0:11:43] BS: Yeah. No. I was going to say just kind of expanding on some of the points you made. For us, a lot of it is can we take a look through the eyes of the developer to be able to make sure that whatever we're recommending from a cost-performance optimization perspective meets their needs. More often than not, developers, they care about their user experience, they want to be able to have an application that's resilient and responsive. We have to be able to make sure that when we're making a recommendation that we can measure, did it have a positive impact, both cost performance, on the developer's experience towards the end objective. We look at different things, measurements we could term as like utilization, saturation, errors, to be able to give us kind of an ability to be able to triangulate with data on, "Did the change that we recommended that they make have the desired impact? And if not, can we help them make a change that's going to improve their cost performance?" [0:12:42] SF: When it comes to these types of optimizations and people making choices about what they're going to run in the cloud, do people tend to over-provision or do they sometimes under-provision? [0:12:55] BS: What we've seen historically is, more often than not, the developer once again cares mainly about their user experience. They tend to be able to err on the side of over-provisioning the number of resources that they would actually need to be able to meet a very specific use case. 
We've tried to help them by giving them better data to understand what the actual capabilities of the instances that they're looking at are, and which one is going to be most appropriate for the use case that they're bringing to us. [0:13:23] JG: Yeah. And I think there's a common theme that we hear from users: my easiest way to be resilient against issues is to overscale. And we, with data, try to provide an understanding that sometimes that's a false sense of security. Like the example you gave, where you over-provision servers but really you're single-threaded through a single CPU, you actually don't have the security that you think. In the mission statement of trying to be more efficient, we also very much try to uncover where your performance, both from a capacity standpoint or even the speed at which your application is running, might not be quite there. It could be a win-win-win solution if you do become a little bit more efficient by downsizing. [0:14:04] SF: Yeah. You've developed some interesting approaches to measuring some of the compute efficiency. Can you explain how CoreMark benchmarking works? [0:14:12] BS: Yeah. This is one of my passion projects. I guess we'll go two steps back before we go one step forward, just understanding kind of where CoreMark came from. There have been a number of attempts over the years to come up with a single unifying number that can actually encompass what the capability of the CPU on that instance is. Some of the steps that came before, Dhrystone and Whetstone - and yes, I get the irony of the two names - did a fantastic job stepping towards a unifying number, but ultimately fell short because they didn't actually encompass the realities of a workload as it would operate on an instance. I think it was 2009, the Embedded Microprocessor Benchmark Consortium - and that is a mouthful, I'll say EEMBC - first published CoreMark as a consolidated group of synthetic tests that would actually measure nine distinct types of operations on a CPU. These operations range across different things, we'll say linear algebraic functions all the way through image renderings. But ultimately, what it did is you take the composite of those nine different types of functions that can be performed on a CPU and it gives you a single unifying number we refer to as a multi-core score. That multi-core score allows us to compare apples to apples across instance sizes, instance families, clouds, and even down to bare metal hardware residing in a physical data center. [0:15:45] SF: What is this multi-core score? Is that a number, 1 out of 100? What's it actually look like? [0:15:52] BS: It would be a score probably in the range of one out of hundreds of thousands. Each of the individual tests - a linear algebraic test, or an image rendering test, or a compression test - is assigned a score based on the efficiency and time it took to execute X number of operations. At the end of running one of those synthetic tests, every one of those nine operations receives its own score. It's then combined across the nine tests to come up with that single unifying score. What's super interesting about this is you're able to start to tell how a score would scale as you increase instance sizes or even change instance families to get different allocations of CPU, memory, I/O, et cetera.
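To make the comparison Brent describes concrete, here is a minimal sketch of ranking instance options by benchmark score per dollar. The instance names, scores, and prices are hypothetical placeholders, not actual CoreMark results or cloud list prices.

```python
# Illustrative sketch: comparing instance options on benchmark score per dollar.
# All scores and hourly prices below are hypothetical placeholders, not actual
# CoreMark results or cloud list prices.

from dataclasses import dataclass

@dataclass
class InstanceOption:
    name: str
    vcpus: int
    multicore_score: float   # composite benchmark score (e.g., a CoreMark-style multi-core score)
    hourly_price: float      # on-demand price in USD per hour

def rank_by_price_performance(options: list[InstanceOption]) -> list[tuple[str, float]]:
    """Rank instance options by benchmark score per dollar per hour (higher is better)."""
    ranked = [(o.name, o.multicore_score / o.hourly_price) for o in options]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    candidates = [
        InstanceOption("family-a.xlarge", 4, 120_000, 0.20),
        InstanceOption("family-a.2xlarge", 8, 230_000, 0.40),   # slightly sub-linear scaling
        InstanceOption("family-b.xlarge", 4, 150_000, 0.23),    # hypothetical newer generation
    ]
    for name, score_per_dollar in rank_by_price_performance(candidates):
        print(f"{name}: {score_per_dollar:,.0f} score per $/hr")
```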
[0:16:45] SF: And can I factor in the type of application, and consequently the type of operations that are going to be running, and factor that into my capacity planning and optimization? Because if I'm looking at the aggregate metric, maybe the things don't look that good for the thing that I'm planning. But I really only need to perform linear algebra operations or something like that, and that's a different type of optimization than what I'm going to do for a general computer program. [0:17:10] BS: I love where you're going with this question. You're hitting the nail on the head. What we started to see as we execute the tests is, based on physical attributes or different libraries that are native within the CPU itself, there are certain instance types that are better at performing different types of operations. Your overall multi-core score may be a little bit lower for a particular instance type. But if you say, in your example, I perform mainly linear algebraic operations, this instance type may actually be ideal for what you're trying to do. Given the combination of those nine individual tests along with the multi-core score, we use that to help influence our development teams to make the right decision when trying to pick an instance type and an instance size that meets their requirements. [0:17:57] JG: In addition, we do some engagement types of recommendations, especially for large platforms. But the way we use CoreMarks at scale is we leverage that in our tooling. The user experience for a developer at Capital One is you log in to our internal tool and you type in your application, and then we provide recommendations: you're running these instance types, and you could be running these other instance types. The under-the-covers algorithm and the way we get to that recommendation is based on the CoreMarks. And the ecosystem experience that we're trying to create is - not only do we pull a bunch of utilization metrics that enrich our opinion of our recommendation, but we're also entering a phase of allowing users to add additional information. That creates a feedback loop to further improve the recommendations we provide, again, to build that trust that we're not just creating a general recommendation, we're making a recommendation just for you, just for your application, just for your specific architecture. [0:18:54] SF: And then if I have something that's running in production today, and then I find out that there's a better recommendation in terms of what I'm allocating in terms of resources, is there a way I can sort of test out that plan to see if it's actually going to continue to meet my needs from a user perspective while also maybe reducing costs and so on? [0:19:15] JG: We're definitely trying to get to a place where we automate away a lot of our recommendations, but we also recognize we want to provide accountability where it's owned. What we recommend in our tooling is step changes in your infrastructure. We never want to take you from a 24XL down to a small in one hop. We very much encourage performance testing, et cetera, the standard ways to get things into production, and not just react and over-correct. We do recommend step changes. That might take several months just to kind of test these things out. Operational stability, we still find very important. Because if we fail once, people will start over-provisioning overnight and we lose that trust factor. [0:20:00] SF: Right.
What were some of the surprising findings that you had in regards to instance size scaling and performance? [0:20:07] BS: There were, for us, a number of really interesting takeaways when we started to run the benchmarking tests against a broad cross-section of instance types and sizes. The one that probably stands out to me the most is how much the performance per CPU was impacted as we start to move up the ladder of instance sizes. I think, for example, it was on maybe the M7i instance family that, as we continued to test and we grew past 16 vCPUs, we saw a 12% performance hit per CPU. When we kept going up in size, eventually we saw another plateau where we hit another 13% performance hit per CPU. What this really came down to underneath the covers is, even though we're working with purely virtual instances that are masked by a hypervisor, we're still bound by a lot of the physical laws of the actual physical hardware underneath. As we continue to grow the virtual instance sizes, we actually pass physical boundaries on the hardware, like the NUMA boundary, where each time we pass those boundaries, small performance penalties are applied. Individually, these performance penalties are probably small. But as you pass so many physical boundaries on the underlying hardware, eventually you start to accumulate them and ultimately you end up paying for a lot of performance that you're not really able to realize. Ultimately, I know that depending on the type of workload you're running, there are different realities that apply. But for us in general, as a rule of thumb, it's a really good indicator of why we try to encourage our workloads to remain smaller and horizontally scale rather than always leaning on vertical scaling. [0:21:52] SF: Yeah, I recently did a podcast on supercomputing where they kind of talked about some of the stuff where the number of flops that most of these supercomputers can operate at far exceeds the capacity for essentially moving data around in the physical hardware. Because if you're moving data from CPU, to cache, to memory and so forth, that's essentially a higher cost of replicating that data to those different places than it is to do the floating-point operations. Most programs aren't able to fully utilize the ridiculous gigaflops or whatever that are available now. [0:22:27] BS: Exactly. [0:22:28] SF: How do you factor in, essentially, the performance improvements that we see in cloud instances over time? Obviously, the hardware and the virtualization of this hardware continues to get better from where it was 5, 10 years ago. It's always getting better. How does that kind of get factored into your strategy when you're thinking about how you continue with performance improvements and optimization? [0:22:49] JG: Yeah, that's something we're actually really doubling down on, as we've noticed that we're not having folks shift to newer generations as quickly as we'd like. There is a little bit of a balance with the newer generations - the cloud providers need to provision enough capacity. At GA, they certainly have X amount of capacity for what is essentially an infinite number of customers. At our scale, when an application is deploying a thousand servers at a time or more, we need to make sure that application teams are not running into insufficient capacity errors when they're provisioning instances, which we've tended to see more on the newest generations of instances.
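As a rough illustration of the effect Brent describes, the sketch below applies hypothetical per-vCPU step penalties (loosely modeled on the 12% and 13% figures above) and compares one large instance against several smaller ones with the same total vCPU count. The thresholds and penalty values are illustrative assumptions, not measured results.

```python
# Illustrative sketch: per-vCPU throughput can drop as a single instance grows past
# physical boundaries (e.g., NUMA), so several smaller instances may deliver more
# aggregate performance than one large one. Thresholds and penalties are hypothetical,
# loosely based on the 12%/13% step changes mentioned in the conversation.

def per_vcpu_throughput(vcpus: int, base: float = 10_000.0) -> float:
    """Per-vCPU score after applying hypothetical step penalties at size thresholds."""
    penalties = [(16, 0.12), (48, 0.13)]  # (vCPU threshold, penalty) - illustrative only
    score = base
    for threshold, penalty in penalties:
        if vcpus > threshold:
            score *= (1.0 - penalty)
    return score

def total_throughput(vcpus_per_instance: int, instance_count: int) -> float:
    """Aggregate throughput for a fleet of identically sized instances."""
    return per_vcpu_throughput(vcpus_per_instance) * vcpus_per_instance * instance_count

if __name__ == "__main__":
    # Same total vCPU footprint (64), provisioned two different ways:
    one_large = total_throughput(vcpus_per_instance=64, instance_count=1)
    four_small = total_throughput(vcpus_per_instance=16, instance_count=4)
    print(f"1 x 64 vCPU : {one_large:,.0f}")
    print(f"4 x 16 vCPU : {four_small:,.0f}")  # horizontal scaling keeps the full per-vCPU score
```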
And ultimately all it takes is one occurrence for a team to back off and use an older generation instance. It is a balancing act that we are trying to now counter a little bit, because we feel that folks are being a little too conservative and passing up on the benefits of newer instance types. That is an angle that we are now tackling more forcefully with our tooling, to try to get folks to the newer generations of instances based on Brent's data, the CoreMarks that we see. We definitely see the same thing that the cloud providers are advertising. Newer instances are faster and better. And if you also right-size them, they're cheaper. There is a second problem statement that we are tackling, too: folks like to stay consistent. Maybe on a 5th-generation instance they're running a 4XL, and they want to run the equal size on a newer generation instance, which could, in some cases, be more expensive, but also provides you far more performance. We really are trying to get folks not to get tied down to the name of the instance or the size of the instance, but to the underlying performance that it provides. And that ties back to the CoreMark scoring, et cetera, that we were mentioning: just because you were on a large instance from 2015 doesn't mean you have to be on a large instance from a newer generation. That is definitely the second challenge that we're tackling. Folks really just forget - don't forget, but don't take the time to understand the performance improvements of newer generations. And we're trying to bring tools and data to the conversation so folks can understand what they're getting and that they can use smaller instances in newer generations. [0:25:01] SF: Is there any, I guess, risk associated with being too much on the cutting edge of whatever the latest flashiest instances are? [0:25:10] JG: Yes, to a degree. And I think it's how you manage that risk. For us, the primary risk is potentially around those capacity constraints. If, on day one of GA, you take the latest instance and you want to scale, that is something to consider. But the way we're going to counter that risk is we really want to automate away instance selection as much as possible. We would love everyone to use one of the latest generation instance types. But if that's not available, we automatically take you down to the next instance type, at the right size if the size needs to be changed. And hopefully, you're not going to n minus 2. But ultimately, if the capacity of the newest instance does have an issue, in a short period of time we can then mitigate that risk, while also promoting the newer generations of instances. We do want to do some of the proving out of the newer instances centrally. In general, the cloud providers do a nice job of making sure what comes to GA is market-ready. It's just more around that capacity piece. We feel like we should have our whole footprint on n minus 1 or newer generations of instances at our scale. [0:26:15] BS: Just to expand on what Jerzy said, because I really love this point, as we've watched over the last five years or so, and companies are definitely now moving all their workloads into the cloud, we've started to see that all the cloud providers, as well as silicon manufacturers, are really starting to understand what types of workloads are running in the cloud and are engineering around how to have those operate most efficiently for both themselves as well as for the users. Take a look at the architectural changes we've seen that are just groundbreaking.
In the latest generations, there's the size of the L1 through L3 caches, and we start to see on a lot of the latest generations that the cloud providers are now disabling SMT, or simultaneous multithreading. Both of these changes have had a huge performance boost while at the same time actually making the workloads much more predictable for the users. We took a look a little while back just to benchmark across generations with ARM architectures. I think when we compared n minus 2 to n minus 1 generations, there was a 30% boost going to n minus 1. And that, in and of itself, if it stopped there, would have been phenomenal. But then you compare n minus 1 to the current generation, and it jumped again by 20%. At the end of the day, when you compare current generation performance to two generations back, there is almost a 50% jump in performance. And at the same time, there's only been a nominal increase in cost. For our application teams, if we can encourage them to get to the current generation, it's a 50% performance boost at nominal cost, which means that they're actually getting a significantly better cost for performance on the current generations than they would if they stayed on an n minus 2 generation. Huge incentive to try to get them to the current technologies. [0:28:08] SF: Yeah, absolutely. How much does the language that you're programming in impact optimization and your ability to control cost? [0:28:17] BS: Jerzy and I've had this conversation a lot. I made a mistake a little while back where I started to talk very specifically about languages. And I very quickly learned, if you speak poorly of a language in one regard, it's like calling somebody's baby ugly. I try to be very careful when I'm addressing this topic. I know each organization and even developers have very definitive thoughts on which languages are preferred. All I'm trying to say is language selection, library selection, just plays a very foundational role influencing resource utilization, performance, scalability. For instance, for me, a language like Python is very easy to use if you're working with different data science tools, NumPy, Pandas, et cetera. However, there are trade-offs when it comes down to performance. If you're looking for more performance than simplicity, then you may want to go through and take a look: "Okay, do I have the opportunity to move to a Go or a Rust-based language, or use libraries like Polars or Hugging Face, which are actually optimized to run in those types of environments?" For me personally, my message, when we take a look at whether there are potential optimizations or bottlenecks in the environment itself, is we just try to help the developers understand that selection of languages, selection of libraries, is a constant balancing act to get the best trade-off between performance, scalability, and cost efficiency. [0:29:49] SF: And there's also, I guess, a trade-off external to the cloud cost that you might have to factor in, because you can make the argument, perhaps, that your amount of engineering time while developing in Python is going to be less than the engineering time if you were developing in a lower-level language like C or something like that. But it's going to be hard for you to match the sheer performance that you can get from a C program in a Python program. I mean, there's a reason why operating system kernels are written in C.
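A back-of-the-envelope sketch of the generational math described above; the performance multipliers follow the rough 30% and 20% figures Brent mentions, and the hourly prices are hypothetical placeholders rather than list prices.

```python
# Back-of-the-envelope sketch of the cross-generation comparison described above.
# Performance multipliers follow the rough 30% and 20% figures mentioned in the
# conversation; the hourly prices are hypothetical placeholders, not list prices.

generations = {
    "n-2":     {"relative_perf": 1.00,        "hourly_price": 0.150},
    "n-1":     {"relative_perf": 1.30,        "hourly_price": 0.155},  # ~30% faster than n-2
    "current": {"relative_perf": 1.30 * 1.20, "hourly_price": 0.160},  # ~20% on top of n-1
}

for name, g in generations.items():
    perf_per_dollar = g["relative_perf"] / g["hourly_price"]
    print(f"{name:>7}: {g['relative_perf']:.2f}x performance, "
          f"{perf_per_dollar:.2f} relative performance per $/hr")

# With these illustrative numbers, the current generation lands at roughly 1.56x the
# n-2 performance for a nominally higher price, i.e., materially better cost-per-performance.
```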
[0:30:19] BS: Yes. [0:30:20] JG: That's right. And for us, it's just providing that context to help folks make the right decision locally. Because there are certain situations where, for example, Brent did an engagement where we had a very large platform that was very data-intensive and was using the wrong library. And just shifting them from one library to another, which was a relatively low lift, increased the performance by 100%. And it was a Lambda infrastructure. The faster it went, the lower the cost was. They doubled their speed, reduced their cost by half. And it was a win-win. And in that case, it made a lot of sense. But to your point, our goal is not to walk around and get everyone to rearchitect and rewrite their applications. But it should be a mindful decision. If you're building an application that is very performance-sensitive and you're in that development lifecycle of building a new microservice or a new capability, you might benefit from that language, not just for cost reasons, but really for meeting the performance requirements for your application. Yes, this is where we find ourselves spreading insights as much on financial efficiency as we try to spread insights on performance. We see those going hand-in-hand. And actually, the added benefit of talking about it both from the performance and the cost perspective is it helps build that trust. If we're helping you to create a better tool to meet your requirements, we can also make you efficient. It really kind of helps generate that win. [0:31:44] SF: Mm-hmm, yeah. What about in terms of like GPU optimizations, which I think is probably something that a lot of businesses over the last couple of years are thinking about? How does that perhaps differ from traditional compute optimization? [0:31:57] BS: The biggest change that we start to see when taking a look at GPU optimization versus CPU optimization is just understanding that the architectures have fundamentally changed when you look at a GPU-based environment. In a CPU-based environment, the CPU is the workhorse, all of the operations are executed sequentially through the CPU. Just how fast can you push sequential operations? When we switch over to a GPU-based environment, the CPU now plays a completely different role. So you can't take a look at how busy the CPU is. The CPU's only function in this environment is how quickly it can dispatch instructions to a GPU to execute on. And then you've got to take it even one step further. Each GPU is composed of multiple streaming multiprocessors and multiple CUDA cores. Instead of taking a look at a single function, i.e. like a CPU, and how heavily it is utilized, you now have to get down into the weeds to take that next step to say, "Hey, how much are we engaging our GPUs? How thoroughly are we starting to saturate our streaming multiprocessors, CUDA cores, et cetera?" Are there opportunities for us to go through and actually tune the way that we're presenting the work to the GPU instances to take better advantage of their capabilities? [0:33:28] SF: I understand capacity planning for GPUs around training cycles where you have some sense for how much data you're going to have to process, how long those things are gonna take. But how do you think about optimization for capacity planning around inference cycles?
[0:33:43] BS: A lot of it for us, specific to inference, comes down to testing the capabilities of the GPU within the context of how we're going to use it. How closely can we actually replicate or simulate the traffic patterns that are going to be presented as inference to the GPU, to figure out what is the appropriate instance type, size, and architecture to handle the workloads? Above and beyond that, just outside of even the technical range, it's going back to old-school algorithms to determine capacity planning. How many instances theoretically would we need to grow into, given the different dimensionality of the users we may start to see? [0:34:26] JG: To build on that, right now in the cycle of where the industry is, we are operating a little bit in an on-premise kind of mindset from a capacity management perspective, just because of scarce resources and availability of the instances. Right now, we are far more comfortable in over-provisioning just for time-to-market and making sure that we're meeting our mark operationally. What we're trying to do is leverage this time period - and availability of GPUs will only improve with time - to really land on how do we measure that efficiency? How do we get to be as good at defining what efficient means on a GPU instance as we are with the insights on a traditional CPU, in order to then have some of the conversations that you're kind of referring to, Sean - influencing teams to say, "Look, you're not quite where you need to be. And here's what we recommend." We certainly have our opinions, and we're only making those opinions stronger as we learn through this process, to get to a world where we are going to be scaling more in and out, more real-time, with GPUs as we are with CPUs. Hopefully, in the near future. [0:35:34] SF: Yeah. I mean, I think that's the challenge right now for everyone - we're all kind of still in the learning cycle of what this means. We don't have the decades and decades that we have with conventional CPUs and memory structures. Outside of GPU optimization for AI/ML workloads, are there other optimizations that you have to think about that are maybe different than traditional workloads? [0:35:56] BS: For us, this is a kind of a multifaceted answer. The first one is getting to figure out where our opportunities are compared to a conventional CPU world. We used to take a look at CPU utilization to inform how well we were using the resource. In a GPU context, utilization itself is not necessarily sufficient. Because utilization points more to whether we are using any of the resources on the GPU rather than how thoroughly. We start to shift our narrative over to how heavily are we actually saturating the GPU? In other words, are we taking advantage of all of its parallel processing capabilities? We do that today largely through taking a look at the wattage and the thermals on the GPU itself. In other words, understanding what the upper thresholds are for wattage on a GPU and the upper thresholds for thermals, and for what period of time are we at what percentage of those maximum capabilities? I can now take a look at kind of triangulating. If I start to see, for example, GPUs with high utilization but low saturation, meaning basically my utilization is 80, 90, 100 percent.
I am using some resource on it, but I don't ever see the wattage increase - that now tells me that I've got to go back and work with some of our teams that are presenting models to it, to tune the model so we're better able to take advantage of the resources on that GPU. Instead of, once again, a single unifying metric, now we've got to start to triangulate metrics and figure out what problem exactly we are solving with respect to optimization. [0:37:37] SF: Right. [0:37:38] JG: And then maybe kind of almost taking quite a bit of a jump into a different type of optimization is, at our scale, we really believe in platforms and creating enterprise platforms that are multi-tenant, where different users - either developers or analysts or business folks - are able to use those tools. As those users log into these platforms, without them even knowing, they might be provisioning highly oversized capacity for what their needs are. There is a question of product design. And how do we influence our enterprise platforms for what the interaction is between the user of the platform and that internal managed service? Where do we provide transparency into the decision-making, or maybe automate away some of the decision-making, to make sure that that user gets the experience that they need for whatever the tool provides, but also that the platform team is accountable for making sure that that user is being super-efficient? And it's a little bit of that shared responsibility model where the platform team needs to think not only about the architecture of the platform, they also need to think about the user interaction, because that user might not be enticed, or might not have enough insight into which dropdowns to choose for the thing they need to use. How do we make sure that platform team builds an experience, but how do we make sure that user has the right information to make the right choices as well? There are a lot of aspects of product design that we also spend a good amount of time debating for how to best go after that problem statement. [0:39:02] SF: In terms of other types of optimization, GPU or otherwise, how much does, I guess, sustainability and sort of the overall carbon footprint, energy footprint that you're putting into this resourcing play a factor? [0:39:17] JG: Really, going into 2025, we have elevated sustainability to be one of our top KPIs that we measure. Our FinOps journey has now been five, six-plus years. And we have found, "Look, we've optimized some of the big things and the obvious low-hanging fruit." We have tooling and messaging around all the different things developers can do to save money. We're entering a phase of, A, how do we get more folks to really care? How do we get more folks to really take the actions that we're preaching? How do we drive that culture of efficient engineering as good engineering? And for some folks, money doesn't resonate if it's not from their own back pocket. Sustainability can really play a big role, where we hear from a lot of our developers that they are very motivated by taking the right actions to have a meaningful positive impact on the environment externally. There's a big push that we're adding into our tooling to provide context and information for what kind of impact you have environmentally, or how much wattage you're using on different solutions.
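For readers who want to try the utilization-versus-saturation triangulation described above, here is a minimal sketch using NVML via the nvidia-ml-py package. It requires an NVIDIA GPU and driver, and the flagging thresholds are hypothetical assumptions, not the thresholds used at Capital One.

```python
# Illustrative sketch of the utilization-vs-saturation check described in the
# conversation, using NVML (pip install nvidia-ml-py). Requires an NVIDIA GPU and
# driver; the thresholds below are hypothetical, chosen only for illustration.

import pynvml

HIGH_UTIL = 80          # percent "busy" reported by the driver
LOW_POWER_RATIO = 0.5   # drawing less than 50% of the enforced power limit

def snapshot() -> None:
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu          # %
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0        # mW -> W
            limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0
            temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            saturation = power_w / limit_w
            # "Busy but under-saturated": high reported utilization, low power draw.
            flag = util >= HIGH_UTIL and saturation < LOW_POWER_RATIO
            note = " <- busy but under-saturated" if flag else ""
            print(f"gpu{i}: util={util}% power={power_w:.0f}/{limit_w:.0f}W temp={temp_c}C{note}")
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    snapshot()
```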
And this only became clear as, this year, we've been looking at different metrics and we saw that our cost trends went at a certain velocity, but our wattage trend went at a drastically exponential velocity. Of course, GPUs are quite power-hungry, and that becomes a bigger and bigger footprint. We want to tackle it both from getting our associates really energized and engaged on the things we've been trying to drive for some time, but we also want to be mindful of the power draw that is required to support everything that everyone wants to do with GPUs and AI. And we just want to be responsible about how we leverage the technology to drive the changes we are looking to drive from a customer experience perspective, but in a sustainable way. [0:41:01] SF: I think over the last few years, and maybe this is just a product of people having been invested in the cloud now, for a lot of people, for over 10 years, there's been some pushback on the idea of the cloud. There are a lot more people, I think, upset about the cost. And there have also been even movements of people de-clouding and moving off the cloud and going back to on-prem systems. In terms of this cloud cost optimization, what is your vision of the future for that? [0:41:28] JG: For me, the promise of the cloud is real and true. But I feel that in all of that goodness, the reality sometimes is a little bit forgotten. That's where FinOps really grew so prominently: "Look, the cloud allows you to do anything you want. Anything you want can be very, very expensive." All of a sudden, you're now tasking engineers to be thinking about costs as part of engineering. And I think that's a good thing. I think we just all need to grow that muscle to consider that as a variable in the equation of what a good architecture looks like. And if you're thinking about it as you design, architect, and then solution and operate the application that you have built, the cloud can be a beautiful place and meet all those things that have been advertised, like time-to-market and great costs. But that doesn't come without a little bit of effort and mindfulness. And I think as larger and larger organizations lean into the cloud, that problem becomes a very large dollar amount. And when you have a startup mentality and you only have so many funds in order to build something in a very short time period, the cloud is amazing. And those engineers are very incentivized to be frugal and mindful of those engineering decisions. And that drives amazing outcomes. We've got to figure out how to drive that culture of a startup mentality into a large enterprise. To also get those time-to-market benefits, you have to definitely include that frugalness and mindfulness and good engineering from an efficiency standpoint. And that's really what I think FinOps is looking to do. Our goal is to drive that culture. I do still feel the cloud is the place to enable all the goodness, but it definitely has to be supported through FinOps and other practices that are not quite needed on-prem. [0:43:10] BS: I love that and I agree 100%. Like Jerzy, I don't ever see a wholesale shift back to the on-premises data center. I do think that we are going to start to see an evolution towards more of a polycloud environment where users start to run workloads in the environments that make the most sense to run them in.
And I think what we're starting to see is some of the different foundations, like the FinOps Foundation, are leaning into a future that looks that way with their FOCUS project, which allows different companies to run workloads in different clouds but keep the same look from a billing perspective. Kind of a ubiquitous look as far as cost and usage goes, with the future hopefully being able to run workloads where it's most cost- and performance-advantageous for what they're driving towards. [0:44:01] SF: Yeah, and I think a lot of the cloud providers have become - obviously, they're aware of these issues and the pushback as well. And they've done more over the last couple of years to make investments within the product to give you more visibility into what spend looks like, how you can optimize, and so forth. It seems like a trend that is going to continue, where both the sort of hyperscaler cloud companies are going to make these investments, and then companies, like Capital One and beyond, are also going to continue to be sort of more conscious of this and how they can make these optimizations. [0:44:30] JG: For some of these cloud providers, some of the products and features they roll out, it's a learning opportunity too. We've had several occasions where, working with AWS, we would provide feedback on certain services that they offered that, at our scale, just financially made zero sense. We really wanted to use the services, and we worked with AWS to kind of explain how we got to our thinking and our strategy. And props to Amazon, where they took that feedback and they worked through it. And there are cases where different services got completely re-architected to be able to scale to larger customers. We take our relationship with our cloud providers very seriously, where we feel that it's a two-way street. We can provide feedback and request what kind of features we would like to see. And the cloud providers are there to listen and then, in some cases, completely re-architect services to make them more economical. Something that would not have made sense for a large corporation, after they made the change, is now something that makes sense to actually leverage in the cloud versus shifting to another provider or building something in-house. [0:45:32] SF: Awesome. Well, Brent, Jerzy, thanks so much for being here. This was great. [0:45:36] JG: Awesome. Thanks for having us. It has been fun. [0:45:38] BS: Thank you. [0:45:39] SF: Cheers. [END]