EPISODE 1791 [INTRODUCTION] [0:00:00] ANNOUNCER: Serverless computing is a cloud-native model where developers build and run applications without managing server infrastructure. It has largely become the standard approach to achieve scalability, often with reduced operational overhead. However, in banking and financial services, adopting a serverless model can present unique challenges. Brian McNamara is a distinguished engineer at Capital One, where he works in serverless integration and development. Brian joins the show with Sean Falconer to talk about why Capital One shifted to a serverless approach, how to think about cloud costs, establishing governance controls, tools to stay well-managed, and much more. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him. [INTERVIEW] [0:00:57] SF: Brian, welcome to the show. [0:00:59] BM: Hey, Sean. Thanks so much for having me. [0:01:00] SF: Yeah. So, to start our conversation, can you explain a little bit about your role at Capital One? [0:01:04] BM: Sure. I'm a distinguished engineer at Capital One. Essentially, that's a senior IC role. And my role really revolves around considering how we do serverless at scale, both at a tactical level and a strategic level. At a tactical level, I have the opportunity to engage with different teams who are looking to adopt serverless compute, help them run through any outstanding questions that they might have around the technologies, help them decide which compute may be better for their given workloads. But then we also have the opportunity on my team to actually work with others across the enterprise. In terms of where I sit at Capital One, I'm in the retail bank, but I have the opportunity to really work with partners in other lines of business like our card business, enterprise, cyber, ML. It's really nice to operate both at a very high level and a very low level too. [0:02:02] SF: When you reach sort of the level of distinguished engineer within an organization, is that your primary thing that you're doing day-to-day? Instead of necessarily hands-on-keyboard coding, you're a little bit more like a thought leader within the company and working with some of these other teams to help them invest in certain technologies where you have deep expertise. [0:02:24] BM: I'll say yes and. One of the nice things about the distinguished engineer role at Capital One specifically is you do have the ability to influence really how a larger engineering organization works or considers problems. In that sense, we do get to work on the big boulders. In my role, though, it's actually a really good blend in that I do get to get my hands on the keyboard. While I may not support feature teams and actually write production code, I am working, whether it's on sample applications or templates, on ways to help streamline the overall adoption of serverless at Capital One. [0:03:01] SF: Okay. What would you say is kind of like the primary difference between operating at this level as an engineer versus, say, being at maybe like a senior or staff engineer level? [0:03:11] BM: Yeah. I mean, obviously, this is my experience, but the ability to influence strategy, I think, is one of the main differentiators. While I do get to work with those feature teams who are looking to adopt serverless compute, it's working with our partners across the enterprise, not just within a single line of business.
And that really, I think, allows us to take a more holistic view in how our engineers approach solving business problems with serverless technology. [0:03:45] SF: I see. And why did Capital One decide to make this transition to serverless technology? [0:03:52] BM: Great question. If you look back at Capital One's history, we really started our cloud journey in 2014. And if we look at how we've adopted cloud technologies over the years, I think it followed more or less, I hate to say, a traditional pattern, where initial efforts revolved around literally lift and shift. Moving existing stacks from on-prem to the cloud. But really, as the years progressed, our leadership was looking for ways to improve the overall developer experience and, quite honestly, reduce developer burden. Letting developers focus on what they do well. Really, in 2021, we made a declaration that we were going to be a serverless-first company. And what that's meant is for newer projects, newer applications, not that you have to go serverless-first, but serverless technologies like Lambda and AWS Fargate should be considered first. And if we look at the reasons why, one of the canonical examples that people provide for why any organization goes serverless is lower cost. I think it makes sense to unpack that a little bit more. If we look at what's driving that cost, there are really three components. There's the engineering cost, there's the cloud infrastructure cost, and then there's the maintenance cost. If we look at the cloud compute cost, you may actually find that serverless, when compared with an equivalent or an analog on instance-based compute, may actually be more expensive. People often hold up cloud cost as a reason to not consider serverless. I think our leadership has looked beyond that and said, "Well, really, if we look at what's really driving overall cost or the total cost of ownership, it's not necessarily only cloud cost. We have to consider maintenance cost and the ability to innovate." And I think, really, that's what serverless provides. If we look at the dimensions of what constitutes serverless compute, really, it's all about minimizing management costs. Yes, there are servers in serverless, but you don't have to manage them. You don't have to patch them. You can have compute that scales to really, really high levels if needed, but can also scale down to zero if needed. You also have compute that runs only when it's needed, in response to business events. And you have built-in resiliency. Looking at, let's say, a service like AWS Lambda in particular, you get high availability out of the box. Whereas if you look at instance-based compute, you may need to consider, like, "Okay, how resilient do I want this application to be to different types of failures?" With a managed service like AWS Lambda, we as customers don't have to worry about that. We can let developers focus on the interesting problems. [0:06:41] SF: Yeah. I mean, I think the point you raised there about sort of speed of innovation and being able to sort of potentially plug and play different parts of the stack as essentially the public cloud introduces new technologies. It's a little bit easier to be adaptive. Versus if you are sort of running this stuff yourself, then, in the same way that like a monolithic application, you end up with like this tight coupling of different services and it makes it hard to be essentially as adaptive.
Because every team kind of needs to be in sync and it's going to really slow down your progress. If you're running this yourself, you're sort of almost like you're tightly coupling your software to the sort of on-prem system that you're running versus having a little bit more of a decoupling between the service that you're creating and essentially taking advantage of something like public cloud. Does that make sense? [0:07:30] BM: Yeah, exactly. We're working with wonderful cloud partners like Amazon. This is a great time of year where we get to see what Amazon's been working on for the past year. It's always exciting knowing that we have a partner that's innovating on our behalf. [0:07:45] SF: Yeah. And then some of the problems I've seen companies run into when it comes to actually like cloud costs being expensive comes from not really going through the process of like rethinking or rearchitecting the way that they are building and running their software to be best suited for the cloud, right? Legitimately just sort of lifting and shifting something into the cloud is not going to help you save money. You need to go through a process of sort of rethinking how those services are going to take advantage of the fact that not everything has to run in the cloud 100% of the time. And that's how you can actually help reduce your cost if you're optimizing for things that the cloud is actually good at. [0:08:24] BM: Yeah, I would agree with that. And with, let's say, Lambda applications in particular, even when we see teams that are interested in moving from other compute to Lambda, there's this initial thought to build essentially a monolithic Lambda, or "Lambda-lith." If they don't initially go through the process of decomposing their application, breaking down that monolith, they may find that Lambda may not be the right choice for any number of reasons. But I think for teams that take the time to consider what their applications do, how they're invoked, and really break things apart, they find that Lambda can really be a suitable landing place for them. [0:09:04] SF: I believe you said that this sort of project kickstarted in 2021. What is the status of things today in terms of Capital One's journey to serverless? [0:09:14] BM: Yeah. We have a large number of applications that are in fact running on serverless compute. I can't share the exact numbers, but it's a non-trivial number. And what we've seen for the teams that have made the transition is that they are able to spend more time on adding business value and delighting customers and much less time on patching infrastructure or managing availability in the event of failures. As a part of our normal process of launching applications, we ensure that applications are resilient to different types of failures. Well, with serverless compute, that honestly becomes a lot easier. [0:09:51] SF: Mm-hmm. We talked about a couple of the advantages of serverless in terms of the team being able to work faster, be a little bit more adaptive to changes in technology. Is there also an advantage in terms of bringing engineers into the organization where they might have experience with a lot of these services already and it's kind of like the way that they're used to working? [0:10:12] BM: Yeah. I mean, I would say for us as a large enterprise, we have about 14,000 engineers. When we do bring new engineers on, it's often easy - yes, as with any new hire, there is a ramp-up period.
But we find that people are happy to embrace serverless technologies without necessarily being beholden to legacy architectures, figuring out what the right server is to jump to, and all those activities that don't necessarily add value. We find that the more that we lean into managed services, the more productive our developers can be. [0:10:51] SF: And you don't have to share specifics, but can you help maybe for the audience shape the amount of data or traffic that you're dealing with and actually running through AWS? Like how big is this essentially? [0:11:05] BM: We are a very, very large AWS customer. Without going into specifics, I can say you can think of Cap One as a very, very large user of AWS, and a strategic partner with AWS akin to a Netflix. We are all in on the AWS public cloud. In fact, we shut our last data center back in 2020. Really, whether it's serverless compute, instance-based compute, storage, we are very, very large users of AWS. [0:11:35] SF: Mm-hmm. And what are some of the services that you use within AWS? What were some of the like original workloads that you wanted to move over? [0:11:41] BM: We are large users of serverless compute, as you might imagine. But we do also have a large footprint in more traditional compute. Whether it's instance-based like EC2 or ECS on EC2. If we look at other services that are more serverless in nature, S3. We are very, very large users of S3. That powers a lot of our AI and data analysis workloads. Then there are the connective services like Amazon Simple Notification Service and Amazon Simple Queue Service. Really, we don't use all AWS services, but we do use a lot. And what we do use, we're pretty heavy users of. As you might imagine, yes, we are a very, very innovative tech company, but we are a financial services company first. So we do have to be mindful in how services are approved and in how we govern usage within the developer community. [0:12:42] SF: Yeah. I'd love to get into details on that. But one thing before we jump there is, like with many industries, banking's gone through a lot of transformations I think over the last certainly 10 to 15 years. And I think one area that has changed banking significantly is around the fact that now, if I'm a customer of Capital One, a lot of my access to my accounts is going to be coming from like a mobile device. Whereas before, maybe I actually had to go into a bank or reach an ATM or something like that to make something happen. And I would think that that would have a pretty massive effect on sort of the traffic that you end up seeing. Because I can essentially just be hitting that banking app multiple times a day, doing transfers, doing refreshes, like all kinds of stuff that you never would have seen before. [0:13:27] BM: Yeah. For us, we have recognized the industry changing from being exclusively a physical experience to a virtual experience as well. And I think Capital One places a really high value on delighting customers wherever they are. Whether it's in a retail bank itself, whether you're a customer of our retail bank who uses our mobile app or our website, or, similarly, a credit card holder, we want to meet you where you are. [0:13:57] SF: Yeah. And I guess some of the elasticity of the cloud really helps with potentially spiky traffic. [0:14:05] BM: Yes. Yeah. And you bring up a great point there.
I think when people were initially migrating to the cloud, when you saw the public cloud become more of a thing in the mid, we'll say, 2010s, where you saw larger businesses adopting cloud technologies, I think the initial value that people held up was that you'll save money. And I don't know if that's necessarily true. I think what you're really buying is elasticity. The ability to be wrong. If you think about what it takes, the process of actually racking and stacking a physical server, there's a whole lot that goes into that. And you better be right. Otherwise, you're living with that decision for a non-trivial amount of time. I think what the cloud offers you is, one, the ability to be wrong. If you're looking at instance-based compute, you don't have to get it just right right out of the gate. It's just an API call to destroy that instance. It's an API call to create a new instance. But I think the point that you bring up about adjusting to customer demand and having that elasticity built in, even if you get that instance sizing right or the Lambda function configured with just the right amount of RAM, well, you may need to scale horizontally. And I think that's really the power of the cloud. We don't have to have capacity just waiting. And with each type of compute that is offered by cloud providers like Amazon, it's interesting to look at what the unit of scale becomes and how fast that unit of scale can be applied. With services like Lambda, the unit of scale is really concurrency. You get a large number in your account and in your region out of the box. But depending on your needs, you can work with Amazon to increase those limits as needed. And we've certainly seen use cases where we have had to go well beyond those account default limits. But beyond that, you're able to get more and more capacity and scale to really high levels pretty fast too. Whereas even if you compare Lambda scaling with, let's say, instance-based auto scaling, you're talking about potentially milliseconds or hundreds of milliseconds versus minutes. It lets you more closely associate scale with the business value. You don't have to over-provision or worry that you're under-provisioned. [0:16:29] SF: Right. [0:16:30] BM: You get what you need as you need it. [0:16:32] SF: Does working in sort of these elastic serverless environments change the way that you have to think about monitoring and observability? [0:16:40] BM: I'll say yes and no. It doesn't change the need, right? There is a need to understand what's happening in your application. That need doesn't go away. Just because AWS is handling, we'll say, the underlying infrastructure, it doesn't absolve you of making sure your applications are monitored appropriately. It does change how you do it or the mechanism that you use. In many regards, I think working in serverless environments forces you to be more disciplined in how you approach observability. There's no instance to SSH into where you can run htop and see which processes are consuming CPU, or run iostat or vmstat. You don't have an instance, right? I think you do need to be more disciplined. The good news is AWS does provide a number of metrics out of the box, whether it's to deal with concurrency or performance. There are a number of metrics that are there. But you also have the ability to add your own instrumentation as well.
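To make that concrete, here is a minimal sketch of custom instrumentation in a Python Lambda handler using Powertools for AWS Lambda, which comes up below; the namespace, service, and metric names are invented for illustration, not Capital One's actual ones:

```python
# Minimal sketch (assumed names): emitting a business metric from a Lambda handler
# with Powertools for AWS Lambda. "RetailBank", "deposits", and "DepositsMade"
# are illustrative placeholders.
from aws_lambda_powertools import Logger, Metrics, Tracer
from aws_lambda_powertools.metrics import MetricUnit

logger = Logger(service="deposits")
tracer = Tracer(service="deposits")
metrics = Metrics(namespace="RetailBank", service="deposits")

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def handler(event, context):
    # ... process the deposit event ...
    amount = event.get("amount", 0)
    logger.info("deposit processed", extra={"amount": amount})

    # Custom business metric, flushed automatically when the handler returns.
    metrics.add_metric(name="DepositsMade", unit=MetricUnit.Count, value=1)
    return {"status": "ok"}
```

Because the log_metrics decorator flushes metrics in CloudWatch's embedded metric format on return, the business metric adds no extra API calls to the request path.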
If you want to write custom metrics that are associated with, let's say, business value, user signup, user abandonment, or deposits made, you have that ability, right? You can build that logic in yourself. And the nice thing is there's a really good, vibrant community that has, I think, shown how to do this well. Whether it's using utilities provided by AWS. I'll give a shout out to a capability that AWS offers called Powertools for AWS Lambda. Powertools was built to help with observability. If you look at what the modules provide, there's Powertools for Python, Node, and Java, and the nice thing is logging, metrics, and traces are all first-class citizens in those modules. You don't have to do a whole lot to get a lot of visibility, which is really nice. But if you wanted to embrace industry standards like OpenTelemetry, you can do that too. And there is a good story for OTel with Lambda in particular. If you want to instrument code, you can. Otherwise, you can lean into the SDKs to auto-instrument code. Obviously, there are trade-offs there. If you are using a non-compiled language with auto-instrumentation, you'll see a penalty at cold start, but you can do it. You don't have to be an OTel expert to get that visibility, which is great. [0:19:12] SF: I wanted to talk a little bit about essentially governance and security compliance. [0:19:16] BM: Yeah, fun stuff. [0:19:18] SF: Banking is a very regulated, sensitive industry. You have to take care with anything that you're building. Obviously, you're sort of dealing in customer trust. How does that kind of change the way that you have to think about building products when you work at a bank? [0:19:33] BM: Yeah. I mean, at Capital One, security is job one. As you pointed out, we are working in a regulated industry. We need to make sure that we're doing things the right way, from a security standpoint, from a governance standpoint. What that means is that, as new services are introduced, we may not be early adopters, right? We need to do evaluations to determine whether or not services have the necessary controls that we feel they need. There's that part of it. Making sure services are imbued with the necessary controls that we need internally. But beyond that, we need to make sure then that our process of building and deploying code also is very rigorous and stands up to compliance. Making sure that all artifacts are versioned. Those are some of the practices that we adhere to. We use tools like Open Policy Agent, or OPA, to make sure that anything that we deploy conforms with our policies. That we're not deploying services that we're not supposed to deploy. Making sure that, even for the resources that we can deploy, that they are compliant. That we're not using certain properties. Or if we are using certain properties, that they're configured in a particular way. You can get really, really granular. And the nice thing is you can offload burden from your developers in figuring it out, right? We use OPA so that developers can deploy compliant applications. It's an important part of how we deploy code. [0:21:09] SF: Mm-hmm. And then in terms of the same care that you have to take when potentially adopting new services within AWS, what is that process like when you look at potentially bringing in a new library within the actual source code that isn't something that was developed at Capital One?
Obviously, I'm sure you must have some checks in place to make sure that there's not some sort of malicious supply chain attack that is hidden within the library or a referenced library or something like that. [0:21:38] BM: As a part of our security process, we do vet new libraries that come in. And we do have internal processes that continually check for vulnerabilities and notify teams when they're using modules that are no longer compliant because they may have a critical CVE. Yeah, we really try to secure the entire supply chain from build to deploy to your running applications as well. [0:22:03] SF: Mm-hmm. How do you establish governance controls that meet these standards without sort of compromising some of the speed and efficiency that you get from this serverless development environment? [0:22:14] BM: Yeah. I mean, it is certainly an interesting question. There is this natural tension, I think, between developers who want to iterate fast and want to deliver and want to execute on the latest, greatest thing. But I think many of our engineers also recognize the importance of the work that we do and they're willing to accept a certain trade-off there. Because we're dealing with people's money and their finances, we need to partner closely across the organization. In engineering organizations, or more developer-focused organizations, we have to partner with our developer experience teams. Teams that are actually managing the CI and CD processes. We have to partner with our cyber teams. We have to partner with our open-source management teams. There is a lot of coordination. And I think if we consider all that's involved there, really, there is this effort to try to shift as much of that as possible away from our developers. In many respects, we try to both shift left and shift right. When we talk about centralized controls, we want to make sure that our CI/CD pipeline is the choke point, the ultimate decider to determine whether or not something can be deployed. Is it compliant? Is it secure? But we also want to minimize friction for developers. Their day-to-day should not be spent wondering, "Am I doing this right? Am I doing this securely?" And we do things internally. One of the nice things is we certainly lean into open-source tooling, like I mentioned, Open Policy Agent. We have the ability to let our developers determine, "Am I doing things in a secure and compliant manner? Or do I have to wait for something to go to a pipeline and see it fail?" Ideally, we want to shift that left. We use some internal tooling. But we also do lean into tooling like AWS cfn-lint. You can determine, for, let's say, CloudFormation-oriented deployments, whether the templates that you're deploying are syntactically correct. Is it valid YAML? Is it valid JSON? All the way down to, are the resources that you have defined and the properties that you're specifying valid? You can also write your own rules to determine whether or not that template is compliant and add those to your cfn-lint run. It's a really delicate balancing act. Making sure that we do provide our developers with the means to be agile, iterate quickly, while balancing the need to be secure, compliant, and well-governed. [0:24:55] SF: What role does a strong notification and messaging system play in this? [0:25:00] BM: For messaging systems, I think it's important that developers understand when things change. Making sure that they have visibility into what has changed and why it's changed.
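As a rough, hypothetical sketch of the kind of pipeline check described above, and of the "here's why this resource is non-compliant" feedback discussed next, here is what a simplified policy gate might look like in plain Python; a real setup would express these rules as OPA policies or custom cfn-lint rules, and the allowlist and bounds below are invented for illustration:

```python
# Hypothetical policy gate for a JSON-format CloudFormation template.
# The allowed types and memory bounds are illustrative, not Capital One policy.
import json
import sys

ALLOWED_TYPES = {"AWS::Lambda::Function", "AWS::SQS::Queue", "AWS::SNS::Topic"}
MIN_MEMORY_MB, MAX_MEMORY_MB = 256, 4096

def check_template(path):
    violations = []
    with open(path) as f:
        template = json.load(f)
    for name, resource in template.get("Resources", {}).items():
        rtype = resource.get("Type", "")
        props = resource.get("Properties", {})
        if rtype not in ALLOWED_TYPES:
            violations.append(f"{name}: resource type {rtype} is not approved")
        if rtype == "AWS::Lambda::Function":
            memory = props.get("MemorySize", 128)
            if not MIN_MEMORY_MB <= memory <= MAX_MEMORY_MB:
                violations.append(f"{name}: MemorySize {memory} is outside approved bounds")
    return violations

if __name__ == "__main__":
    problems = check_template(sys.argv[1])
    for p in problems:
        print("NON-COMPLIANT:", p)  # the "why" a developer would see in the build log
    sys.exit(1 if problems else 0)  # non-zero exit fails the pipeline stage
```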
Ideally, we would allow those developers to see notifications when we shift things right, when we have that central CI/CD process. If things aren't going according to plan, if, let's say, builds fail, if deploys fail, making sure our developers understand where the failures occur. And similarly, actually having that visibility tied back to shift-left efforts as well. If a resource that you're trying to deploy is not compliant, making sure they understand why and making sure that we have supporting documentation to help them understand how to get that non-compliant resource compliant. [0:25:50] SF: Mm-hmm. Do those notifications ever get too noisy? [0:25:54] BM: I'm not going to lie. It can be noisy for our developers. But the great thing is we have a really, really strong developer productivity group internally and they're constantly looking for ways to improve that experience. Because it's one thing to see that a build failed, and you see this huge dump of output, and you wonder, "How the heck am I going to troubleshoot this?" They've been spending a lot of time narrowing down where failures occur, making sure that messages are only as verbose as they need to be, and providing that supporting documentation. And, also, meeting your developers where they are. Whether you're looking at a build log or whether you receive a Slack notification or an email notification, just making sure that developers understand, at that necessary point, why things didn't necessarily go according to plan. [0:26:44] SF: What role does the serverless center of excellence play in all of this? [0:26:48] BM: Yeah. For an organization the size of Capital One, I think it'd be arrogant to say, "I know what our developer community needs, and I'm the only one who can speak authoritatively," because I am not. I am not at all. Different lines of business have different priorities, and they have different needs. Really, what the center of excellence allows us to do is bring together people who have an interest in improving the serverless developer experience. Letting them come together and share what's working, what's not, what are the pains, what can be done? How do we appropriately leverage the knowledge and experience that we may have as a group of domain experts to improve the lives of all developers, whether or not you have that domain expertise? And, also, really, I think, work with other teams outside of, let's say, the serverless center of excellence. Work with cyber partners. Work with enterprise partners. Work across multiple lines of business. Really, it's a good platform to receive feedback from the developer community, but also influence how other teams consider the work that we do as developers. [0:28:01] SF: Mm-hmm. And when you talk about the developer community, you're talking about the internal developers at Capital One. Is that right? [0:28:08] BM: Yeah. [0:28:09] SF: How big is that roughly? [0:28:10] BM: There are about 14,000 developers at Capital One. [0:28:14] SF: Okay. A good size. A good size community. [0:28:16] BM: Yeah. [0:28:17] SF: Going back to some of the things that we were talking about at the beginning there in terms of some of the reasons for moving to serverless for Capital One, what were some of the challenges with actually putting that migration in place? [0:28:30] BM: Yeah. I think there are a few. One, I would say, is plain old FUD. Fear, uncertainty, and doubt.
I think people for a long time have associated Lambda with, I hate to say, toy applications, but I will say operations-oriented things that may not be business critical. "Lambda doesn't scale like I need it to scale. There's no way I can run my application in Lambda. Initially, five minutes wasn't enough. Now, 15 minutes isn't enough. I need more resources." I think developers will be surprised at the workloads that can be handled by serverless compute like Lambda. The other important thing to note too is that we're really big on saying we're serverless-first, but we're not serverless-only. There are going to be certain workloads that are not suited for Lambda. If they're not well-suited for Lambda, we would ask teams to look at serverless container services like Fargate. But even beyond that, if a service is not right, we have the means to support other compute and other services as well. But I think the biggest issue is helping overcome a lot of FUD. Even helping teams understand what good looks like. What I mean by that is, earlier, you mentioned how does observability change? Well, we want to make sure that teams are empowered to know how their serverless applications are running. When it comes to things like splitting apart the monolithic application, what's the right way to do it? When is the right time to do it? How does your Lambda function scale? How does your application traffic change over time? Are serverless services like Lambda able to keep up with what you need? Really, overcoming FUD. But then, ultimately, just doing the hard analysis to say, is Lambda right for you? If it is, great. If it's not, that's okay too. For those teams that did decide to take the plunge, I think it was initially a struggle to help them understand what the different metrics really meant. For non-serverless compute, a really common metric, let's say for APIs in particular, is requests per second or transactions per second. And in Lambda, if you look at the unit of scale, it's concurrency, which is really a product of the number of requests that come in, that RPS, that TPS, and, also, the duration. How long does that Lambda function run for? And the goal is to minimize the amount of concurrency that your function is consuming. With that, it's a matter of helping teams understand what's the right way to impact performance. And in Lambda, really, there's only one knob to turn, and that's memory. With Lambda functions, you'll find that both memory and CPU scale together linearly. A 256-meg Lambda function has twice the compute and RAM as a 128-meg Lambda function. For some teams, we would see them come in and allocate 10 gigs of RAM. Like, "I need all the RAM in the world." When you look at what they're actually consuming, it's like a drop in the bucket, so maybe we ratchet that down. But we also saw a lot of teams under-allocate the amount of RAM. Knowing what the calculation is for Lambda functions, you have invocations. The number of invocations is a component of the price, but then there's the amount of RAM allocated and the duration that that RAM is consumed for. And we would see teams allocate 128 megs, the minimum RAM that you can allocate, thinking, "I'll do that and I'll save all this money." Well, what we would see is teams would actually starve their Lambda functions. Functions would actually run longer because they didn't have the necessary resources. The great thing is there are open-source tools. AWS Lambda Power Tuning is a great open-source tool.
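A back-of-the-envelope sketch of the concurrency and pricing math just described; the per-request and per-GB-second figures below are placeholder assumptions to show the shape of the calculation, not current AWS list prices:

```python
# Illustrative math only. Prices are assumptions; check current AWS Lambda pricing.

def estimate_lambda(rps, avg_duration_s, memory_mb, seconds=30 * 24 * 3600):
    """Estimate steady-state concurrency and rough monthly cost for a Lambda workload."""
    invocations = rps * seconds
    concurrency = rps * avg_duration_s                 # requests in flight at steady state
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)

    request_price = 0.20 / 1_000_000                   # assumed price per request
    gb_second_price = 0.0000166667                     # assumed price per GB-second

    return {
        "concurrency": round(concurrency),
        "request_cost": invocations * request_price,
        "compute_cost": gb_seconds * gb_second_price,
    }

# A 1,000 RPS API averaging 120 ms per invocation on 512-meg functions:
print(estimate_lambda(rps=1_000, avg_duration_s=0.120, memory_mb=512))
```

The sketch makes the point visible: concurrency is requests times duration, and compute cost is memory times duration, so starving a function of memory and stretching its duration can leave you both slower and no cheaper.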
If you're not using it and you're a Lambda shop, please use it. Alex Casalboni, who was a solutions architect at AWS at the time, wrote it. It is awesome. And it really helps you determine what that right number is based on either cost or performance needs. You actually can dial that in a lot more. You're not wasting resources or money. [0:32:37] SF: When it comes down to teams having to make decisions about whether Lambda is the right thing to use versus using something like Fargate, what's the framework for making that decision? [0:32:49] BM: Yeah. I would say, first, consider the AWS constraints. Do you have a workload that needs to run consistently for more than 15 minutes? If so, Lambda's not right for you. Do you have the need to consume more than 10 gigs of RAM? If so, Lambda's not right for you. There are some easy ones to look at. But it's interesting when you consider then, too, beyond those obvious constraints, looking at things like, well, what is your Lambda function being triggered by? And what I mean there is Lambda is event-driven compute. Your code is triggered in response to an event. And AWS has over 100 event sources that can trigger Lambda functions. But let's say you want to write a Lambda-backed API and you're using, let's say, ALB, Amazon's Application Load Balancer service. Well, that's a synchronous invocation of a Lambda function. Now, Lambda can handle a six-meg payload size at this time. But ALB can only pass in one meg. One meg in. One meg out. If you're writing an API and you need to handle more than one meg in or one meg out, Lambda may not be the right choice if you're using ALB. With a service like Amazon's API Gateway REST service, that number jumps up to the full six megs. API Gateway can handle 10 megs, but you're going to be constrained by the six-meg Lambda limit. Look at the obvious things. Like, "Hey, what are the physical constraints?" Consider what you're looking to integrate with. Beyond that, though, it really gets interesting. One common thing that comes up that would drive someone away is, "I have a Java application. Java will never run well in Lambda." Well, I would ask you to revisit your assumptions. There are ways to mitigate things like cold start pains. Consider how often things like those cold starts happen. Is it something that's happening a lot or not? If it's a synchronous invocation where you have a user on the other end of that request, that cold start may really matter a lot more than an asynchronous invocation. Someone uploads an object to S3. Well, you may not have somebody waiting on the other end of that request. If you have a cold start that takes a little while, so what? It may not matter as much. The other thing too is, I mean, the AWS Lambda team has worked really, really hard over the years to improve that developer experience for, let's say, Java in particular. You can use capabilities like provisioned concurrency, where you essentially have an AWS-managed capability that will keep a certain number of Lambda execution environments warm, so you won't have those cold starts for the number of provisioned concurrency units that you've set. Last year, Amazon also introduced a capability called SnapStart for Java. And I'll be honest, I need to revisit the exact Java version. I want to say Java 17 and higher. I need to double-check. But this year, they actually introduced that for Python as well. It was a re:Invent announcement.
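For the provisioned concurrency capability mentioned above, a minimal boto3 sketch might look like the following; the function name and alias are placeholders, and in practice this is typically managed through infrastructure-as-code rather than ad hoc API calls:

```python
# Minimal sketch with boto3. "payments-api" and "live" are placeholder names.
import boto3

lambda_client = boto3.client("lambda")

# Keep five execution environments warm for the "live" alias so synchronous,
# user-facing invocations avoid cold starts.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="payments-api",
    Qualifier="live",  # must be a published version or alias, not $LATEST
    ProvisionedConcurrentExecutions=5,
)

# Check whether the warm capacity is ready before shifting traffic to the alias.
status = lambda_client.get_provisioned_concurrency_config(
    FunctionName="payments-api",
    Qualifier="live",
)
print(status["Status"], status.get("AvailableProvisionedConcurrentExecutions"))
```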
AWS is looking for ways to improve that developer experience and help developers so they're not forced into making a tough choice, like, is Lambda right for me? [0:36:07] SF: Where do you think this is all going? re:Invent is going on and there are tons of announcements. Where do you think serverless is going in the next couple of years? [0:36:15] BM: Interesting places. One of the more interesting announcements I heard this week was Aurora DSQL. That, I think, is going to be massive in ways that we can't yet comprehend. That's one. I think durable workflows will be another area. If you consider how important it is for certain workloads to run to completion, when you have ephemeral compute services like Lambda, the state becomes really important. How do we do that for really important work? That's going to be another one. I think continuing to manage the software supply chain is going to be really important too, and providing visibility into that. And I think the last one, I would say, is improving the operator experience. One thing that makes me shudder is when people say serverless is no-ops. It absolutely is not. It doesn't absolve you from your operational responsibilities. The good news is your cloud provider, whether it's AWS, Google, Azure, or anybody else, is assuming more of the responsibility. But it doesn't mean that you don't have any responsibilities. I would say I would look for ways to improve the understanding of what's happening in applications. The ability to observe what's happening is going to become really important, even more so. [0:37:33] SF: Yeah. Well, I know we're coming up on time here, Brian. I want to thank you so much for coming on the show. I really enjoyed this. [0:37:40] BM: Yeah, Sean. Thank you so much for the invitation. I really enjoyed the conversation. [0:37:44] SF: Cheers. [END]