EPISODE 1862

[INTRODUCTION]

[0:00:00] Announcer: Werner Vogels is the Chief Technology Officer at Amazon, where he has played a pivotal role in shaping the company's technology vision for over two decades. Before joining Amazon in 2004, Werner was a research scientist at Cornell University, where he focused on distributed systems and scalability, both of which are concepts that would later influence the design of AWS. He holds a PhD in computer science and has authored numerous academic papers on the reliability and performance of large-scale systems. As CTO, Werner has been instrumental in guiding Amazon's transition from an online retailer to a global cloud infrastructure provider. He is one of the key architects behind Amazon's push into cloud computing, helping to define the new model for delivering infrastructure. He is known for his pragmatic, customer-focused approach to technology and for championing ideas such as you build it, you run it, APIs are forever, and more recently, frugal architecting, which emphasizes cost-effective and sustainable software design. 

In this episode, Kevin Ball sits down with Werner for a wide-ranging conversation. They discuss the early days of Amazon, the birth of AWS, the principles of the frugal architect, aligning cost to the business, engineering business collaboration, technical debt, and much more. Kevin Ball, or Kball, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action Discussion Group through Latent Space. Check out the show notes to follow Kball on Twitter, or LinkedIn, or visit his website, kball.llc. 

[INTERVIEW]

[0:01:56] KB: Hello. KBall here. And I have the absolute honor of getting to introduce today the Chief Technical Officer of Amazon, Werner Vogels. 

[0:02:05] WV: Thank you, Kevin. Thanks for having me, and it's going to be a fun day. 

[0:02:09] KB: I'm excited to get to talk to you. Now, I'd like to start, especially with people who have done a lot of speaking and being introduced places, how do you introduce yourself? What do you highlight when you get to say, "Here's who I am"? 

[0:02:22] WV: I've been doing this for 20 years, and I'm kind of tired now, but I have a lot of stories to tell. Now, in general, there's always things in your past. I've been an academic, and I was in the army, and worked in hospitals, and all these kinds of stories don't really matter. I mean, the last 20 years of Amazon have really formed me, especially also because Amazon, every year, is a different year. And when I was 18 or something like that, if you would have asked me if I would work for the same company for 20 years, I would have laughed at you. Yeah, but now, it's 20 years later, and I'm still here. 

[0:02:59] KB: Yep. We wanted to talk today about cloud architecture. And this is something that I think the rise of the cloud, Amazon has very much led that wave, and it has changed the way that we do software. But you've been very focused on it for the last few years. How do you introduce this topic as a field of study for software engineers? 

[0:03:17] WV: Well, maybe we should go back a little bit in time. That makes the whole story a bit easier to follow. When Jeff started Amazon, he didn't really want to start a bookshop. No, he was just fascinated by the internet. What could you do on the internet that you couldn't do anywhere else? And he just picked a bookshop. A good bookshop has 40,000 titles in stock, yet there are millions of books out there. He could do something on the internet that you couldn't do anywhere else.

Unfortunately, nobody has written a book about e-commerce before because the word didn't exist yet. This is '94. And so anything that Amazon tried to do after that, they basically had to invent themselves. Every piece of technology, everything you know now about e-commerce. I mean, now we could probably just use Shopify as a platform, but there was nothing. 

And so Amazon engineers have been really, really good at sort of inventing themselves out of the corners that they caught themselves in, because there were many things that nobody had done before. And the cool thing is actually, and this is already before the year 2000, that AI played an important role in all of that. Think about recommendations, similarities, customers who bought X bought Y, things that we now consider normal everywhere, and we don't call it AI anymore because it just works.

Now, one of the things, definitely in my earlier days as CTO of Amazon, was that we had fixed capacity, and we had a really, really good mechanism to sort of predict what would happen the next year. So we bought capacity for that. Now, if I didn't buy enough, my customers would be unhappy. If I bought too much, the CFO was unhappy. You always need to be careful in that.

And then, of course, Black Friday came. Four times the traffic from normal. A nightmare really. But what really happened five, six days before Black Friday is that suddenly a team would show up that would say, "Oh, we've got this brilliant idea and it will deliver 50 million in free cash flow and we need to implement it immediately." And I go, "Look. No, we don't." But for some reason, we always made it work because these constraints also breed creativity, and you found ways to do it.

Now, when the whole cloud thing started, suddenly everything changed, of course. Because suddenly, you didn't have to live within the constraints that the original fixed hardware had. And I thought that that would suddenly also become the point where people started to realize about how much certain architectures cost. And mostly - I mean, I love the way that we built AWS. We built amazing technology that nobody else has done before. And as CTO, of course, I'm amazingly proud of that. But maybe what I'm most proud of is that we changed the economic model. 

Before that, I was extremely frustrated as a CTO. Because, in dealing with vendors, I always felt that the vendor was in charge. I was never in charge. If I wanted to get the cost down, let's say, of this database, I needed to make a 5 to 10-year contract. I had no idea how much you would need at Amazon 5 years from now. We massively overscaled, also, to avoid sort of the penalties you would get if they would check on the number of databases that you were using. And then you would write this check. And this check had many zeros on it. And the moment you gave the check to the guy that you were dealing with, he didn't care anymore because he was paid.

And so at Amazon we had this principle that we wanted to be the earth's most customer-centric company. You understand how that works in retail. But we started thinking, what would the earth's most customer-centric IT provider look like? Well, the first thing we need to do is change our economic model. Instead of people having to pay up front, which they had to do with every other IT company, you only had to pay for what you've used. And now that seems normal, but that was revolutionary at that moment.

[0:07:36] KB: No, it completely revolutionized. 

[0:07:39] WV: By the way, your electricity at home, you pay for what you've used. You don't go to the electricity company and give them money for the rest of the year and hope it's sufficient. Yeah. In some sense, it was a normal model. And I think people caught up on that pretty quickly. Lots of other large IT companies didn't. Yeah. They were still very much addicted to the 70% margins that they had.

The biggest thing that we were really proud of was actually changing the economic model. This pay-as-you-go model, which I thought was completely natural, also allowed you to think about the choices you make because it results in cost. Now roll forward. Well, in 2008 with the crisis, the financial crisis, we already saw some of that, that CFOs were somewhat concerned about the cost of digital infrastructure. But most organizations that we worked with actually used the COVID period also as, "Oh, let's accelerate our digital transformation. Let's move everything." And so they weren't really that concerned.

And roll forward to 2012, I gave my first keynote at re:Invent, and I gave a longer list of how I thought development would change or was changing on the cloud, and all the reasons why we were doing that, small building blocks, blah-blah-blah. Deploy to two availability zones at minimum in production. And I also said, now you can architect with cost in mind. Everybody ignored that. Why? Because moving fast and innovating was way more important. It didn't matter what it cost. You could get all the things that you wanted. And you were really thinking about your idea and the things that you want to achieve, and your customers, and stuff like that. And kind of the cost of all of that was sort of swept under the rug.

Until a number of years ago, I think, when mostly the CFOs, or mostly the financial people, started thinking like, "Should this really be costing us this much?" And that made me think that if our customers start to become more concerned about cost, maybe we should revisit that topic and give them some solid advice about how I think after, by that time, I was 15 years at Amazon, the experiences that we had and how we sort of have integrated cost in many of the architectural decisions that we've made. And so that resulted in the frugal architect.

[0:10:14] KB: Yeah, absolutely. And I think we are very much in this phase right now where it's harder to raise money, interest rates are higher. Everybody's looking at how do we cut costs and actually pay attention to this in a way that they weren't when money was flowing freely. 

[0:10:28] WV: Absolutely. And especially think now about all the efforts in and around AI. There are enough models that cost $15 per million tokens. And there are models that cost 15 cents per million tokens. Which ones give you the better result? By the way, when I say frugal architect, I don't mean cheap. I mean that you get maximum value for the money that you're spending, or what you want to do, and then work backwards from that. And this is how much is it going to cost me. Do I really want that? There's a bit of principled advice in the frugal architect but there is also some solid practical advice, "This is what you should be doing."

[0:11:09] KB: Yeah. Well, let's start to dive into that because I think a lot of folks, they do - especially with the ease of scaling up and down with cloud, it's very easy to just say, "Oh, use what we need." And it scales and it scales. And at some point, you say, "How did we get here? This is so expensive." You start from the design side, right? How do you think about designing your system? 

[0:11:29] WV: Well, of course, there's many, many systems out there that have already been built. But definitely with the first laws, or lessons, or whatever you want to call them, I really targeted the upfront thinking where there's things that are often non-negotiable upfront. Think about compliance. Think about security. Think about accessibility. Those are things that are just given. And then there's a bunch of other things like reliability, fault tolerance, performance, things that you trade off against each other. And we will come to that later. I think cost should be part of that. You should be upfront. Be aware that whatever architectural decision you make, there is a dollar amount associated with it. 

And you know what? That's fine. Because after all, whether you were buying hardware or whether you have now operational cost in AWS, the money needs to flow anyway. But here, every little piece that you make has a financial consequence. And as such, that is at least something you should keep in your head upfront. Now, I also wanted to make the case that, especially in AWS, until we can give people clearly milligram CO2 for this particular computation that you've done, cost is a pretty good proxy for sustainability. 

The more you pay, probably the more resources you've used. And as such, you can keep these two things in balance. I mean, there are enough companies that require sustainability to be reported to the board these days. And I find, and especially among the younger group of developers, they are absolutely passionate about making sure that they don't ruin the planet while using AI, for example. And as such, having cost as just one of the other non-functional requirements, which we always have, makes you at least aware of it.

[0:13:36] KB: Yeah, that makes sense. And in some ways makes me ask why do we even need to say that? Of course, cost is a thing that we need to think about. 

[0:13:43] WV: Many years where we didn't. 

[0:13:45] KB: It's true. 

[0:13:46] WV: No, but even in the earlier days of Amazon, where we ran on our own hardware, engineers were never concerned about cost. It was there anyway. And of course, at a different level, you were concerned about cost. But as an engineer, you just asked for five more servers and, boom, there they were.

[0:14:03] KB: Yeah, absolutely. Let's look at what that is, because maybe engineers aren't used to thinking about this. What are the important aspects to trade off in cost? I think your second principle is around how you align cost to different things within your business. What does that look like? 

[0:14:20] WV: One of the things that I find extremely important as an engineering organization is that you need to be deeply involved with the business. After all, we make technology for a business. You don't make technology in a vacuum just because you think it's funny. And also, you have to remember that AWS is a business-to-business. We make technology for other people to build applications with. 

And as such, you need to be closely aligned with your business unit to truly understand what they actually want from you. I mean, I've seen so many old situations where the IT department is somewhere else. And they get a list of requirements, go build it, come back to the business. Nonsense. And why? I mean, often requirements change, and that's normal. Or you discover new information, or the business changes trajectory, and things like that.

I've always believed that engineers should be in the same room as the business people because then you start building applications, or systems, or support systems that actually really meet the requirements of the business instead of the business having to adapt to you. Yeah. And I think that is really making sure that you align the two.

Also, very important is that technology costs money. And so if you have to scale and you scale up, and so your costs are growing, but your income as a company is coming from a completely different direction. Let's say you've built it as a pay-as-you-go model, but you actually have a subscription model. Those two things don't really work out well. 

Thinking up front about over which direction you think you're going to get revenue, then you need to make sure that if you grow, your costs grow over exactly the same dimension. Because otherwise, I'm pretty sure you're going to run into trouble. I, at some moment, a long time ago, invested in a young business. And they were making these small MiFi devices. Now everybody has one. Or now all your phones have internet connectivity. And people would buy these devices and then buy 10 gig, or 20 gig on it, or 30 gig. And some customers get frustrated, "Why can't I just buy an unlimited package?" It can be expensive. Okay. So they made a very expensive unlimited package. What do customers do? Start watching Netflix 24/7. And as such, usage now greatly exceeds what the price was meant to cover. Having a financial model and a technology scaling model that are not aligned is a risk for your business.
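
[EDITOR'S NOTE] The following sketch is not part of the conversation. It illustrates the point about cost and revenue growing along the same dimension, using the data-package story as a model. Every number and name in it (`COST_PER_GB`, `PRICE_PER_GB`, `FLAT_MONTHLY_PRICE`) is a made-up assumption chosen only to show the shape of the problem.

```python
# Hypothetical figures; none of them come from the conversation.
COST_PER_GB = 0.50          # assumed cost to the provider per GB delivered
PRICE_PER_GB = 0.80         # assumed metered price charged per GB
FLAT_MONTHLY_PRICE = 60.00  # assumed price of an "unlimited" plan


def metered_margin(gb_used: float) -> float:
    """Margin when revenue scales along the same dimension as cost (GB)."""
    return gb_used * (PRICE_PER_GB - COST_PER_GB)


def flat_margin(gb_used: float) -> float:
    """Margin when revenue is fixed but cost still scales with GB."""
    return FLAT_MONTHLY_PRICE - gb_used * COST_PER_GB


def flat_break_even_gb() -> float:
    """Usage above which the flat plan starts losing money."""
    return FLAT_MONTHLY_PRICE / COST_PER_GB


if __name__ == "__main__":
    for gb in (10, 30, 120, 500):
        print(f"{gb:>4} GB  metered: {metered_margin(gb):8.2f}  flat: {flat_margin(gb):8.2f}")
    print(f"Flat plan breaks even at {flat_break_even_gb():.0f} GB")
```

Under these assumptions the metered margin keeps growing with usage, while the flat plan turns negative once a customer passes the break-even volume, which is the misalignment described above.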

[0:17:15] KB: Yeah, that makes a ton of sense. Now pricing is also hard. A lot of folks won't pay for pay-as-you-go in different parts of the market. How do you trade off those things or evaluate this as one of the many trade-offs that you might have? 

[0:17:31] WV: Well, tradeoffs, first of all, even before we put a cost into the mix, tradeoff was always a part of architecture. How highly available does this component need to be versus that component? And what is the kind of performance? How much capacity do we need to get to this performance? And in that sense, for example, measurement is extremely important. Knowing and truly understanding what is happening there. 

If you get an average webpage latency of 1.5 seconds, it means nothing. It means that 50% of your customers are getting a worse experience. You need to know how much worse. And you need to know how to control that end point, let's say the 99.9th percentile. And how can you bring that in? How much does that cost me to actually do the engineering? And then do I get a return on that? The wisdom is that faster web pages give you more conversion. But there is a point of diminishing returns.

Of course, after 1.1 seconds, nothing matters anymore. And so you need to think in your engineering about sort of how much is this work going to cost me? And is there, from the business perspective, a return on it? Of course, we all want to build the fastest possible webpages ever. I mean as an engineer, I would love nothing else than doing that. It's just that for the business, that's just busy work and useless because there's no return on it.
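
[EDITOR'S NOTE] The following sketch is not part of the conversation. It shows one way to look past an average latency and at the tail instead. The latencies are synthetic, the `percentile` helper uses the simple nearest-rank definition (other definitions interpolate and give slightly different values), and the function names are illustrative assumptions.

```python
import math
import random
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Report the mean next to the median and two tail percentiles."""
    return {
        "mean": statistics.fmean(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "p99.9": percentile(samples, 99.9),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, made-up latencies in seconds: mostly fast, with a slow tail.
    latencies = [rng.lognormvariate(0.0, 0.6) for _ in range(10_000)]
    for name, value in summarize(latencies).items():
        print(f"{name:>6}: {value:.2f} s")
```

With a skewed distribution like this one, the mean sits well below what the slowest requests experience, so the mean alone says little about the tail that the engineering effort would be targeting.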

That, definitely, I think, in the tradeoffs part, is really important. And it comes back to something else. Your application doesn't exist out of one big thing. Take amazon.com. You go to the web page of amazon.com, and there's a few things that actually always need to work. Because without that, we don't have a business. Search, browse, shopping cart, check out, and reviews. Because if reviews are not online, people are not that interested in buying. 

Then there's a number of things that are quite important to customers. Customers who bought X, bought Y, recommendations, similarities, things like that. And then there's a bunch of nice-to-have, bestseller list. Now you need to have a discussion with the business about how much money should I spend on the things that are tier one, the really important one, tier two, tier three? And tier one, you might say, "Oh, I absolutely need to replicate that over three different availability zones. Or I need 99.9% availability." Well, that's going to cost you.

And tier two, you may say, "You know what? Three nines might be fine." If recommendations is offline for five minutes, we can handle that. And then tier three, now the bestseller list, it may go - well, if that's offline for an hour, I don't think anybody misses it. And so this is a conversation you need to have with the business. I mean, we're used to, as engineers, making those decisions. And then what we do is we make everything four nines available, which is, from a business perspective, way too costly. Because there are parts where, from a business perspective, you can kind of live without it even for 5 minutes or 10 minutes, and it has a significant impact on the bottom line.

There's all sorts of tools and tricks that you can think about to decompose your application in such a way. But the most important part of it is that you then take it to the business and have a conversation with them because they're the ones, at some moment, that will have to pony up the money for it. 
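
[EDITOR'S NOTE] The following sketch is not part of the conversation. It converts availability targets ("nines") into a yearly downtime budget, which can make the tier discussion with the business more concrete. The tier-to-target mapping in `TIERS` is a hypothetical assignment for illustration, not Amazon's actual targets.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Downtime budget per year implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Hypothetical tier assignments, loosely following the search /
# recommendations / bestseller-list example. The targets are assumptions.
TIERS = {
    "tier 1 (search, cart, checkout)": 99.99,
    "tier 2 (recommendations)": 99.9,
    "tier 3 (bestseller list)": 99.0,
}

if __name__ == "__main__":
    for name, target in TIERS.items():
        minutes = allowed_downtime_minutes(target)
        print(f"{name:<34} {target:>6}%  ~{minutes:8.1f} min/year")
```

Each additional nine cuts the budget by a factor of ten (roughly 5,256, 526, and 53 minutes per year for two, three, and four nines), which is why applying the strictest target to every component is costly.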

[0:21:19] KB: Yeah. Well, and as you mentioned before, requirements change as well. It may be that the bestseller list is not important when you launch it and you discover that drives tremendous amounts of purchases. And so that ROI goes up, and we need more reliability. 

[0:21:36] WV: Yeah. Or you build things that once you get it in the hands of the customers that they go like, "What?" 

[0:21:43] KB: Yeah. That's probably more common, right? 

[0:21:44] WV: Then it needs to be cheap enough. In retail definitely, we have this massive A/B testing environment. And instead of hiring focus groups, and psychologists, and whatever before we build things, you might as well build it and put it in front of customers and see what they think about it.

At some moment, we built something called your digital soulmate. This is the person on Amazon that is in terms of purchasing just like you. Wouldn't tell you who it was, but we will tell you what that person also bought that you didn't buy with the idea that that might be inspirational. Customers hated it. They hated the fact that there was somebody else just like them. 

Now, I mean, that's already 10, 15 years ago. Maybe with the changing social media and things like that, people don't care that much anymore. But it was much easier to quickly build it, put it in front of customers, and customers are going like, "Nope, that's not what we want." 

[0:22:41] KB: Thinking about that and knowing that things are going to sometimes prove out to be not actually valuable, how much cost alignment, how much architecture went into that before you built it versus building it to be evolvable if it turns out to work well? 

[0:23:00] WV: If you think about innovation at Amazon, I mean, everybody knows we're really built out of all these small teams that all take care of their own little world and things like that. They're all in charge of innovation. And some of these innovations, especially because our A/B testing environment is so robust, aren't terribly costly. But there are teams that are working really hard on what may look to the outside world like small problems.

I don't know if you have ever bought shoes online. People who buy shoes often buy three pairs of the same shoe, different sizes, and then send two back. I think that's a bad customer experience, and it's not great for the business either. We have a small technical team that sits with the shoes team and tries to understand all of these kinds of things. And can we build a data set? Can we build data sets such that if these Nike 11, that particular Nike 11 fit you really well, maybe you should buy these trails for Adidas? Things like that. Does that cost a lot of money? No. But it may make a major difference for our customers if you can really give that good advice. 

And as such small teams are all charged with doing that kind of level of innovation. The bigger things go up to the central management team, of course. If it requires significant capital investment, then we have this principle that if we make major capital investment, the result of it needs to have a significant impact on our balance sheet. It means we're not necessarily interested in putting a lot of money on and then just getting that money back. But if it's really successful, it should be really successful. Probably, AWS is a really good example of that. But think also about things like the Kindle. That wasn't something that you expected Amazon to deliver. And remember, we sell the Kindle still at cost. That means if you never read a book on Kindle, we don't make any money. 

And also in the early days, one of the bigger challenges to customer service was that people would call in and say, "Ahh," and complain about something. And then the customer service agents would say like, "Yeah, but that's not by Amazon. That's by a third party." People go, "No, no, no, no, no. I bought this on Amazon." A significant portion of our customer contact all had to do with third parties and mostly with shipping, because people love to sell but they hate to ship. 

And so starting Fulfillment by Amazon so they could put their goods in our warehouses and we would take care of the guarantees on delivery, major difference. Did it require a lot of capital investment on our side? Yes. Or think about something like Prime. Prime is not just a gimmick. We had to completely re-lay out a complete fulfillment network to be able to afford Prime. And Prime doesn't pay for itself.

[0:26:12] KB: Prime is a really interesting example. What if we broke it down from the frugal architect principles, right? Like you said, it doesn't pay for itself. How does cost get incorporated into the architecture of what makes up Prime? 

[0:26:25] WV: If in around 2000, 2001, or something like that, you would have asked people, "Why don't you buy your lawn furniture on Amazon?" And they go like, "Lawn furniture? They sell books, music, video." Or your TV, or your electronics, or things like that. And one of the bigger challenges with Prime was mostly to not only build a subscription model, but to make sure that people would understand that this would not only apply to books. It would apply to - if you buy a big screen TV, it comes free to you in two days.

And as such, it incentivizes our customers to do cross-category shopping, especially for those items that are maybe big or costly in transport, or things like that. Now, it all comes in the same bucket. Now, of course, one of the things that customers really want is convenience. And we're no longer talking about two-day delivery. We're now talking about one-day delivery. Or where I live some part of the year in Dubai, there's two delivery windows. That is today before 6 or today before 11. If you get something the next day, you're kind of disappointed. And this is how people change under the influence of technology. Kind of things that we can do.

But believe me, if you want to do same-day delivery in New York, you'll have to make some investments in making that happen. And as such, there's a clear thinking behind it. But most of these, also, Prime, they're experiments. Nobody's done that before. And things are not an experiment if you really know the outcome. And some of these experiments, just by the nature of being an experiment, need to fail. 

You had this phone. I don't know if you remember this, the Fire Phone. $800 million write off. Not all of these investments work out the way that we planned them to. And that's okay. And when Jeff explained this to the shareholder meeting, there was nobody who batted an eye. Because in exchange for this one miss, there are 10, or 20, or other big success stories to make. And sometimes you make these gambles.

[0:28:44] KB: Well, and as you say, you can't know how it's going to turn out. That actually, in some ways, feeds back into your frugal architect principles beyond design. You have this sort of measure and observe area where you say, "Okay, as we go, this is going to change." I'm sure Prime today looks very different than Prime when it was originally imagined. How do you keep track of what's going on and then evolve it as you go? 

[0:29:06] WV: Thinking about without measuring, you're flying blind. Whether that's around cost, or reliability, or uptime, or how are people changing their behavior under Prime. But I wanted to make actually one thing clear. In Amazon retail, you have the luxury of being able to experiment. You bring things in front of customers, they don't like it, stop it, or whatever. You adjust. In AWS, the world is different. Because as soon as we launch something, people start building their business on top of it. 

That's not something you can suddenly then pull the plug from underneath because people have been doing this. I mean, we're still running SimpleDB. And you can't sign up for it anymore. But there are a number of customers that are still using SimpleDB because they've tuned the hell out of it and know exactly how they want to run it. But we launched this. And the same is like with APIs. APIs are forever. One of the hardest things to do is API design, because you need to think about sort of how is this going to evolve, because this is going to stay around for a while.

And so measurement is extremely important, but also making measurement visible to everyone. And I often tell the story, and it goes back a long time. Most Americans don't know this story. But in 1972, there was an oil crisis. There were the hijackings and the thing with the Olympics in Munich with the hostage taking of the Israeli athletes there. And so, on Sunday, we couldn't drive a car. But also, a lot of research was being done. Why did some houses use more energy than other houses, although they were comparable, same family in it? It turned out that the family that was using more energy had their energy meter in the basement. The family that was using less energy had the energy meter in the hallway. That meant that every time they walked into the house, they got confronted with their energy usage. And that changes behavior.

I remember one of the first jokes I made, "When we go home at night, you can turn off the lights." Because as engineers, we're used to having some desktop underneath there and you just let it run. We go home. In the cloud, it's a much better idea to actually shut your development environment down because you're not going to use it anyway. It costs money. If you have that on a big monitor somewhere in your engineering environment and see how these things change up and down, it changes behavior. And so getting good insight also in sort of making changes between, "Do I want to run this on Intel? Do I want to run this on Graviton?", and things like that. Immediately seeing that your cost drops by 30% is a big motivator.
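
[EDITOR'S NOTE] The following sketch is not part of the conversation. It works through the arithmetic behind shutting a development environment down outside working hours. The hourly rate and the 10-hours-a-day, 5-days-a-week schedule are assumptions for illustration, and the helper names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of a pay-as-you-go resource for the hours it actually runs."""
    return hourly_rate * hours_running


def savings_fraction(hours_running: float) -> float:
    """Share of an always-on bill avoided by running only hours_running per week."""
    return 1 - hours_running / HOURS_PER_WEEK


if __name__ == "__main__":
    hourly_rate = 0.40      # assumed price in USD per hour
    working_hours = 5 * 10  # assumed schedule: 10 hours a day, 5 days a week

    always_on = weekly_cost(hourly_rate, HOURS_PER_WEEK)
    scheduled = weekly_cost(hourly_rate, working_hours)
    print(f"always on: ${always_on:.2f}/week")
    print(f"scheduled: ${scheduled:.2f}/week")
    print(f"saved:     {savings_fraction(working_hours):.0%}")
```

Putting a figure like this on a shared dashboard is one way to make the "energy meter in the hallway" idea visible to a team.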

[0:31:58] KB: Yeah, we are natural optimizers. Give us a number and we'll try to push it up or down, whichever one it is. 

[0:32:05] WV: It's a bit the same. We talked mostly about sort of if you have total control over how you're building your applications from scratch, from beginning, green field. Most of our applications aren't like that. They've been around for a few years. People that developed them probably are not around anymore. But paying off technical debt is crucial in any organization. No matter how brilliant you were with developing your first version, I'm pretty sure there are some things to fix, some things to refactor, or some things to look at, "Where are my costs going?" And does that meet the intuition of how much this should be costing? Before cloud, I remember there was this moment where, I think, we had 12 different search services within Amazon. Don't ask me why. We don't centralize those things. But there are 12 different ones.

[0:32:55] KB: It's a natural outcome of experimentation, right? Experimentation leads to diversification. But then what do you do? 

[0:33:02] WV: As a team, you're allowed to move fast. And if you feel you're being hampered by another team that still has to complete something, you just go do it yourself. I don't know if you realize, but there has been, forever, a button in your orders that says digital orders. That was mostly because integrating digital into the traditional order pipeline was way too much work. And so we allowed that team to build their own pipeline. Not the best customer experience, probably. But we were talking about something else. What was that? 

[0:33:30] KB: Technical debt. 

[0:33:32] WV: Oh, technical debt. Yeah. And so you know there are certain engineers who love to tinker, who love to - I'm not assuming every engineer to be the same. Some engineers love to babysit an SAP system. And they are conservative. And they make sure this thing will run to the max. Absolutely. Those kind of people you want to hire for that.

There's also engineers that would love to think and do some innovation here. You ship them off to somewhere else. But there's also a group of engineers that really love to look at the minuscule things that make things better. And if you can build a team out of these people and have them just go around the company, go look at things. Can we see? We have so many optimizers. We have so much deep insight into the execution of these things. All these flame graphs and things like that. And where's all this compute going? It shouldn't be going anywhere. And they find gold.

And so that's why I think it's - and it's obviously saying premature optimization is not a good plan and things like that. It doesn't mean you shouldn't use your brain up front when you're actually building something and not shove this off to a later moment, "Oh, we'll look at this later." And then technical debt is like the mortgage on your house. If you don't pay off your mortgage, the bank comes and repossesses your house. If you don't actually eventually solve all your technical debt, it'll come back to haunt you. Whether it's in reliability, in cost, in performance, it will come back to haunt you. It is a worthwhile effort to put some engineering against.

[0:35:20] KB: It's interesting thinking about technical debt in the context of what we've been talking about today in terms of aligning cost. I think as a business goes through these different phases, as you do experimentation, as a product goes through those phases, you assume technical debt because you're optimizing for different things, right? You may not optimize for cost upfront because you think there's a 70% chance this thing gets thrown away. 

[0:35:45] WV: I have a really pretty good example of that. When we started Amazon Fresh, we had no idea how the interface was going to look like. How did people want to interact differently with the Fresh interface versus, let's say, the normal retail interface? Yeah, of course, they want it. And with subscription mechanisms and things like that. But you wanted the team that actually was building this to be really agile, to be able to move things around fast. So they started off in Ruby on Rails. Why? Because it's a great prototyping environment, good visuals, you can do this. But they knew on day one that the moment that they needed to start scaling, it needed to be rewritten. Let's put it like that. Because it wouldn't be able to - or it would probably maybe be able to scale, but it would be at an enormous cost. And bringing it back to the normal Amazon principles, we decided to go with Ruby on Rails first because we didn't know how things were going to look like, and it was a great prototyping environment, but then you do need to pay off your technical debt eventually. 

[0:36:45] KB: I'm feeling exactly that right now in my day job, by the way. We're paying off a Ruby on Rails technical debt issue. 

[0:36:52] WV: Yeah. What did I see yesterday? There was the - I mean, Stack Overflow is not that popular anymore, but they do have this survey, I think, which I pay attention to. And the drop in Ruby developers seems to be something like 75%. There's quite a bit of movement in that, which I always like because I think if there's one thing as engineers that we're always forced to do, it is learn new stuff. And it's the cool thing, I think. But you do need to do that. 

[0:37:24] KB: Well, and I think programming languages is kind of at an interesting moment right now, especially with all of these LLM-assisted coding tools. And learning a new programming language is probably easier than it's ever been. 

[0:37:36] WV: Yeah, I do think so. 

[0:37:39] WV: Yeah. Yeah. No, I think learning anything these days, you can get a little assistant on the side that will help you. It would be great if the assistant would actually really be able to track your progress really well and sort of start suggesting which things you should be paying a bit more attention to. But I'm pretty sure that will happen in the future. 

But one of the things with respect to programming language and the frugal architect, let's come back to that one, I ended the frugal architect presentation with a quote from Grace Hopper, from Admiral Grace Hopper, our famous first programmer. And she says the most dangerous phrase in the English language is, "We have always done it this way." And I think that goes to often how we treat things, "Oh, I'm a Java programmer." Or, "Oh, we're really good at Rails." Or, "You know what? This looks just like the project we did last year. Let's just repeat that." 

And especially when it comes to, I think, sustainability and cost, we should really re-evaluate which programming languages we're using. If Python and Ruby are 75 times as inefficient as Rust, then maybe there should be some light bulb going on in your head and go like, "Well, maybe if cost is important to us or if sustainability is important to us, maybe we should investigate." Now, I know learning Rust is not the easiest thing to do. Now, here you have a compiler that is so picky that you start to hate it. But everything you get in exchange, blindingly fast, superior security properties, really makes it within Amazon at this moment the programming language of choice. We are rewriting significant portions in Rust. 

Senior principal engineers wrote a great article on my blog about how they're thinking about why they moved from Java to Rust when they were building Aurora DSQL. And security plays a very important role in that, but also efficiency and cost. The thing here is that there will be another programming language after Rust that is even more efficient and has even better security principles or whatever. As engineers, we'll be learning the rest of our lives. And I think that's fun. 

We as engineers, we have the most amazing jobs in the world. We can go to work every day and create something new. Who else can do that? Nobody. And we have the most - we are the artists. We are the creatives. It doesn't look like that from the outside. 

[0:40:21] KB: Yeah. Well, and this brings back, I think, a little bit to something we were talking about in terms of paying off technical debt and different languages maybe being appropriate for different phases of a project. How do you set up your project to enable you to say move easily from Ruby on Rails over to a Rust-based project or something like that? I know you've talked before about evolvable architectures. What does that actually look like? 

[0:40:44] WV: Evolvable architectures is a little bit different because - well, let's take Amazon S3. We started off with six microservices. That was the whole thing. Now it's well over 200 or over 300. I forgot what the number is. Danny can probably tell you that. Everything that we added, but we never took S3 down. It wasn't that we're changing software or things like that and sending an email to all of our customers saying, "Sorry, but Friday evening between 6 and 10, S3 is down." Well, probably your TV won't work anymore and you can't get coffee. And I wouldn't be surprised if the beer taps don't work anymore. But always evolving your software because you know that a few years from now, you will be running a completely different architecture. You think differently. 

S3 is a great example. In version one, we were just storing three copies of an object on different servers. And then we started realizing that erasure coding actually could help us here and actually still get the same level of durability, but you needed to store less data and significant cost improvement, of course. Yeah. You can introduce that. It doesn't mean you're going to recode all the old objects. Maybe once you touch them, you recode them, but you just leave them there. 
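To make the replication-versus-erasure-coding trade-off concrete, here is a minimal Python sketch. It is not S3's actual scheme: the 6+3 shard layout is a hypothetical illustration, and the single XOR parity shard is the simplest possible erasure code (production systems typically use codes such as Reed-Solomon that survive several simultaneous losses). It only shows why storing data shards plus parity needs fewer bytes than storing three full copies, and how a lost shard can be rebuilt.

```python
from functools import reduce
from typing import List, Optional


def overhead_replication(copies: int) -> float:
    """Bytes stored per byte of user data when keeping full copies."""
    return float(copies)


def overhead_erasure(data_shards: int, parity_shards: int) -> float:
    """Bytes stored per byte of user data with data + parity shards."""
    return (data_shards + parity_shards) / data_shards


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(shards: List[bytes]) -> bytes:
    """Single XOR parity shard over equally sized data shards."""
    return reduce(xor_bytes, shards)


def recover(shards: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the parity."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can rebuild only one loss")
    if not missing:
        return list(shards)
    survivors = [shard for shard in shards if shard is not None]
    rebuilt = reduce(xor_bytes, survivors, parity)
    result = list(shards)
    result[missing[0]] = rebuilt
    return result


if __name__ == "__main__":
    print(overhead_replication(3))   # three full copies
    print(overhead_erasure(6, 3))    # hypothetical 6 data + 3 parity shards
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = encode_parity(data)
    print(recover([b"abcd", None, b"ijkl"], parity))
```

With these illustrative numbers, three copies cost 3 bytes of storage per byte of data, while a 6+3 layout costs 1.5.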

And so we know that sort of over time, yeah, we went to different storage models where you would only store in two servers, or in one server, or add all sorts of functionality. Now these days, you can store your vector embeddings in S3. Evolvability is really realizing that you know this is not going to run forever, but we need to build it in such a way that, for our customers, it looks like it will run forever. It can't go down. Evolvability is crucial for us. 

And when you think about technical debt, when you think about the example that I gave with multiple programming languages, sometimes what you do - in Amazon, we're organized in small teams. Each team has ownership over the piece that they have control over. They can make decisions. There's agency there. If you want to do something that may span some of these teams, what you do is you fire up another team. The other team has a task either coordinating or maybe doing some of the work. Because remember, these teams are relatively small. They weren't sitting around waiting for more work. They had their road map planned out for them. 

But you may actually bring a team on board whose task it is to start carving up pieces of Fresh, rewriting them, making them a service. So we can actually call them. We don't need to worry about how they're implemented. You need to play around a little bit with your organizational structure as well to get these things done. 
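One way to picture carving a piece out of a prototype and putting it behind a service contract is the Python sketch below. Every name in it (`CartService`, `PrototypeCart`, `RewrittenCart`, `CartFacade`) is hypothetical and chosen only for illustration; nothing here describes how Amazon Fresh was actually built. The point is that callers depend on the contract, so the implementation behind it can be swapped without them noticing.

```python
from typing import Dict, Protocol


class CartService(Protocol):
    """The contract callers depend on (hypothetical example)."""

    def add_item(self, cart: Dict[str, int], sku: str, qty: int) -> Dict[str, int]:
        ...


class PrototypeCart:
    """Stands in for the original prototype implementation."""

    def add_item(self, cart: Dict[str, int], sku: str, qty: int) -> Dict[str, int]:
        updated = dict(cart)
        updated[sku] = updated.get(sku, 0) + qty
        return updated


class RewrittenCart:
    """Stands in for the rewritten service: same contract, new internals."""

    def add_item(self, cart: Dict[str, int], sku: str, qty: int) -> Dict[str, int]:
        return {**cart, sku: cart.get(sku, 0) + qty}


class CartFacade:
    """Callers talk to the facade and never see which backend is live."""

    def __init__(self, backend: CartService) -> None:
        self._backend = backend

    def switch_backend(self, backend: CartService) -> None:
        # The rewrite team flips this once the new service is ready.
        self._backend = backend

    def add_item(self, cart: Dict[str, int], sku: str, qty: int) -> Dict[str, int]:
        return self._backend.add_item(cart, sku, qty)


if __name__ == "__main__":
    facade = CartFacade(PrototypeCart())
    print(facade.add_item({"milk": 1}, "milk", 2))
    facade.switch_backend(RewrittenCart())
    print(facade.add_item({"milk": 1}, "milk", 2))
```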

[0:43:41] KB: Yeah, that makes sense. Well, and I think the carving off and the balance of the API is forever. The contract with the customer is forever. The implementation, you can't fall into the we've always done it this way trap. 

[0:43:53] WV: No, because there is always something new. I mean, sometimes you build things that are intuitively - what you find intuitively the right way. There's a great website. It's called the Amazon Builders' Library. And there's a whole bunch of - how shall I say? Nonobvious thinking, where there's a great one by Colm MacCárthaigh, which is about constant work. And when you read it, you think like that's inefficient. But in terms of stability of a system and other principles around systems, it looks great. 

And so, that's a great document where things are being traded off against each other. A bit more bytes on the wire, a bit more work to be done, but the system itself is rock stable. There's a lot of work you have to do, but you might play around with your organization to actually get that done. 
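The constant-work idea can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the Builders' Library article: the function names, the health-status table, and the fixed slot count are all hypothetical. A change-driven push sends only what changed, so its size spikes exactly when many things fail at once. A constant-work push sends the whole fixed-size table every cycle, so a quiet cycle and a stormy one cost the same.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str]]


def delta_push(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Change-driven update: payload grows with the number of changes.

    Deletions are ignored here to keep the sketch short.
    """
    return {key: value for key, value in current.items() if previous.get(key) != value}


def constant_push(current: Dict[str, str], slots: int) -> List[Row]:
    """Constant-work update: always the full table, padded to a fixed size."""
    if len(current) > slots:
        raise ValueError("table does not fit in the fixed number of slots")
    rows: List[Row] = list(sorted(current.items()))
    padding: List[Row] = [(None, None)] * (slots - len(rows))
    return rows + padding


if __name__ == "__main__":
    old = {"a": "up", "b": "up", "c": "up"}
    quiet = dict(old)                                # nothing changed
    storm = {"a": "down", "b": "down", "c": "down"}  # everything changed
    print(len(delta_push(old, quiet)), len(delta_push(old, storm)))
    print(len(constant_push(quiet, 4)), len(constant_push(storm, 4)))
```

The constant version sends more bytes on an average day, which is the trade-off described above: extra work in the common case in exchange for identical behavior in the worst case.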

[0:44:49] KB: Yeah. No, this makes sense. Well, we've covered a lot of this. We kind of gone through all the different pieces of the frugal architect. We're getting close to the end of our time. Is there anything we haven't talked about today that you think would be important to cover before we wrap? 

[0:45:02] WV: On this particular topic? No. I think we've done well. But as always, I want this to be practical. I don't want this to be just some high-level architect that never gets his hands dirty and suddenly starts talking to you about cost. I think important is the relationship between the business and tech, because that's where money matters. And so if you keep that in mind, a bit of an agile, working together with the business to make sure you build the right things. And requirements change. That's not new, by the way. 

In the 1990s, there were enough reports already about changing requirements being why all these big projects fail. And actually, that is still the case. I do believe that evolvability centers around decomposition. Decomposition into services, but also then decomposition of the application into cells, such that you minimize the exposure to failures. And there's so many different technologies to think about when you build your system that I'm still having a great time. 
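Decomposing an application into cells can be sketched as follows. This is a minimal illustration under assumptions: the hash-based assignment, the function names, and the cell count are hypothetical and not a description of any AWS service. Each customer is pinned to one cell, so when a cell fails only the customers mapped to it are exposed.

```python
import hashlib
from typing import List


def cell_for(customer_id: str, cell_count: int) -> int:
    """Deterministically map a customer to one of `cell_count` cells."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cell_count


def affected_customers(customers: List[str], cell_count: int, failed_cell: int) -> List[str]:
    """Customers exposed when a single cell fails; all others are untouched."""
    return [c for c in customers if cell_for(c, cell_count) == failed_cell]


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(1000)]
    hit = affected_customers(customers, cell_count=8, failed_cell=3)
    print(f"{len(hit)} of {len(customers)} customers exposed to the failure")
```

A plain modulo assignment like this reshuffles most customers when the number of cells changes, so a real system would likely need a more stable mapping.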

[0:46:11] KB: Yeah, I think there's no shortage of things to learn and to do in the engineering world. 

[0:46:17] WV: Absolutely. 

[0:46:18] KB: Awesome.

[END]