EPISODE 1631

[EPISODE]

[0:00:00] ANNOUNCER: SQL databases were built for data consistency and vertical scalability. They did this very well for the long era of monolithic applications running in dedicated single-server environments. However, their design presented a problem when the paradigm changed to distributed applications in the cloud. The shift eventually ushered in the rise of distributed SQL databases. One of the most prominent is CockroachDB, which uses a distributed architecture inspired by Google Spanner. But what were the engineering approaches that made this architecture possible? Jordan Lewis is a senior director of engineering at CockroachDB Cloud. He joins the show to talk about the design of CockroachDB and how it works under the hood.

This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His bestselling book, Architecting for Scale, is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee is the host of his podcast, Modern Digital Business, produced for people looking to build and grow their digital business. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com and see all his content at leeatchison.com.

[INTERVIEW]

[0:01:27] LA: Jordan Lewis is a senior director of engineering at CockroachDB Cloud, and he's my guest today. Jordan, welcome to Software Engineering Daily.

[0:01:34] JL: Hi. It's nice to be here on the show with you.

[0:01:36] LA: Thank you for being here. So, what is it that makes CockroachDB special? Why are you able to solve the scalable distributed SQL problem without simply ignoring SQL?

[0:01:49] JL: What makes CockroachDB special is that the architecture we've chosen for the database allows us to take familiar, regular old traditional SQL - or "sequel," as you might like to say it - and distribute it across a cluster of homogeneous nodes. We can scale your database horizontally in a way that is more familiar to someone who's used to using DynamoDB than an Oracle or a Postgres. How does it work under the hood? The idea is really quite simple, conceptually: we take your data and split it up into ranges, chunks of contiguous data. If you're used to using a sharded SQL system, where you're manually deciding where those shards live inside of your data, CockroachDB is doing that shard computation for you automatically under the hood, abstracting away the fact that you would otherwise have to separate your data up into shards. It allows you to use SQL just like you're used to. The database manages deciding when to ask a particular shard - or a particular range, as we say in Cockroach - for a bit of data, and when you're going to write to a particular range, without exposing that complexity to the user. That's the fundamental idea.

[0:03:03] LA: We're going to get deep into how it works as we go along, and that's great. I want to know more. But first of all, tomato and tomahto - I always get this debate too. I'm sure you do all the time. Is it "sequel" or "S-Q-L"? You say "sequel" mostly? I say it both ways, but I think I usually do "S-Q-L." It's amazing what we can debate about sometimes.

[0:03:25] JL: Exactly. I think with that one, I'm just copying what our founders do. I joined the company back in 2016, and everyone was saying "sequel" then, so it kind of stuck in my head. That's what I say now.
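To make the ranges idea concrete, here is a minimal sketch in CockroachDB-style SQL. The orders table is hypothetical; SHOW RANGES is the documented statement for inspecting how the database has split a table up.

    -- An ordinary SQL table; no shard keys or routing logic required.
    CREATE TABLE orders (
        id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        customer STRING NOT NULL,
        total    DECIMAL(10,2)
    );

    -- CockroachDB splits the table into ranges (contiguous chunks of the
    -- key space) automatically; this shows where each range currently lives.
    SHOW RANGES FROM TABLE orders;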
[0:03:34] LA: Makes perfect sense. Yes. As you know - and full disclosure to everybody who's listening - I focus personally on architecting for scalability, and I wrote the book Architecting for Scale from O'Reilly Media. In fact, CockroachDB was a sponsor of that book. So, I know some of the things that Cockroach goes through, and you're aware of some of the things I talk about. So, our interests are very well aligned. We both focus on scalable architectures. But let's get into some of the more salient details of how this works. Many people have tried - or there are lots of different approaches, let's put it this way - to try to scale SQL databases. But most of them involve some form of limitation on the way you access the data. We might be talking about requiring a single writer and multiple readers, as a way of scaling the read-only part of the data solution. There are other solutions as well. But in your approach, all nodes can service both reads and writes simultaneously in a scalable solution. So, how do you synchronize your writes, for instance, to make sure that you scale, yet maintain the consistency that SQL demands?

[0:04:50] JL: It's a great question. Ultimately, it comes back to this decision in the architecture to use ranges, as we call them. So, shards or ranges - it's the same idea. Let's pretend you're a shopping cart application, which is a really good and classic use case for a system like CockroachDB. You need that consistency, because you've got two people trying to check out the same item at the same time. It's a classic interview question, probably, and you want to make sure that you're only selling one of them. You don't want to commit a transaction that says, "Okay, user A got to buy the widget. And user B also got to buy the widget, but there's only one widget." Ultimately, it's key that you can satisfy these consistency requirements, at the same time as allowing that shopping cart application to scale to the millions of users and thousands or millions of transactions per second that your application is trying to serve. So, how does it work? Think about what you might have to do as the database. You have two users touching the same shopping cart data, so you're going to have to think about how you're going to resolve that conflict. Whereas if you have two users who are touching two completely different pieces of data - different items in that shopping cart, or in your store - you won't have to worry about having those two transactions talk to each other. So, what does CockroachDB, or a horizontally scalable system generally, do to allow the system to only worry about the real conflicts? It's about taking the data and making sure that the potentially conflicting bits are close to each other. In CockroachDB's case, it means that you've chosen your schema in a way that might say, shopping cart item IDs are in a table, and your primary key is organized by item ID - something like this. If two people are touching that same item ID, it's going to be processed, ultimately, inside of the system by the same machine. Once you're inside of that same-machine world, the way CockroachDB does consistency is it uses Raft. Raft is the consensus protocol that's used to adjudicate between replicas of the same range. If you're touching that same item in the shopping cart - two people at the same time - Raft is going to deal with adjudicating those conflicting transactions.
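A minimal sketch of that conflicting-checkout scenario, against a hypothetical inventory table. Both transactions route to the range that holds item 42, and that range's replicas agree on an ordering through Raft, so only one checkout can succeed:

    -- Two users run this concurrently for the same item.
    BEGIN;

    -- Lock the row so the conflict is resolved up front.
    SELECT quantity FROM inventory WHERE item_id = 42 FOR UPDATE;

    -- Decrement only while stock remains; the losing transaction
    -- sees quantity = 0 and updates nothing.
    UPDATE inventory SET quantity = quantity - 1
        WHERE item_id = 42 AND quantity > 0;

    COMMIT;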
[0:07:06] LA: In other words, you use traditional sharding - now, it's automated sharding, so it's invisible to the customer - but you use traditional sharding to limit the scope, so that it's rational for both writes to go to the same machine and be dealt with there.

[0:07:21] JL: Exactly.

[0:07:23] LA: That's great. That works well - probably safe to say that works well for about 99% of the problems you run into, and 99% of all synchronization issues can probably be resolved with things like that. But there are a few things that can't be. If you look at full transactional awareness, with what SQL allows, you can't always localize data like that. So, how do you handle that? Do you allow full SQL transactions? If so, what do you do to make sure that a full transaction that's touching 50 pieces of data on 40 different machines - along with one piece of data that's being touched by someone else's transaction on another 50 machines - all coordinates together?

[0:08:14] JL: So, I love the way that you took the micro and then zoomed out to the macro. The micro is Raft consensus, exactly. And then the macro is sort of a traditional two-phase commit transaction protocol. So, any kind of transaction that you can throw at SQL is supported inside of CockroachDB. If, ultimately, that transaction does need to touch more than one consensus group or range, then the system notices that, inserts two-phase-commit-style transaction records in the places that they need to be, and orchestrates the transaction across as many consensus groups as are required to make the transaction go through.

[0:08:52] LA: You write the journal records necessary to do the two-phase commit. And then you ultimately have an event that's broadcast that does the actual commit. That's the only thing that has to be transactional in nature at a lower level - that two-phase commit signal that says, "Commit the records."

[0:09:11] JL: Yes, exactly. We take those transaction records, or journal events, whatever you want to call them - we call them transaction records. Those records are really just data items, like any other data item, according to CockroachDB. So, you're inserting this transaction record. To CockroachDB, it could be just another user record, but it's got a special bit that allows the system to know: oh, this is the thing that I need to go to, to resolve the two-phase commit, or check to see if it's been committed. And there are all sorts of fun edge cases. What if a writer goes away, or the machine dies? How does this stuff all get cleaned up? There's a protocol. I think this is actually one cool thing that we've done with this particular part of the system. I believe we've verified it with one of the - forgetting the name of it right now. But one of the kinds of protocol verification -

[0:09:55] LA: Yes, protocol verifiers.

[0:09:56] JL: Exactly. That's a cool thing.
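A sketch of a transaction that has to span more than one range, using a hypothetical accounts table. If the two rows live in different consensus groups, CockroachDB coordinates them with the two-phase-commit-style transaction record described above:

    BEGIN;

    -- These two rows may live in different ranges, on different machines.
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 900001;

    -- The commit either applies to both ranges or to neither; the
    -- transaction record is the single source of truth for the outcome.
    COMMIT;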
[0:09:57] LA: Cool, cool. So, you're fully ACID-compliant. Is that correct?

[0:10:01] JL: Yes, exactly.

[0:10:03] LA: At a macro level?

[0:10:04] JL: At a macro level, and we do support a Serializable level of isolation, which is something that we're pretty proud of. We think that for most developers, it's actually an easier concept to make all the transactions Serializable, instead of a lower level of isolation like READ COMMITTED, for example. So actually, for many years, that was the only isolation level that CockroachDB supported. An unusual choice, I would say. In the early days, we thought we were going to revolutionize databases by making sure everybody was using Serializable, the best consistency. Developers were going to have an easier time. As it turns out, I think the market made us realize that there is demand for a looser level of isolation as well. So, we are introducing READ COMMITTED in an upcoming release, and we're excited about that as well.
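A sketch of what choosing an isolation level looks like, using the Postgres-compatible syntax CockroachDB follows, and assuming a version where READ COMMITTED has actually shipped:

    -- Serializable is the default; nothing to configure.
    BEGIN;
    -- ... statements run with full Serializable isolation ...
    COMMIT;

    -- Opt in to the looser level per transaction, where available.
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    -- ... statements tolerate anomalies that Serializable would reject ...
    COMMIT;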
[0:10:50] LA: Cool, cool. Good. I'm anxious to see how that works, and how that all fits together. Because I remember, when I was working at AWS - I don't remember the guy's name, nor would I say it if I did, but he was a well-renowned, traditional database expert who happened to be working at AWS, and he'd worked at Microsoft and other places as well - he said, "Scalability and SQL do not go together. You cannot solve the scalability problem with an SQL-compliant database." And what he meant by an SQL-compliant database was full ACID and everything that goes with that. I never believed him. So, I'd love to see companies like Cockroach prove that this actually works. Are there any compromises that you do make to SQL or to ACID compliance? ACID compliance is not the best example. But are there compromises you have to make - maybe it's slower than a normal SQL database would be in certain circumstances, or whatever - in order to make this actually work in a distributed fashion?

[0:11:58] JL: There are a few kinds of compromises that we've had to make. Certainly, you mentioned latency. It's hard for a distributed database of any nature to compete with single-node database latency. Just doing the extra network hops, doing the consistency round trips - that's going to introduce an unavoidable amount of extra latency, for sure. I think another limitation - it's been an interesting one for us to navigate - has been related to schema changes. One thing that a lot of the old-school SQL databases offer you, along with their ordinary ACID compliance for data transactions, is fully transactional schema changes. That's something that's very tough for us to offer, because we also want to offer online schema changes, which is something that our customers really like if they're operating on large-scale data. You have an online shopping cart, or a gaming application, or a financial transaction log that's happening in real-time, 24/7. You don't want your system to be interrupted by a schema change. You don't want to have to take your application down. Obviously, you can't do that. Over the years, many people have come up with workarounds for this, for different systems. We've decided to build that online schema change capability directly into the schema change system, which is great. The trouble is that if you want that system to also be fully transactional - in other words, you want to be able to create an index at the same time as editing a row, or add a column and edit an enum all in one transaction - we don't have the ability to roll that back atomically right now. That's a limitation, and our customers have made it known to us that it can be trouble. But maybe we'll find a way to solve that limitation one day. For now, that's one of the limitations that comes to mind for me.

[0:13:48] LA: It's a good one to bring up, because what I like about it is, you're right, it's truly a limitation relative to standard, stock SQL, and it is something that is valuable to have. On the other hand - and I might get myself in trouble here with some of my customers or some of my clients - it's bad protocol to require that in a production system. You should design your schema changes in such a way that the impact is a micro, step-by-step change, where you don't have to roll back 10 levels of schema changes transactionally at the first sign of trouble. You can design your schema changes - it's possible in almost all cases - so that you change your database a small step at a time, non-transactionally, without worrying about downtime. It's possible in most cases. But I'll get in trouble with some people for that one, I know. What are your thoughts on that?

[0:14:46] JL: Exactly. I think you're so right. There are a lot of best practices in the SQL world that you would really love all of your customers to follow, I guess is the way that I would put it. One thing that I've learned throughout my time at Cockroach Labs - my evolution as a technologist, I guess you could say - is that for a long time, I really thought that we could convince customers to do it right, so to speak. Maybe that's using the right type of schema changes. Always using Serializable. Never making foreign keys that didn't make sense. Over time, I realized that there are so many reasons - it's almost a Chesterton's fence kind of situation. There are so many reasons why people are using SQL in a way that you didn't think of. Maybe this is actually what your friend at AWS was saying about never being able to make SQL scale, because there is so much variety in the different ways that people use SQL, and there are valid reasons for all of them. You may not understand them all yet. But people use SQL in really, incredibly creative ways. I think as a service provider, you will be limiting yourself, and you won't be doing your job to solve those customer problems, if you end up thinking, "Ah, if only they could stop using stored procedures. Ah, if only they would only use the Serializable isolation level," things like that.
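A sketch of the micro, step-by-step style of change Lee describes, against a hypothetical users table. Each statement is its own online schema change that CockroachDB backfills in the background, rather than one large transactional migration that would have to roll back as a unit:

    -- Step 1: add the new column, nullable, so no immediate backfill is needed.
    ALTER TABLE users ADD COLUMN email_normalized STRING;

    -- Step 2: index it; the index is built online while traffic continues.
    CREATE INDEX users_email_normalized_idx ON users (email_normalized);

    -- Step 3: only after the application has fully migrated, retire the old column.
    ALTER TABLE users DROP COLUMN email_legacy;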
[0:15:55] LA: I think it's probably safe to say - I guess I don't know for sure that this is a true statement, but I think it is - that SQL is either the oldest, or one of the oldest, programming languages still heavily in use today. So, it's got a long history, meaning that people were using SQL for radically different purposes when it first came out. That's where they were learning SQL, and that's where the restrictions and what SQL means were put into place. If you look at how SQL was used 50 years ago, versus how it's used today, it's not a surprise that there are lots of databases that do things not the way we would do them if we built them from scratch today, but are doing them in very important ways. Not only because the data already exists and has to be accessed that way, but also because the programmers who work that way have decades of experience, understand how to program in that way, and that's what they're used to doing. That's one of the problems, I think, with NoSQL databases in general: yes, they may be "better," but they're just different enough that the way we think about building applications has to change, and that's hard for a lot of companies to do. You just can't do that. That's why things like Cockroach are so important, and why, I imagine, full compliance with the SQL standard - which is a nebulous term - is very important for your business. You can't be 90% there. You've got to be 100% there.

[0:17:33] JL: That's exactly right. So true. We've learned so much about exactly that word. What does SQL compliance really mean, and at which company? It may be different depending on who you ask. Are we talking about Postgres? Are we talking about Oracle? Are we talking about SQL Server? I think we, as an industry, have not done the best job just yet of finding that single standard that people can conform to. I don't think we're really quite there. Maybe one day. Maybe Postgres will become the standard. That would be a dream for me, personally.

[0:18:00] LA: Yes. Unfortunately, there'll probably end up being a couple of standards, right? If history works out, yes, there'll be standards, but there'll be this one or that one, and they'll be radically different, and the compatibility won't be perfect. It'd be just like the browser wars. It actually already is - what am I saying? Well, we talked about transactions. But the other thing that makes scaling SQL hard is JOINs, and how JOINs work. The fact that you can JOIN anything to anything, in any combination, in highly complex ways, "efficiently," relatively speaking. How do you do that? Does Cockroach's architecture make JOINs easier, or does it make them harder and more challenging for you? How do you do JOINs?

[0:18:46] JL: We've taken a lot of best practices from other distributed SQL systems that have been developed over the years. There are several of them. I think the one that I've spent the most time learning about is a system called F1 that Google developed. You can think of F1 as a distribution layer on top of a distributed SQL database - or any other kind of database under the hood - that you could implement JOINs on top of. So, inside of CockroachDB, the SQL system has a similar layeredness to it. There's an underlying, sort of standard execution engine that you can think of as the part that runs on the nodes that have the data. Let's say I'm doing a distinct operation, or a select, or a filter, or even a JOIN between two tables on the same system. You can think of these operators as just running on the node that you're asking for the data from. What happens if you ask to do a JOIN that needs to get data from more than one range, and those ranges are not co-located? What does the system actually do? You could imagine sending one of these operators - let's say it's a JOIN operator, and it wants a little bit of Table A and a little bit of Table B. The Table A bits are located on the node that you're running the JOIN on, but the Table B bits are somewhere else. So, you could imagine that node sending an RPC off and starting to pull in all of the data from the other table, streaming all of that data over. Maybe you did a little filter push-down, maybe you didn't. But in any case, it's maybe a bit expensive, since you're having to send all this data over the network. So, the insight of F1 is that you can actually distribute these processing operators across the whole cluster if you want. You could make a hash JOIN or a merge JOIN that is composed of many little sub-hash JOINs or merge JOINs, or a distinct that's composed of many little distincts. Or you're taking filters and asking them to run on different nodes. Really, that's fundamentally what our system does. It's similar to this F1 idea. We call it DistSQL - that's just an internal name. The node that's doing the SQL planning can create a distributed plan and ask the remote nodes to run the subcomponents of the plan. So, not really too different, probably, from what other distributed SQL systems do. But this is the way that ours works.
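A way to see that distribution in practice. EXPLAIN (DISTSQL) is CockroachDB's documented statement for rendering the distributed plan; the two tables here are hypothetical:

    -- Produces a link to a plan diagram showing which nodes run which
    -- processors: per-node scans, sub-joins, and the final aggregation.
    EXPLAIN (DISTSQL)
        SELECT c.region, count(*)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.region;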
[0:20:59] LA: Cool. Cool. So, there's a lot of chatter that goes between nodes. Is that correct?

[0:21:04] JL: Absolutely. Yes.

[0:21:05] LA: So, how do you handle geo-distribution with so much chatter going on? Or is there a different algorithm that works for geo-distributed nodes versus local nodes? How do you do that?

[0:21:17] JL: So, geo-distribution - what is that about, really? Just at a high level, zooming out, the way that I think about geo-distribution is saying, we can have a distributed SQL database that lives inside of three data centers that are only a few milliseconds from each other. Or we can decide to say, we need to have replicas that live not only in US East, but also in US West, and also in Europe. When we have a system like that, that's the geo-distribution part kicking in. So, how do we deal with it? Under the hood, if we were to say, "Well, I'm a user of CockroachDB, and I'm just going to send you all of my data. I'm not going to give you many instructions about what piece of data is supposed to live where," I think we would struggle as a system, just as you say. There would be a lot of that chatter, and a lot of it would have to go over the cross-region links, which are expensive and slow and so on. What a system like Cockroach allows you to do - the way that we've chosen to let you organize your data by region - is configure a table, as a user, to be regional by row, as we call it. What that means is that the database can look at a component of the row - let's say it's a user ID or a region code - and it can decide, based on an algorithm that you pick, whether that data should be homed in one particular region or another. And not only in that region: it's not like a traditional sharding system where the data only lives in that region, unless you want it to. By default, it's going to be homed in that region and replicated across other regions for redundancy. Once you've got all that as a basis - I'm a user, I've configured my system to be regional by row in the right places - the system won't even need to reach cross-region if it doesn't have to. The nodes that are serving the data for any particular region will also be next to each other. Most of the time, when you organize data like this, you don't end up having to do any cross-region JOINs. If you do, you do, and then there's going to be some chatter, to your point. But that's kind of the high-level idea.
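A minimal sketch of that regional-by-row configuration, using CockroachDB's documented multi-region SQL. The database, table, and region names are hypothetical, and the regions must already exist in the cluster:

    -- Declare the regions the database can home data in.
    CREATE DATABASE shop PRIMARY REGION "us-east1"
        REGIONS "us-west1", "europe-west1";

    USE shop;

    -- Each row is homed in one region (tracked in a hidden crdb_region
    -- column) while still being replicated elsewhere for redundancy.
    CREATE TABLE carts (
        user_id  UUID NOT NULL,
        item_id  INT8 NOT NULL,
        quantity INT8 NOT NULL,
        PRIMARY KEY (user_id, item_id)
    ) LOCALITY REGIONAL BY ROW;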
[0:23:14] LA: That's great. That actually also helps with durability, which was going to be one of my other questions - about the redundant nodes and all that sort of thing. It sounds like the answer is yes, at least across regions, and maybe otherwise. But then, that also comes back to the question about transactions and transactional integrity, and how you deal with that. When you have two transactions, one in Europe and one in the US, touching the "same data" - even though it's not the same nodes, because they're replicated - how do you handle transactional integrity in those cases?

[0:23:45] JL: Yes. In those cases, it's going to be similar to what we were discussing before. There's going to be that two-phase commit. There's going to be that transaction record. And ultimately, if you're trying to touch that same bit of data - shopping cart data across the US and across Europe, where we're both trying to buy the same item - it's going to be slower. There's really no way to avoid that, unfortunately. The speed of light - we had a little phrase early on at the company that the speed of light is the fundamental limit for CockroachDB.

[0:24:11] LA: That makes perfect sense. What's nice, when you talk about adding capabilities like geo-replicated transactions, is that you're moving from "you can't ever do this" to "you can do this." So, the performance of it tends not to be as critical a question. It's understood that a transaction that has to go across four continents is going to take longer than one that goes across four millimeters of a wafer. That makes a lot of sense. But it is a choice that you make: ACID compliance is more important than performance when it comes to those sorts of situations. So, you don't break integrity. You may sacrifice performance, but you don't break integrity. Is that, in general, a good description of the philosophy of CockroachDB?

[0:25:10] JL: I think that's a good description. We think of CockroachDB as a database that's fit for your tier-zero, system-of-record transactional workloads. So, these are the ones we're talking about - the shopping cart that can never be inconsistent, or, a better example would probably be a financial system ledger. This is a system where you cannot afford to have an inconsistency. So, that's the prioritization. If you're thinking about the CAP theorem - it's not exactly a CAP theorem thing, but it's the same kind of idea - you want to prioritize CP, that consistency, over the availability, if you're working on some kind of transactional finance system, for example.

[0:25:48] LA: Let's get back to the durability aspect too, which we touched on a little bit with geo-distribution. So obviously, geo-distribution gives you durability. Do you also offer durability options within a single region, where you can replicate data and make it look like it's essentially multi-region from the durability standpoint? Is that a capability you offer? You're shaking your head - but for the audio, is that something that you do?

[0:26:11] JL: Absolutely, yes. So, we do provide a kind of configurable replication factor solution like you're -

[0:26:18] LA: That was my next question.

[0:26:18] JL: - used to in a NoSQL kind of system. So, you can decide to say, as a user, I would like to have three replicas balanced anywhere - that's the default if you're using a single-region CockroachDB. You can say, "My range is going to be replicated three ways somewhere." Or you can say, I want it to be five-way replication because I'm interested in extra redundancy, or seven-way, or whatever. And if you're multi-region, with those kinds of constraints that I described - the regional-by-row kind of thing - you can ask the system to do anything, really. It's quite configurable, and I think early on, we found that level of configurability was maybe a bit too much for most users. So, we do provide some nice, sane defaults and things like that for most people. But if you really want, you can say, "I want five replicas in Europe and three in the US," or whatever other combination you're interested in for your application's needs.
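A sketch of that replication-factor configurability, using CockroachDB's documented zone configurations; the table and region names are hypothetical:

    -- The default is three replicas; ask for five for extra fault tolerance.
    ALTER TABLE orders CONFIGURE ZONE USING num_replicas = 5;

    -- Or pin replica counts to specific regions, e.g. favoring Europe.
    ALTER TABLE orders CONFIGURE ZONE USING
        num_replicas = 5,
        constraints = '{"+region=europe-west1": 3, "+region=us-east1": 2}';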
[0:27:08] LA: Yes. That makes a lot of sense. A lot of people talk about durability of databases versus backups, and the importance of durability as a replacement for backups. What's your philosophy, in general, about database backups versus a durable database?

[0:27:27] JL: Yes. So, it's a deep question. We think of Cockroach as a system that allows you more peace of mind than a standard distributed database or a standard single-node database, because of this extra durability provided by replication - cross-datacenter replication, cross-region replication. But to me, and to us, it doesn't at all obviate the need for disaster recovery, for backups. Organizations use backups in a lot of different ways. They use them as mistake-rollback machines. Let's say someone comes in and deletes your whole database. You can use disaster recovery for that, even if your system already replicated that mistake to all of the copies of the data. Or your organization probably needs backups for compliance reasons. Even if you don't end up using those backups, there are lots of good reasons why there are laws that require organizations to take them. Or your organization might use them for annual risk exercises. What happens if there's an internal attacker inside of your organization who takes down the database and deletes all the copies, or something like this? So, by no means does the extra durability and extra replication take away the need for good old-fashioned, standard backup and disaster recovery.

[0:28:43] LA: Or, for that matter, vendor issues. If you accidentally delete a record - whether it was the customer's fault or your fault, it doesn't really matter - if that record is completely and totally removed, there's nothing you can do about it anymore, and only a backup that's offline and separate from the system would solve that.

[0:29:01] JL: Yes. Exactly. That's that defense in depth. I should probably mention: test your backups. This is just general advice. For all the listeners out there, please test the backups.
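A sketch of scheduled backups plus the restore test Jordan recommends, using CockroachDB's documented BACKUP and RESTORE statements; the storage bucket and database names are hypothetical:

    -- Take a full backup to external storage every night.
    CREATE SCHEDULE FOR BACKUP INTO 's3://acme-backups/crdb?AUTH=implicit'
        RECURRING '@daily' FULL BACKUP ALWAYS;

    -- Actually test the backups: restore the latest one under a new name.
    RESTORE DATABASE shop FROM LATEST IN 's3://acme-backups/crdb?AUTH=implicit'
        WITH new_db_name = 'shop_restore_test';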
[0:29:10] LA: One, two, three. Exactly. Yes, exactly. So, let's talk a little bit more about the sharding mechanism. You say you use sharding as a technique for doing the distribution, which is a standard, age-old approach - except, as you also mentioned, most sharding solutions are customer-managed. Customers have to deal with designing the shard key, and then dealing with the consequences of having to reshard, and all those sorts of things. Now, you do all that automatically for the customer, which is fantastic. But how do you do that? What do you do to make the decision on how to shard and what to shard by? How much control do you give your customers, and how exactly does this mechanism work?

[0:29:52] JL: Under the hood, you can think of the CockroachDB architecture as another layered architecture. I was talking about how the SQL processing is layered; so is the storage. You can think of the SQL system as sitting on top of an underlying giant key-value system, like Dynamo. Under the hood, that key-value system is a giant ordered map of all of your data, and we've devised a careful scheme for translating a SQL row into key-value pairs. For example, say you have a table for users, with a secondary index on their email addresses. The primary key is their first name and last name, or something - kind of a bad primary key, by the way; that just came out of my mouth - and you've got an ID, you've got an email, whatever. You're going to actually have separate underlying key-value sections of that giant KV store: the primary index on first name and last name, and a second one for email. They have separate spaces in that key space. And ultimately, when it comes to the sharding strategy - we call them ranges; sharding strategy, whatever - by default, each of those tables and each of those indexes is going to get its own little partition inside of CockroachDB. Those are all, by default, their own replication groups. When those replication groups become large enough - there's a heuristic about the size of the range in terms of megabytes; I forget what the number is now - the system will automatically split that range up into multiple ranges. A similar heuristic kicks in when we're talking about QPS. If a particular range is getting a lot of QPS, a lot of traffic on it, the system might decide, "Oh, I'm going to split this up to make sure that the system can actually uphold that horizontal scalability that we were talking about in the beginning." If you only have one big consistency group, then it's just going to be hard for the system to actually scale out.

[0:31:42] LA: So, you're actually making sharding decisions based on the defined SQL indexes?

[0:31:48] JL: Absolutely. Yes, that's a super key property. So, to your point, it's a little bit misleading, I guess, to say, "Oh, it's fully automatic, and has nothing to do with user decisions." That's not really true. As the user, you do have a lot of ability to change how the system does the sharding, and it's quite important that you keep this stuff in mind. You don't have to worry so much about things like, "Oh, okay, well, I've got 128 shards, and I've got to split them up into 256, but I'm worried about XYZ." I've lived that life in my time. But you do still have to worry about things like having an index on an ascending key, for example. You're using an increasing ID - just one, two, three, four, five. Systems like Cockroach tend to struggle with that, since it's most likely that you're going to be writing to the end of that index all the time, and the system just won't be able to scale unless it has the ability to shuffle that hot area of the range out to multiple replication groups. You can avoid that using other techniques, like adding a hash in the beginning, or using a UUID, or whatever the case may be.
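A sketch of those two workarounds for a hot, ascending key. The hash-sharded index syntax is documented CockroachDB SQL in recent versions, and the events tables are hypothetical:

    -- Option 1: a hash-sharded primary key spreads sequential timestamps
    -- across buckets, so writes don't all land at the end of one range.
    CREATE TABLE events (
        ts      TIMESTAMPTZ NOT NULL DEFAULT now(),
        id      UUID NOT NULL DEFAULT gen_random_uuid(),
        payload JSONB,
        PRIMARY KEY (ts, id) USING HASH WITH (bucket_count = 8)
    );

    -- Option 2: a random UUID key avoids the ascending pattern entirely.
    CREATE TABLE events_v2 (
        id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        ts      TIMESTAMPTZ NOT NULL DEFAULT now(),
        payload JSONB
    );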
[0:32:44] LA: Got it. This is great. So, let's start talking about the cloud approach to CockroachDB. Now, you specifically are the director of Cockroach Cloud. You have both an on-premise version of Cockroach and a cloud version. Talk about the differences.

[0:33:00] JL: Sure. So, we offer CockroachDB in three different configurations right now. We offer it as a self-hosted product - you can run that on-prem or in your own cloud, managing it yourself. Or you can purchase it through Cockroach Cloud. So, if you go to cockroachlabs.cloud, you can buy two types of CockroachDB. There's dedicated CockroachDB, which is just dedicated hardware - you can think of it that way, but it's got all of the capabilities: it's got multi-region, and you can have it live in Azure, in Google Cloud, or in Amazon. Or you can purchase it as a serverless deployment. In serverless, you're using shared hardware, and you're purchasing request units instead of purchasing things like provisioned capacity. This is nice for bursty workloads. It's nice for getting started, and it's also really nice for a workload that's a bit less predictable, I'll put it that way. So, those are the three ways that we offer Cockroach right now.

[0:33:56] LA: So, the whole category of distributed databases in the cloud has been growing dramatically, right? Databases like DynamoDB have been growing in popularity. Aurora - we talked about SQL databases. And obviously, databases like CockroachDB are really designed for the cloud, because of the scalability aspects. Why do you think there's been such growth? I mean, this is almost too much of a leading question, I guess, but let me ask it anyway. Why do you think there's been so much growth in distributed databases for cloud offerings in recent years? And do you see databases like CockroachDB being able to fill those needs as they grow?

[0:34:39] JL: Yes. So, I think the reason for the growth in demand for distributed databases is that it follows the growth of the Internet. People want to build Internet-scale applications more and more. That's almost the default, from my perspective. If you're a startup, you don't want to aim your sights at a system that only serves your local city or something like this. You want it to be Internet-scale from the get-go. I think, because of that, people are really expecting that their database will be a system that scales along with their user base, more and more. I think there's some wisdom about how you should start with simple technology that's well known - you should start with something that's just really rock standard - and I think that's fine, and it makes a lot of sense. My dream for our industry would be that, over time, the distributed database will become that standard first component that you reach for, simply because it's a system that allows you to scale to whatever your end goal is, right? You can keep it small, you can get really big, and you don't have to worry about doing a migration halfway through, when everything's on fire and you're worried about keeping up with demand.

[0:35:42] LA: You absolutely hit head-on where I was trying to go with that series of awkward questions that I couldn't get out. And this really is the concept: I've built several systems, SaaS-based applications that do blah, blah, blah. I've built for companies that have been really successful, like New Relic, and companies like AWS, plus some for my own companies and other things I've done. And you always hope, every time you do it, that this is going to grow to be the giant thing that saves the world. And 99% of the time, it doesn't. It stays small, and everything's fine. And when you're running small scale in the cloud, any database works just fine. It doesn't matter what you use. Put up a Postgres on an EC2 server and you're good to go, and you don't need anything more than that.
But CockroachDB aside, if you expect to grow, then the question comes up: maybe I shouldn't use SQL Postgres from the beginning. Maybe I should instead go to something like DynamoDB, which is so much different - a new learning curve, all this sort of stuff, all the restrictions of tying you to the AWS cloud, or the GCP cloud and Bigtable, whatever. Maybe I should do that now, so that when I scale, I can handle the scaling, even though it's harder for me to build it now. But really, what Cockroach allows you to do is to say: you just build, forgetting about scale, really. You can almost say that. It's not good advice in general, but you can almost say: build not thinking about scale, use SQL just like you would normally do, because Cockroach will be there when you need to be able to scale and go up to something bigger and better. Moving from, let's say, Postgres to Cockroach is so much easier than moving from Postgres to DynamoDB.

[0:37:32] JL: Yes, that's a great way of phrasing it. We've had a few nice successes from businesses that start small with Cockroach serverless, actually. They're trying things out, they're in dev and test, they go into prod. It's a slow thing. But then they hit some virality. In the AI world right now, that's a big cause of virality, I would say. You've got a little idea, and it's pretty cool. All of a sudden, boom, it hits Hacker News. All the people try to use your application when it hits Hacker News. And is your database going to scale with you, or is it going to get in your way? That's the fit that I see for serverless in particular, and CockroachDB in general.

[0:38:08] LA: So, you even see the case where, instead of starting with Postgres, you start with CockroachDB serverless - which is probably similarly priced to what that Postgres instance would be - and do that at small scale, and that works great and wonderful. Then you don't have to do any migration at all when you move to something larger; as you scale, you can grow from there. The only time you'd want to move on is if you needed to move to dedicated hardware or whatever, just for cost reasons. Is that a fair statement, then? Do you really see that as a use case?

[0:38:38] JL: Totally. We have a great free starter plan, and it's pretty affordable at lower scales on serverless. And from my perspective, if I'm anticipating a system - of course, this is what you said earlier, right? You are anticipating your startup to grow large, no matter who you are. So, if that's me, then yes, I would definitely want to focus on using a scalable system from day one, like CockroachDB, for sure.

[0:39:03] LA: Can you easily move between the different levels, like from a serverless platform to a server-based platform? Can you easily make that migration as you grow?

[0:39:13] JL: Today, that's definitely a bit more of a work in progress for us. It's something that our team is really hard at work on right now. We actually just came from our yearly planning week last week, and one of the big focuses that we've got for next year is unifying those two deployment models. I've been talking about dedicated as one product and serverless as a different product. Really trying to take off the technology hat and put on the consumer hat: what you want to be consuming is the database. Whether it's serverless-deployed or dedicated-deployed, you don't really care. You want a connection string. You want a console that shows you the slow statements.
You want to be able to do schema changes, and you really don't want to worry about the deployment model at all. So, our focus is going to be unifying those deployment models and offering just CockroachDB in different tiers, whether it's a tier-zero, high-scale production deployment, or something that you're just getting started with. That's kind of our next year's worth of goals.

[0:40:06] LA: Cool. Good, good. So, Cockroach Cloud - you obviously have a huge infrastructure that I believe you and your group are responsible for managing. It's a database as a service, really - a multi-tenant, at-scale database-as-a-service application. What's the biggest challenge you have in keeping that infrastructure operating?

[0:40:34] JL: So today, we have to manage infrastructure across all of the clouds that we support. And for me personally, as a leader in this department, the biggest challenge is just managing all of the infrastructure and understanding: where is it living? Why is it spending? Why is it costing this much? This is more of a cloud FinOps problem that we have. But it's something that we're struggling with, I would say, as we start to become a bigger and bigger company. We've got a deployment model that we use in Azure. We've got a different one that we use in AWS. We've got a different one in GCP. Ultimately, what I've learned is all the clouds do things a little bit differently. Sometimes that's good. Sometimes that's not so good. Sometimes they have strengths in one department and weaknesses in another. But whatever it is, ultimately, the infrastructure is subtly different on each of them. One challenge for me has just been how do I actually manage all of these costs, charge back the costs to the teams that are using them, and just manage all the money, really. That's been a fun challenge for me as of late.

[0:41:34] LA: That brings up the magic word I love to talk about, which is complexity, right? How do you manage this complexity and deal with the minor deviations from platform to platform that are critically important, but hard to keep track of, because there are so many of them? So, how do you deal with drift in your infrastructure? Obviously, in a very simplistic viewpoint, your infrastructure is a series of nodes that are identical, distributed worldwide - and that's a very simplistic view. But imagine no drift - and you can define the word node however you want to; that might be a single computer, it might be a whole network. I don't know. That's not really the important part. The point is the configuration of that node itself. How do you avoid drift between all the different deployments of those nodes, over time and over distance?

[0:42:29] JL: Yes. So, I guess the way that I would define the unit here - you're talking about nodes. The unit that we think about when it comes to managing our own internal cloud is really at the cluster level. We use Kubernetes to deploy CockroachDB for CockroachDB Cloud. We've got all of these different Kubernetes clusters. We've got all this important but fiddly networking stuff that you need to set up in order to make sure that the multiple regions can talk to each other properly, or VPC peering. We even do this for our customers.
So, if you want your Cockroach database to talk to your infrastructure securely, and not over the public Internet, you've got to set up PrivateLink in AWS, or Private Service Connect in Google. There are all sorts of fiddly bits. What's tricky is that depending on what the customer wants to do, the configuration that we end up deploying for that customer is different. So, when it comes to drift, we have this problem of figuring out the desired state, per customer, per cluster, versus the actual state. The way that we deal with that is we use a system called Pulumi, which helps manage some of this stuff, and we've gotten a good amount of value out of the way that we use Pulumi for this. We have a record of what's supposed to be deployed, and we can compare that record with what's actually deployed, and ultimately come up with a set of drifted configurations and manage dealing with them in a variety of different ways. I think one challenge for us is that we sometimes have customers that we've got to snowflake, so to speak. You're taking a customer who needs - maybe you're deploying something fresh for them. You're doing a private preview for them, for example, of a different type of networking. You're working on a new capability. How do you actually tell the system that is managing this drift that, well, it's going to be a little bit different in this case? We've got methods to do that. But ultimately, as I'm sure you and the listeners know, the more snowflake clusters you get, the more snowflake situations you get, the more pain you're going to undergo when you have to do things like upgrade the whole system, or change the Kubernetes version for everybody. So, this all leads to some fun challenges. I wouldn't say that we've gotten all this stuff figured out, by the way. This is just some of the challenges that we're going through right now.

[0:44:33] LA: Yes. I think it's a moving target, is really what it is. But it is interesting that the word customer entered into the conversation. That implies to me that, probably, especially for your larger customers, even in the cloud offering of your product, you're single tenant?

[0:44:48] JL: Tenancy in which sphere?

[0:44:50] LA: So, an application that runs 100 customers on a single instance of an application is multi-tenant. If you deploy a separate instance of that application for each of your 100 customers, that's single tenancy, in that context.

[0:45:06] JL: We have a blend, and it depends on the cloud. So, I mentioned dedicated and serverless. Of course, with dedicated, we do offer our customers a single-tenant model where they're getting that dedicated hardware. In serverless, they're getting that shared hardware. The different ways that we use each of the clouds play into this as well. In some of them, we've configured more multi-tenancy - keeping the hardware single-tenant, but more of the infrastructure multi-tenant - but that's a bunch of details. I would say, just at a high level, we do have a mixture of both right now.

[0:45:37] LA: Yes. So, the answer is you have customers that are single tenant, and customers that are multi-tenant. The bigger they are, the more single tenant you tend to make them. That makes perfect sense. Good. Okay, great. So, that's all the questions I had for you.
Is there anything else that you think we should have talked about that we haven't?

[0:45:55] JL: Good question. Probably a lot. It's just so fun to talk about database stuff. It's such a rich and interesting field. There are a million places you could go. We haven't covered them all just yet.

[0:46:07] LA: That's a valid point. But I guess what I hope is that maybe we can have you back for another episode with some follow-on topics as we go along. But I really love - I just want to say, I love what CockroachDB in general is trying to accomplish, and what they are accomplishing. So, I'm very excited to be watching what you're doing, and to keep up with you. I appreciate you coming on the podcast. So, Jordan Lewis is a senior director of engineering at CockroachDB Cloud, and he's been my guest today. Jordan, thank you very much for talking to me. This has been very informative. Thank you for coming on Software Engineering Daily.

[0:46:46] JL: And thank you so much for having me. It's been such a pleasure. We had a great conversation today.

[END]