EPISODE 1738 [INTRODUCTION] [00:00:00] ANNOUNCER: Online Transaction Processing, or OLTP, is designed for managing high volumes of short, fast, and concurrent transactions such as data entry and retrieval operations. OLTP systems solved the problem of efficiently handling numerous simultaneous transactions, making them essential for sectors like banking and retail. Joran Greef is the Founder and CEO of TigerBeetle, which is developing an open-source financial transactions database focused on mission-critical safety and performance. He joins the podcast to talk about TigerBeetle's technology and the problems it solves. Gregor Vand is a security-focused technologist and is the Founder and CTO of MailPass. Previously, Gregor was a CTO across cybersecurity, cyber insurance, and general software engineering companies. He has been based in Asia-Pacific for almost a decade and can be found via his profile @van.hk. [INTERVIEW] [00:01:07] GV: Hi, Joran. Welcome to Software Engineering Daily. [00:01:11] JG: Hi, Gregor. Pleasure to be here with you. [00:01:14] GV: Yeah. Very exciting to have you here today, Joran. You are the - is it founder or co-founder of TigerBeetle? [00:01:20] JG: Founder and CEO, creator of the project originally. And then we spun the company out with two co-founders from the company we were at, so three founders. Yeah. But the whole team is sort of a founding team. [00:01:34] GV: Awesome. Right. Straight out the gate, I'm going to just get one thing out of the way. The name, TigerBeetle, talk to me about the name. [00:01:42] JG: TigerBeetle, it was like a skunkworks project. We'd been working on a central bank payments switch. We were consulting on it. How do you make this thing faster, safer? We were just doing analysis, like everyday work on some big system. And while we were doing that work, we realized, "Wow. There's a way to make this system 100 to 1,000 times faster." We had this idea. And then we pitched the idea. It went around a little bit. Then we sort of prototyped the idea. A five-hour prototype. And then people started taking notice. At first, people didn't believe that you could go a thousand times faster just with an idea. We prototyped it in five hours, and that became about a month of prototype work. And around that point - this was July 2020, four years ago, and it was the project's birthday month last month. But then we realized, "Okay, if you want to change things, you need to put a name on it." It was an idea and then a skunkworks project. And we were still doing the consulting work that we had to do. But in the background, we've got this, like, "Wow. There's this way faster way to do this." We needed a name so people could rally around the project and say, "It's this project." And what we were looking to capture was the idea of something that's incredibly fast, like world-champion speed. Like really, really fast. But at the same time, really small, really humble, really light, really cost-efficient. Very small, small footprint. And at the same time, something really tough, like lean and mean. Something that can survive all kinds of challenging environments. Because for this switch, it's kind of like you've got to be really rugged. This thing's just got to work and be rock solid. Kind of like a thorium reactor: you go home at night from the lab and you just switch off the power, and that's it. It's very safe. We wanted something fast, small, and incredibly safe and tough.
And a friend of mine - we still share an office today - Donovan Changfoot, the two of us were doing this work together. And I said to Don just over text, I said, "We're going to need a name for this project." And straight away - I was so lucky. Because, normally, coming up with a name is a pain. One of the hardest things in computer science and business or just any endeavor, getting a name. But within about three or four minutes, Don shot back to me, he says, "TigerBeetle." I said, "Why TigerBeetle?" He said, "Well, it's a fast insect, a fast beetle." And he had Googled fast beetle, and Google came back to Don and said tiger beetle. I did the same on my side. I typed in fast beetle straight away, and Google came back with Volkswagen Beetles. Those very slow, old, big - the old cars. Google Images was just full of these cars. And to this day, I still get different results. I'm glad that I had someone else come up with the name. And that's TigerBeetle. Yeah, fascinating creature. It really is - it's in the top two of fastest creatures in the world scaled for body size. It's like 1,100 miles per hour relatively. And it can detect bat echolocation and send a signal back to the bats. It's, like, technologically advanced as a creature. Fascinating metallic colors. When you zoom down to its size, it's a real predator. It's a really scary creature. We thought, "Wow. Okay. This is perfect." And it does thrive in tough environments. [00:05:31] GV: That's amazing. This is an incredibly functional name even though it sounds just like a cool name. I love that. It wasn't just a sort of, "Oh, it's my favorite beetle." There are tons of functional reasons behind it. Yeah. [00:05:43] JG: Yeah. It became my favorite beetle. And now I've learned there are papers written about it. It's so fascinating. And related beetles as well. Amazing what people can learn. All the biomimicry things. There's something called a rhino beetle. Apparently, the way it flies defies physics. And aeronautical engineers are trying to figure out how it does that because we could make better planes. [00:06:08] GV: We'll get to what is TigerBeetle that we're talking about, I guess. But I think that context is actually really fascinating, given all the properties that you just outlined there. And speed, obviously, will come into this in a big way. We're going to talk about what on earth is TigerBeetle. The project and yourself came onto our radar. You write some really great articles around what you're doing. And one recently was talking about the history of transactions. And I love financial history generally. That straight away got my attention. But then I sort of was like, "Hang on a second. Am I reading what I'm reading correctly, that this project is what it is?" I would just sort of, not exactly quote, but in that you mentioned sort of stumbling into a fundamental limitation in general-purpose database design for transaction processing. Can you kind of walk us through, I guess, the - call it Eureka moment in terms of how that then led into what TigerBeetle is? And maybe, if you can give it in a one-liner, what is TigerBeetle? [00:07:13] JG: Let's start in LIFO order, last in, first out. TigerBeetle is a financial transactions database. It's really a transactions database, or OLTP, online transaction processing database. We say financial transactions database to help people get into it. But really, it's a transactions database for transaction processing.
The question is - as we look at what's this fundamental limit that we stumbled into in the general-purpose design of these fantastic databases, Postgres, MySQL - they rule the world for a reason. They're here to stay. They're general-purpose databases. If you want to build an app, that's Postgres. You can build anything general-purpose. For a long time, these general-purpose databases were used for everything. They were used for OLTP. They were also used for blob storage, and queuing, and for analytics. And then in the 90s, really - Edgar Codd, in '93, coined the term OLAP. Because people realized that analytics - and, obviously, TigerBeetle is not analytics. This is just a little bit of history as we look at what is a transaction. What is a transactions database? We'll get to that. But just a bit of the background context. First is that these general-purpose databases, Postgres, MySQL, they're 30 years old. MySQL was 1995. Same year as Windows 95. Postgres was 1996. They're amazing databases. They've been used for everything. And they were used for analytics. And then in '93, Edgar Codd coined the term analytical processing. And sort of OLAP spun out. And today, you've got ClickHouse. You've got DuckDB, the flagships of analytics. You've got Snowflake, Databricks. All these. There's a whole world now of analytics. And the anatomy of an analytical database, it's not an elephant - the Postgres mascot, the elephant. The anatomy of an analytical database is a duck. DuckDB. You switch from thinking in terms of rows to columns. And the whole anatomy, as you pop the hood of this database and look at the way the car is built, the engine, it's totally different. It's an alien creature compared to Postgres, the elephant. It's a duck. It's not an elephant. First thing. And then up until now, these general-purpose databases have been used also for transaction processing. And so much so that people will say Postgres is an OLTP database and it's a general-purpose database. I think in our minds, intuitively, we believe it is a general-purpose database. And if you're going to build anything and you reach for another database, people are like, "Why aren't you using a SQL database? You should be using a SQL database." And we know this. This is true. It's a general-purpose database. You can do anything. Kind of the burden of proof is on the developer to say, "Well, why are we not using a general-purpose database for general-purpose work?" And at the same time, the category has been OLTP, and we've all just grown comfortable that the two are the same. But, actually, look at the history of the field right in the beginning. The founder of OLTP, one of the people who helped coin the term ACID for databases - the guarantees: atomicity, consistency, isolation, durability - was Jim Gray, an incredible Microsoft researcher. He also coined the five-minute rule. He worked on System R, where SQL came out of. It was his work that led to the formation of the Transaction Processing Performance Council, TPC. They gave the database industry the benchmarks for OLTP. And so, it was actually Jim Gray who defined in one of these very first benchmarks what a transaction is. This was 40 years ago exactly. This summer, 40 years ago. In July. This is the burning question: "Now, what is a transaction?" What did Jim Gray define it as? And in my mind, I've always thought that a transactions database is the same as a SQL transaction database. A general-purpose database. Because you can do anything with a SQL transaction.
And so, I always just conflated the two: transaction, SQL transaction. And now I just want to come to the first-principles part of the story. Step back from the history. And we'll come back to what Jim Gray thought. But my journey with TigerBeetle, working on the central bank switch - what we saw with the switch was it was so simple. It was really Alice pays Bob. That canonical example. Money is moving from one row in the database to another. The problem was there were only eight rows. You've literally got eight rows because it's a central bank. In some countries, there are eight banks around the table. And there's eight rows. This is a simplified example. But at a high level, this is really what it is. It is so simple. It's eight rows in a table. And each row has got an integer amount. And that could be financial. But it could also just be inventory. This could be you're a shipping company and you've got eight rows of your hottest products. Or say you're Apple and you have eight products. They can all fit. Steve Jobs has got that famous quote that all the products of Apple could fit on the boardroom table. And so, you're selling eight products. There's eight rows. Again, a simplistic example. But, generally, there's this Pareto principle. Often, a lot of what a business sells, there's eight kinds of it. Or say you're Google. You've got millions of customers. But you only have a handful of bank accounts. Again, it's eight rows. Or, obviously, they've got many more. It's this interesting thing where there's millions of something on the one hand. And on the other hand, everything is flowing through these eight rows. For example, for a country, millions of citizens, but eight banks. Eight rows around the table. Ultimately, all these transactions are flowing through eight rows and you're moving numbers. And these numbers could be books in the world's biggest online bookshop. Or it could be any kind of thing. It's moving from one place to another. In the specific example of the central bank, it's money. It doesn't have to be. But it was eight rows. And they were trying to move lots of updates between these rows. And they were using a general-purpose database. You pop the hood of this car that they were building. And inside, the engine is a general-purpose engine. And you look at how it's working. And every time Alice is paying Bob - it's really that simple. There are tens of thousands of lines of code around the SQL database just to do that. But, really, it is as embarrassingly simple as Alice pays Bob. This bank, this bank. And you're just moving pieces of paper. IOU. IOU. And that's all. But there's a lot of code around a general-purpose engine. But then you look at it in more detail and you see, "Wow. Okay. Alice pays Bob." And there's 20 SQL queries. And, really, some people get this down to one. You can do it one for one. But there is a lot of business logic. Let me sketch it out for us. Alice pays Bob. And what we saw was a SQL transaction from your API server. Stateless API layer. Or your gateway. And you've got a SQL transaction from there to your database. Postgres database. And in the SQL transaction, straight away, you select two rows. Select Alice's row. Select Bob's row, FOR UPDATE. Row locks. Both rows are row-locked. And this goes back across the network. Half a millisecond round-trip time. And then there's maybe one more SQL query. But, generally, the rule of thumb for many systems is 10 SQL queries to do all the business logic.
Because you have to also ask the question to the database, "Was this transaction processed before?" We're still going to answer this question of what is a transaction. But within a SQL transaction, already you see we're asking the database, was this transaction processed before. And, really, what we mean is, did Alice already pay Bob? This transaction between Alice and Bob, did this already happen? Because, obviously, money should only move once. Not twice or thrice. And so, SQL transaction. Row locks for Alice and Bob. And there's business logic. Did it happen before? And there's other things. Generally, there's about 10 queries. Or you could get it down to one. But one or two probably. But the problem is you've got row locks across the network. And that means that it's going to take about 5 milliseconds to commit the SQL transaction in the end. And then you realize - there are only a thousand milliseconds in 1 second. If it takes 5 milliseconds, that means we can do 200 of these transactions between Alice and Bob. And, immediately, someone's going to say, "Sure. But there are many Alices and many Bobs. It's not that simple. We've got concurrency." But, actually, when you look at these workloads, it's not possible, because there are only eight rows. As soon as you do a transaction with Alice, chances are very high that Alice is also doing other transactions at the same time with other banks. They're all transacting simultaneously. They're all row-locked all the time. And so, literally, you've got this fundamental limit that, "Wow. Your system can only do 200 transactions a second. Maybe a thousand." Generally, with Postgres, for a lot of systems, the limit works out to be about 1,000 TPS. Maybe you can do 5,000, and then you're starting to cut corners. And the reason again is that it's row locks. The general-purpose SQL transaction design interacting with network round-trip time, which is a function of the speed of light, which doesn't change. It's never going to get better. And the third thing is the number of SQL queries and contention in your workload, which for transaction processing is usually high. 80% of transactions involve the same 20% of rows. For example, you're a warehouse or you're an electronics store - you sell electronics. You're selling Apple. You're selling Samsung. Fine, there are many different rows. But as you sell Apple, you also want to touch the row that says your total Apple stock sold today. And so, there are only two rows, Apple or Samsung. Maybe three or four. You see this everywhere when you start to look at it. Same problem as Ticketmaster. This is the classic example. They sell to millions of people. Most of those people just want Taylor Swift. There's one Taylor Swift row. And that row just gets hammered, row-locked. And these systems just go down. And the fun thing about this is you can throw hardware at it. Give it NVMe. Go horizontal. Whatever you do. It doesn't change anything, because the fundamental limit is there. SQL transactions; round-trip time never changes; there's always locks. Going horizontal just makes it even worse, because now you've got distributed locks. This is the problem we spotted. And the long story - we're really going into the weeds here. We're going to come back to Jim Gray. What is a transaction? But we saw, for the purpose of the switch, a transaction is Alice paying Bob, and the implementation is a SQL transaction. But there's an impedance mismatch because of this fundamental limit.
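A minimal sketch of the row-locked flow just described, assuming the node-postgres (pg) client; the table, column names, and query count are invented for illustration:

```typescript
import { Client } from "pg";

async function alicePaysBob(db: Client, amount: number): Promise<void> {
  await db.query("BEGIN");
  try {
    // Round trip 1: both rows are now row-locked until COMMIT.
    await db.query(
      "SELECT name, balance FROM accounts WHERE name IN ('alice', 'bob') FOR UPDATE"
    );
    // Round trip 2: idempotency -- was this transaction processed before?
    // Round trips 3..N: the rest of the business logic (often ~10 queries).
    await db.query(
      "UPDATE accounts SET balance = balance - $1 WHERE name = 'alice'",
      [amount]
    );
    await db.query(
      "UPDATE accounts SET balance = balance + $1 WHERE name = 'bob'",
      [amount]
    );
    // Locks are held across every round trip above (~5 ms total), so with
    // eight hot rows, throughput caps out around 200-1,000 TPS.
    await db.query("COMMIT");
  } catch (err) {
    await db.query("ROLLBACK");
    throw err;
  }
}
```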
And for years, this limit's always been there. It's been latent. For years, it's never been a problem. But we saw with the switch already that the world is becoming more and more transactional. Instant payments - volumes have exploded a thousand X. People are moving from cash to digital. And it's like the real world. We went onto the internet and scale exploded. And the same is happening for payment transactions. But the same is happening around the world in in-game transactions, energy transactions, cloud transactions, and APIs. If you're a software engineer, you're probably selling an API and people are making API calls. And that's what you sell - your usage-based pricing. In the past, we would sell things once a month. And you would sell a game to somebody. Now you don't do that. You give them the game. And then, every day, they're buying little things from you. You're far more transactional. More transactions with less value. And so, this is what we saw also for the switch. And we realized, "Okay, this fundamental limit is now becoming a big problem." The world is changing. Internet scale is coming to all other sectors as they move onto the internet. And so, we understood a transaction really is a debit-credit. That is what really everybody needs. Because for centuries, this has been the unit of a transaction. Debit Alice, credit Bob, an amount, a time. We call this the who, what, when, where, why, and how much of transaction processing, as businesses transact, as startups transact. Whatever you're selling. Essentially, the most elegant schema you could come up with is the one that the world has been using for centuries: debit Alice, credit Bob. It could be debit this warehouse for this inventory, credit this warehouse, as your trucks are moving some quantity. It doesn't have to be financial. It's just counting as you're moving counts across. This was our first-principles understanding. The fundamental problem. It's latent. It's now becoming existential. Systems are falling over. And the way to solve this is: a transaction is a debit-credit. Why not just put 8,000 of them in a 1-meg message? The debit-credit information - for centuries, it's been standardized. It's a standardized schema. Credit Bob, debit Alice, an amount. Who? What? When? Where? Why? How much? It's about 128 bytes of information. Two CPU cache lines. Let's put 8,000 in a 1-meg message. Let's send that across the network to the database and back. Single round trip. No row locks. And now your whole API layer is just hitting that database 8,000 at a time. And, suddenly, you're doing a million transactions a second and it's so easy. And you're amortizing the cost of networking. The cost of fsync to disk is amortized. The cost of consensus. Everything. There's so much mechanical sympathy here with this first-principles design. If you see a transaction as a debit-credit, and you pack 8,000 of them in a message - you redesign the database from first principles and see this car needs a transactions database that can speak debit-credit as a first-class primitive. You could get rid of all that complex, crazy expensive tens of thousands of lines of code in that payment switch. Hang on. Everybody needs one of these too. Every startup is transacting, doing business. This is how we started TigerBeetle. We built the whole thing from first principles. The whole database from the ground up. Storage engine, consensus, everything. Love to go into those details with you. But this was the story. After two years, we started the company.
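To make the who/what/when/where/why/how-much schema concrete, here is a rough sketch of a ~128-byte debit-credit record and the batching arithmetic from the conversation; the field names loosely follow TigerBeetle's documented transfer shape but are illustrative, not authoritative:

```typescript
// Roughly mapping "who, what, when, where, why, how much" onto ~128 bytes.
interface Transfer {
  id: bigint;                // which transaction (128-bit identifier)
  debit_account_id: bigint;  // who is debited
  credit_account_id: bigint; // who is credited
  amount: bigint;            // how much
  ledger: number;            // where -- which asset, currency, or inventory ledger
  code: number;              // why -- the reason or type of the transfer
  timestamp: bigint;         // when -- assigned by the database
}

// The batching arithmetic:
const TRANSFER_SIZE = 128;        // bytes -- two CPU cache lines
const MESSAGE_SIZE = 1024 * 1024; // a "1-meg message"
console.log(Math.floor(MESSAGE_SIZE / TRANSFER_SIZE)); // 8192 per round trip (before header overhead)
```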
It's been an open-source project for four years. Then Coil, where I was working, they invested 3.2 million, equally with Amplify Partners - the other 3.2 million. My CEO invested in us. It was an amazing experience. And we became a founder-led startup two years ago. And this was all from first principles. Just thinking, believing this is really - it's necessary. It's from a real problem. And then last year, I was reading the history of transaction processing and still thinking in my mind OLTP is Postgres and the SQL transaction. And conflating that. And then Andy Pavlo of Carnegie Mellon, the famous databaseologist - he even has a lab coat saying databaseology. Andy Pavlo, professor, databases, Carnegie Mellon - I noticed something very strange. He referred to TigerBeetle as a transactions database. Not a general-purpose database. That just got me thinking. I thought, "Wow. That's weird." He just called us a transaction processing database. Not a general-purpose database. I always thought, "Well, we were just not a general-purpose database." We were some specialized first-principles thing. But then he called us a transactions database. And I thought, "Wow. Okay. That's Jim Gray." I was reading the history of the transaction processing field, OLTP, and then something happened, Gregor. I saw, wow, the very first benchmark that Jim Gray did to measure transaction processing power - it was the paper, "A Measure of Transaction Processing Power." The name of that benchmark was DebitCredit. How many debit-credit transactions can your database do? And I think I fell off my chair and I thought, "Okay. This workload, we really solved this with TigerBeetle." [00:25:37] GV: It's a perfect loop. Yeah. Where the first principle, effectively, that you had been working from was the same. That is how he defined a transaction. I mean, my context of this - I took accountancy 101 way back in the day. And to understand that - I think this is still helpful for the listener base, because I can imagine a lot of people have not studied accountancy. I've also run a business and been in the weeds on the accounting. I understand it from that perspective as well. For every single credit, there has to be a debit. For every debit, there has to be a credit. There is no such thing as a - you can't pay something out of one account and it not have a corresponding other side to it. And this is a very strange concept I think for people to get their head around on day one. You say, "Well, I just pay myself a salary. Or I send some money out for this." Yes. But which account is being debited or credited, either way around? Right? It's just so interesting that, in the amazing history that you've recalled, the fundamental first principle there is debit and credit. Yeah. [00:26:43] JG: It's like the first law of thermodynamics. Money cannot be created nor destroyed. It can't just come out of thin air and go to the creative accountant's pocket. And this was what the world sort of evolved: double entry. For every debit, there's a credit, like you say. And because it's like the first law of thermodynamics - that applies to the universe - you can apply this to the universe of business. And that's why it works so well. You can describe any kind of - this is what struck me with Jim Gray: he thought of a transaction as selling a theater ticket. And you only want it sold once. But that is a debit and a credit. You debit your inventory of theater tickets. That specific seat.
And you credit an individual who bought it. Or an airline reservation, or Ticketmaster for Taylor Swift - it's all debit-credit. It's a paradigm shift. But, suddenly, you realize, "Wow. I just hit upon the perfect schema." Yeah. [00:27:40] GV: Amazing. You've touched on a couple of things already that I was going to sort of ask about. I mean, you mentioned in your materials being able to process 8,000 transactions in a single round trip without row locks. And you've given some really great examples now of how row locking even occurs. The Taylor Swift example, that was a very good one. I sit only a few hundred meters away from a large stadium in Singapore, and she was performing there. I know full well how many people must have hit that Ticketmaster row. It's insane. Could you maybe just briefly go into what that looks like? You mentioned, if I understand correctly, every round trip, it's always 8,000 transactions - whether they're sort of empty or have data? Or how does that work? [00:28:26] JG: It starts off small. If your app is not doing a lot of API calls - say, you're counting all your API calls for a customer, because it's usage-based pricing. You're counting the number of calls and what types of calls they make, because that is what you're charging for. You're a startup. You're selling your API. And you're counting the calls, the types of calls. And you're also doing the billing in real-time so that you can say, "Okay, at this exact moment, they've done $5 worth. Okay, now they've hit their free-tier limit." Now you've got the concept of real-time spend caps. And you know in real-time what your API usage is but also what your API costs are. And, also, what your sales price is for your customers. And so, let's use this example. If your API isn't busy, you're starting off and you're doing one API call a second. Then that is going to be a database query to TigerBeetle from your app. It's a nice stateless app layer. And TigerBeetle is the system of record with all the state. That's the hard part. But your API isn't busy. It's going to be literally a 256-byte header, plus a 128-byte debit-credit for that one API call a second. This is the first-principles thing about TigerBeetle: it's so efficient. A 256-byte header on the network, plus 128 bytes. That is the one request per second that you're doing. Then your app gets busy, and suddenly Taylor Swift comes along, and Ticketmaster decided to use your API under the hood as well. Now, what'll happen automatically - and TigerBeetle will do this for you, so your app doesn't have to worry about batching - you just send your individual calls to TigerBeetle to record how much your API is used and what the value of that is. But you do that one-on-one. The TigerBeetle client is in your programming language, like Node.js or Java or Go. You can create accounts. Create these debit-credit transfers between them. Very simple. You just do one at a time. But that database driver, or client, in your programming language is going to start batching them up. As you're doing now a million a second, you've got a whole API layer around TigerBeetle. Those individual clients are now batching these up, so that network request grows from on the order of 256 plus 128 bytes and fills up to a megabyte. But TigerBeetle does that transparently. Counterintuitively, you actually get better latency that way. Normally, people think batching and latency are opposed.
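A sketch of what that client-side usage might look like, assuming the tigerbeetle-node client; the API shape follows its docs, but field names and signatures may differ by version:

```typescript
import { createClient, id } from "tigerbeetle-node";

const client = createClient({
  cluster_id: 0n,
  replica_addresses: ["3000"],
});

// Callers submit transfers one at a time; the client transparently
// coalesces concurrent calls into one message, up to ~1 MiB.
async function recordApiCall(customerAccount: bigint, revenueAccount: bigint) {
  const errors = await client.createTransfers([
    {
      id: id(), // unique 128-bit identifier
      debit_account_id: customerAccount,
      credit_account_id: revenueAccount,
      amount: 1n, // e.g. one metered API call
      pending_id: 0n,
      user_data_128: 0n,
      user_data_64: 0n,
      user_data_32: 0,
      timeout: 0,
      ledger: 1, // e.g. the "API usage" ledger (illustrative)
      code: 1,   // e.g. "metered call" (illustrative)
      flags: 0,
      timestamp: 0n, // assigned by the cluster
    },
  ]);
  if (errors.length > 0) {
    throw new Error(`transfer rejected: ${errors[0].result}`);
  }
}
```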
Batching and latency are actually more of a U-shaped curve. If you only do one network request per real-world transaction, which would be the case with general-purpose SQL if you're really good - it's usually even worse, 10 SQL queries for one real-world thing - but let's say you do one for one. Then you get worse latency, because now you're paying all these fixed costs: the networking, the fsync on the database. That's got pretty much a fixed cost. So with batching, you get this U-shaped curve in latency. It's the same, but you're getting more throughput, because now you're filling those fixed-size things with more things. [00:31:59] GV: Right. Got it. There are so many concepts and features of TigerBeetle we could cover, and we're unfortunately never going to be able to cover all of them today on this episode. So I'm going to have to be a little bit choosy, I think, on where we go. I can only encourage anyone who's been listening so far and is interested to go to the TigerBeetle website. There are tons of blog posts, and there's lots of information that will fill in anything we are not able to cover today. There's so much there, which is amazing. We're going to talk about one more, and this is, again, a slightly higher-level concept and how you approached it. Then there are a couple of more specific things that are sort of TigerBeetle-only that I'd love to get into. Just on the higher-level safety in this context - you actually mention implementing NASA's Power of 10 rules and why that level of safety is crucial. Could you just speak a bit to all of that? [00:32:55] JG: Yes. Thanks, Gregor. If anyone wants to dive in, the code is on GitHub, tigerbeetle/tigerbeetle on GitHub, and matklad on the team - he was the creator of rust-analyzer - he's live on Twitch every Thursday doing code walkthroughs, all into the weeds. We've done a few. He covers everything, all the design decisions in TigerBeetle, all the engineering there. We walk through the code live on Twitch. There were 2,000 people live on stream. We had a Twitch raid last week, so it was fun. But it's really a great way to learn about just general concepts - how to do consensus and how a database works in general. There are 32 episodes. Yes. It's all in Zig as well, so it's very easy to read, very nice to read the code. Yes, to your question, I've actually just - yes. Can you remind me? [00:33:49] GV: Of course, yes. [00:33:50] JG: I got side-tracked there. [00:33:51] GV: Of course, of course. Safety. [00:33:52] JG: Ah, safety. Yes, safety. Yes. [00:33:54] GV: And NASA's 10 principles. Yes. [00:33:55] JG: Okay, safety. Yes. Let's not forget safety. It doesn't matter if you're fast and buggy. Safety was the most important, actually. The performance, what we've covered from first principles, that makes a difference in people's lives - I got goosebumps when I realized we could go 1,000x, because that makes a big, big difference. But safety actually is the thing that is most exciting about TigerBeetle, because we realized existing systems were literally 30 years old, and they're fantastic. They're tried and tested. The flip side of that is they're tried and tested in the research, and there's so much great research, for example, from Wisconsin-Madison. Remzi Arpaci-Dusseau, Andrea Arpaci-Dusseau, they've been doing this for years, just showing how these databases actually fail. They're tried and tested.
We know from the literature how they fail - what happens if the disk gets corrupted. Actually, most of these databases were not designed for that. SQLite in the docs specifically says it expects to read from the disk what it wrote. If you have a bug in XFS, like happened last year, where XFS will write to the wrong place on disk, that will break most databases today. They can't handle that. They have checksums. Usually, they're not on. But if you switch them on, they still won't protect you from this, because the checksums are only meant in the design to help the database recover from power loss. In the database literature, they call this the crash consistency model. Be consistent through a crash. But that model is not strong enough to survive a faulty disk that writes to the wrong place, or reads from the wrong place, or corrupts data at rest. Many things go wrong with these systems, and users have found this in the wild. There was Fsyncgate, where Postgres took a routine storage fault. It didn't have to lose data. But Postgres, MySQL, all of them, they accelerated this into data loss - something that shouldn't have led to data loss - and Craig Ringer raised it on the Postgres mailing list. I think a lot of users just think this is noise. "I'll just recover from backups." Actually, the database is accelerating these things and causing data loss when it shouldn't. I think a lot of people just don't know, and they just recover from backups. Actually, there are these fundamental design things that could be so much better today, because we've had 30 years of research into how to build safer systems. Hardware has changed drastically. We've got NVMe. The bandwidth - we call it the four primary colors: network, storage, memory, and compute. They have two textures: bandwidth and latency. At TigerBeetle, we say we're painting. We're sketching from first principles. We're painting with four primary colors - network, storage, memory, compute - and bandwidth and latency for all four. But those four bandwidths in the last 30 years have totally changed. Today, network and storage bandwidth is plentiful. In the past, it wasn't, so you would spend CPU cycles and memory bandwidth. You would burn memory bandwidth trying to reschedule how you work with the disk. Today on Linux, you just say to your disk scheduler, "I don't want scheduling," because the disk is so fast. Everything has changed, but with safety, too. I had worked on systems where we used NASA's Power of 10 rules, and I realized this is how you build safer systems. For example, TigerBeetle at startup, we allocate everything we need for memory. All memory is allocated up front. At runtime, there's no malloc or free. I had been in Node.js for 10 years as well and seen GC pauses and what can go wrong. I ran so far away from that. Normally, you run into the arms of manual memory management, and you're going, "Okay, we're going to do malloc and free. Or we're going to use Rust and a systems language." We ran even past that and we said, "Well, we don't even want malloc or free." There should be no hidden allocations anywhere in the standard lib. We want this database to be rock solid, to have explicit limits on resources. This way, you've got a well-defined piece of software, and you can throw the Taylor Swift wave at it, and it's going to handle that wave. Whatever it can't handle will just wash over, and the system is still running. It's not going to run out of memory on you.
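A toy illustration of the static-allocation idea, in TypeScript rather than TigerBeetle's actual Zig: everything is allocated up front against an explicit limit, and work beyond the limit is shed rather than grown into:

```typescript
// All memory is allocated at startup against an explicit limit;
// at runtime there is no malloc/free, and excess load is shed, not buffered.
const MAX_IN_FLIGHT = 8192; // explicit limit, chosen at startup (illustrative)

class RequestPool {
  private buffers: Uint8Array[] = [];
  private free: number[] = [];

  constructor() {
    for (let i = 0; i < MAX_IN_FLIGHT; i++) {
      this.buffers.push(new Uint8Array(256 + 128)); // header + debit-credit
      this.free.push(i);
    }
  }

  // Returns null when the limit is reached: the Taylor Swift wave
  // "washes over" instead of exhausting memory.
  acquire(): { index: number; buffer: Uint8Array } | null {
    const index = this.free.pop();
    return index === undefined ? null : { index, buffer: this.buffers[index] };
  }

  release(index: number): void {
    this.free.push(index);
  }
}
```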
This was also why we picked Zig, so that we can be so explicit with memory. But this was all from the NASA Power of 10 rules. You want static memory allocation. You also want assertions. What's interesting with TigerBeetle is we code it forwards and backwards. Normally, if you're just trying to code something, you're just trying to make it work. Make it work. Make it fast. Make it pretty. I think Steve Jobs said that as well. Make it work. Make it fast. Make it pretty. As software engineers, we just want to get it working. We're coding something positively to get it to work, and that's it. Then we productionize it, polish it, and it's all positive, and the thing works. I had a bit of a security background - I was like a white-hat ethical hacker, breaking into systems but also learning how to defend systems. What I realized is that hackers think differently. They know that programmers are coding "let's just make it work, let's just make it work." The hackers are thinking about the negative space around this. How should it not work? What would be surprising? There's that Monty Python sketch. Nobody expects the - and then they arrive on the scene. That's how hackers think. They come at you with all the things you didn't expect. TigerBeetle is two pieces of code. We code what should work. Then we've got tons of code assertions on what should never happen - what we do not expect. For example, normally, if you're coding a for loop, you just code a for loop. In TigerBeetle, there's negative code around this as well. We will assert how many times we expect the for loop to run at a minimum. Then we will code and assert how many times we expect the for loop to run at a maximum. Or any loop in general. This is the way hackers think, because if you don't code this negative space, hackers will exploit you there. We call them semantic gaps. It's the gaps between worlds, the negative gaps that hackers can step through like in The Matrix. They chain these doors together to come up with an exploit. But the flip side of security is safety. [00:40:49] GV: Yes. That was a really interesting concept you just described there. The negative space almost sounds like what would happen if you could effectively run unit tests every single time a function ran. [00:41:02] JG: Yes, perfect summary. Thank you so much. This way of thinking is: I'm not going to write unit tests outside of my code. I'm going to build them into my whole code. As you come into a function, check all the arguments. As you leave, check them. It's like unit tests in your code. That's so perfect, Gregor. Thanks. [00:41:20] GV: Yes. Well, I just hadn't ever heard of this concept before. I was like, "Wow, this just sounds -" It almost seems so obvious when you hear it. Why don't we do this? I guess, basically, as you just mentioned - make it work, make it fast, make it pretty - ultimately, most engineers for various reasons have to kind of move on after the make-it-work part. [00:41:42] JG: Yes, yes, yes. Actually, these are TigerBeetle's technical values. Make it work: safety. Make it fast: performance. Make it pretty: user experience. All three are there. I think the other thing is maybe developers don't want to put assertions in their code because they're worried that the system will crash. Actually, you want this, I think, because you want your database to either run correctly or, if it detects that it's being hacked, shut down safely.
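An illustrative sketch of coding this "negative space" with assertions, including minimum and maximum loop bounds; this is not TigerBeetle's actual Zig code, just the shape of the idea:

```typescript
type Transfer = {
  amount: bigint;
  debit_account_id: bigint;
  credit_account_id: bigint;
};

// A failed assertion shuts the system down rather than letting it
// keep running in an unknown state.
function assert(condition: boolean, message: string): asserts condition {
  if (!condition) throw new Error(`assertion failed: ${message}`);
}

function applyTransfers(transfers: Transfer[], batchMax: number): void {
  // Positive space: what we expect on the way in.
  assert(transfers.length > 0, "batch must not be empty");
  assert(transfers.length <= batchMax, "batch exceeds explicit limit");

  let applied = 0;
  for (const transfer of transfers) {
    assert(transfer.amount > 0n, "amount must be positive");
    assert(
      transfer.debit_account_id !== transfer.credit_account_id,
      "an account cannot transfer to itself"
    );
    applied += 1; // ... apply the transfer ...
  }

  // Negative space: the loop ran neither fewer nor more times than expected.
  assert(applied === transfers.length, "loop bound violated");
}
```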
Again, I'm just using a hacker as a metaphor for a bug, because I think the flip side of security is safety. So the same techniques work, and you can make rock-solid software. It either runs correctly or shuts down safely. When software is autonomous and it runs very fast and something goes slightly wrong, it's like resonance. Then suddenly this whole supersonic vehicle explodes. A spaceship explodes. You want these things to be predictable and safe at high speed or to shut down safely. Again, a thorium reactor, passive cooling. You don't want your nuclear reactor to be active and explode. But then, now, to your point - unit tests, but in your code - what you do is you go and fuzz these systems. You just autonomously test them. It's not AI. It's a real thing. There's also a lot of hype. But the exciting thing for me is not autonomous coding. It's autonomous testing. We've actually had this for years. Let's just write little computer programs that will autonomously send Taylor Swift load. Then you fuzz these systems like anything. Then the slightest thing goes wrong, your unit tests are now going to be activated in your code, and it shuts down. Straight away, you know: okay, there's the bug. Your debugging velocity is so much faster, and all this is deterministic. You're not trying to - I've worked on distributed systems where you know there's a bug, and it takes you two years to find and fix it. But this technique - we call it TigerStyle!. There's a whole doc you can read about this methodology, TigerStyle!. There's a talk on our YouTube as well. It's NASA's Power of 10 - static allocation, limits on everything, assertions everywhere - plus fuzzing, plus what we call deterministic simulation testing. You can even abstract time and speed up time. We do this by a factor of 700x. Now, you're testing - literally, we run the simulator on 100 cores 24/7, simulating TigerBeetle. We abstract and speed up time 700 times, so 100 cores for a day times 700 is two centuries of testing, whereas, otherwise, you're running a database in real-time, so you only get 30 years of testing across a few million users. These techniques are so much safer. This is kind of why I say that this is probably the most exciting part about TigerBeetle: the testing and the TigerStyle! stuff. Yes. [00:44:48] GV: Yes. I was definitely going to move on to this, partly because not only is deterministic testing in itself already a pretty huge feat to overcome - and you actually call it TigerStyle! - but I love how you guys just took it one step further, which is to put a whole game around it. You have SimTigerBeetle, which I've played. Please, go play it. This is amazing. Just describe SimTigerBeetle, and what was the story behind that? Yes. [00:45:17] JG: Oh, I'm so glad to hear that. I'm so glad you've played it, and you said that you've played it, because everybody is like, "Well, where's the game?" We're like, "Well, it's a walking sim." But we did put a game within a game. At the end, if you get to radioactive, there's actually a little platform. You can jump up and get your highest score. But the story behind that was I realized we've got a problem. We've made this database, and it's all invisible. My favorite things are all invisible. Stories and music, you can't see them. Software, you can't see it. We've talked about time travel, like Interstellar. Let's run our database on a planet where time moves faster, and we test it. Age it quicker.
That's Interstellar, but the other movie of TigerBeetle is like Inception: you run real code in a simulator. It doesn't even know it's in a simulation, but it is. That's how you do storage fault injection and all those things. But then, taking Inception a bit further, we took TigerBeetle running in the simulator - that's how we normally test it. The problem is it's all at the terminal, so nobody sees this. It's invisible. We were going to go to MozFest in Amsterdam. It didn't happen because of COVID, unfortunately. But we were going to be there at MozFest in Amsterdam, and we were going to be in like a TigerBeetle house. I knew there would be these Mozillians, and they would bring their kids. Then I knew the poor kids, they're going to come and see a terminal. They're not going to connect with the world of databases, and they're not going to become programmers. They're not going to become computer scientists. They'll become scientists. My daughter, she says to me, "Papa, I'm going to become a scientist." I said to her, "Yes, computer science is also - it's part -" No pressure, obviously. But the wonderful thing was I thought, well, how do we show adults and kids, Pixar-style, this is a database? How can they see these things, see the fault injection, see the cosmic rays hitting the disk, and the TigerBeetle survives? What we did was like Inception. We went into a deeper dream, and we took the database running in a simulator. Then the simulator, thanks to Zig, compiled to WASM, running in a browser tab. Then we put a graphical skin on top. There's a port of the NanoVG graphics library to Zig that Fabio Arnold did. He's also on the team. He and Joey, our illustrator, they put it together. It was a part-time project for a long time, and they did so much with so little in a short time. Then you've got this actual game over TigerBeetle, kind of inspired by our childhoods, like Worms and Scorched Earth, because these were all games where the level is generated from a deterministic seed. We had the same thing in our simulator, so we thought, what if we give people three different cartridges they can play? The first: the network and storage are perfect. See how TigerBeetle runs. See some clients. There are a few movie references - Agent Smith from The Matrix wears the glasses in the first level. There's a Captain America in there. Yes, this was the game. [00:48:22] GV: Was this developed in-house, or did you kind of go to some specialty, more game developers for this? This is, obviously, a complete sidebar. I'm just curious how - [00:48:30] JG: Yes. At the time, I was working for Coil. This was the company in San Francisco, and we were working on the switch. Then TigerBeetle became an open-source project full-time at Coil. It's this amazing company. Then we realized, okay, we're going to go to MozFest. We need a demo. I was reaching out to game developers, but I couldn't find any. But in the Zig community, then, I found Fabio Arnold. He was doing this part-time for us, not in-house. It was like a skunkworks project with the illustrator, Joey, also part-time. They're now full-time on the team. This was three years ago. It was just credit to Coil as a company and the Zig community. You meet these amazing surfers that can spot a quality swell from far away, and they're paddling out to the swell. They don't care how many surfers are on the wave. They just see, wow, Zig is quality, and they're there. We were just lucky, because I went to Milan to one of the first Zig meetups in Europe.
I met Fabio, and he actually demoed a game he made with Zig. I thought, okay, this is perfect. It's pretty cool what they did - how you see the screen shake and the lightning flash. [00:49:42] GV: Oh, it's incredible. My first question was just how on earth did this thing all kind of - what was the inception of this, and how did it all come together? I think that's just a great little side-quest story, if you like, for TigerBeetle. Just moving it back to the fundamental system itself - as you said, safety is one of the most exciting parts, and I think it has been very interesting to hear exactly how that has all unfolded. Coming back to maybe one more slightly more in-the-weeds part, in terms of the concepts the system actually adds - back to this being fundamentally a debit-credit-driven database. You have, I guess, what you've called financial primitives, such as T-accounts and two-phase transfers. How have you actually implemented - oh, let's talk about what those are, and then what were the challenges in actually implementing them at the system level? [00:50:38] JG: Okay. Thanks. Yes. This is the thing: TigerBeetle is not just a record of debit-credits. It gives you actually a rich set of primitives designed around debit-credit. Excuse the analogy, but it's like Redis for debit-credit. You get all these debit-credit primitives that you can play with and build things with to power your startup's business transactions, whether it's API calls that you're tracking or whatever. Let me give an example now. We've got something called a two-phase debit-credit. One of the biggest importers in the US reached out to us because they really like this primitive, because they've got a massive shipping fleet around the world with real-time inventory on ships that they track with satellites. They've got a few hundred trucks across the US and all these depot branches. They're moving physical materials from one branch to another with the truck. They've got a count of physical hardware material moving from one branch to another with a truck - a debit-credit. The challenge is that the truck is in flight, and they don't know when the truck is getting to the branch. Did it get to the branch? Did it disappear? Or did things disappear while on the truck in transit? You've got this question of time, and so really it's a two-phase debit-credit. They're saying, "Look, this quantity of goods is no longer in this branch. It's in the truck. This truck is in flight. We expect it to be there within a day or two. Or, otherwise, there's a timeout, and it rolls back. Or we have to go and investigate." You've got this situation where some quantity of goods is not in warehouse A. It's not in warehouse B either. It's on the truck, so it's in flight. But you know that it's supposed to be at warehouse A. If something goes wrong, it should go back there. This is something you can just do in a single round trip with TigerBeetle, and you can do 8,000 of these two-phase debit-credits. Another example is a credit card transaction. As you tap, it's authorized. Then a little bit later, it gets confirmed, captured. That is a two-phase debit-credit. When you authorize, you're reserving the funds, and they're about to move. Then in the second phase, that gets committed and posted. You can use this very nicely where you're a startup. You're building something internally, but you have to now go to Stripe and get them to do a transaction for you, and you're waiting on that.
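A rough sketch of that two-phase (pending, then posted or voided) flow, assuming the tigerbeetle-node client's documented pending-transfer flags; the ids, ledgers, codes, and timeout units here are invented for illustration and may differ by version:

```typescript
import { createClient, id, TransferFlags } from "tigerbeetle-node";

const client = createClient({ cluster_id: 0n, replica_addresses: ["3000"] });

async function reserveThenPost(): Promise<void> {
  // Phase 1: reserve. The goods are "on the truck": not in warehouse A,
  // not yet in warehouse B. If phase 2 never happens, the timeout lets
  // the database roll the reservation back automatically.
  const pendingId = id();
  await client.createTransfers([{
    id: pendingId,
    debit_account_id: 1n,      // warehouse A (illustrative)
    credit_account_id: 2n,     // warehouse B (illustrative)
    amount: 100n,              // quantity of goods
    pending_id: 0n,
    user_data_128: 0n, user_data_64: 0n, user_data_32: 0,
    timeout: 2 * 24 * 60 * 60, // e.g. two days until automatic rollback
    ledger: 1, code: 1,
    flags: TransferFlags.pending,
    timestamp: 0n,
  }]);

  // Phase 2: the truck arrived (or Stripe confirmed) -- post the pending
  // transfer. Use TransferFlags.void_pending_transfer instead to release it.
  await client.createTransfers([{
    id: id(),
    debit_account_id: 1n,
    credit_account_id: 2n,
    amount: 100n,
    pending_id: pendingId,     // links phase 2 back to phase 1
    user_data_128: 0n, user_data_64: 0n, user_data_32: 0,
    timeout: 0,
    ledger: 1, code: 1,
    flags: TransferFlags.post_pending_transfer,
    timestamp: 0n,
  }]);
}
```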
You can reserve your inventory or your Ticketmaster seat. Somebody has now chosen their seat. You reserve that with a two-phase debit-credit, so no one else can book it. Now, you're waiting for Stripe to do the payment transaction. Maybe you'll set an hour timeout. If Stripe fails, you will release that seat back, so people can book it again. TigerBeetle makes that so easy. It's just one round trip to the database. Trying to do that with SQL would be - this was actually where the 10,000 lines of code came from in that central bank switch, because they were trying to do this. We realized, wow, this would be very cool if you had it as a first-class primitive. It's kind of exciting how people build with this. There's a gaming startup, and their whole virtual inventory is in TigerBeetle. You could use these kinds of things. [00:54:09] GV: Yes. Those are some really fun and interesting examples. I guess as we're wrapping up here, as you've now alluded to, for people wanting to get going with TigerBeetle, what is the best way? If you're a developer - and, obviously, I think most people will be pretty excited by what they've heard today - where to go and how to get going? Yes. [00:54:28] JG: Yes. GitHub, tigerbeetle/tigerbeetle. The open source is there. From there, you've got our docs and Slack. We've got a whole community Slack. Around about 1,000 developers are there, all comparing notes, like very cool recipes, how to build things on top of these primitives, help with operating it. We help as well as a team, so there's that, too. But, yes, GitHub and Slack and, again, yes, Twitch - IronBeetle, the show by matklad every Thursday. That's more if you want to really understand how TigerBeetle is built. If you want to understand how to build with TigerBeetle, that'll be our Slack. [00:55:06] GV: Very cool. We're recording here in - it just turned to August. It's August now, just for the listeners. As we're talking, you've just announced Series A funding, I believe, so congratulations. Maybe too much hype is put on congratulating the funding and not the business, but I don't think there's any trouble with us congratulating the business in this case. But what will that funding enable you to do, I guess, and how does that change the trajectory for TigerBeetle in your mind? [00:55:39] JG: Yes. No. Thank you, Gregor, for the kind words, and agreed with you. The Series A is really like we've just been entrusted with more. When we spun the company out from Coil, it was very much like our CEO was sending us out, a team of gardeners. Plant a seed and tend to it and look after it and make it grow as a business and serve the community honorably at a profit. Then, really, nothing has changed. The Series A, when we raised it, one of the engineers said to me - because we talk about catching rain with buckets: let's put the buckets out; they catch the rain that's falling; let's not let it be wasted - he said to me, "Wow, we can buy a lot of buckets with this." Really, it's the same thing. More water to water the tree. Tend to the garden. Make it grow. You're entrusted with something. That's all it is. The level of trust has increased. We'll be using it to serve the community honorably and at a profit, building the business to support TigerBeetle: the cloud platform. You can just push a button, and you've got a TigerBeetle running. We're already using it now to work with customers. There's two in Singapore; London as well, and around the world.
We send our engineers to work with people. But we can start to do this because we've got the means. It's really building out the business to serve TigerBeetle, give people a better experience, and develop the database, of course. Yes. [00:56:57] GV: Of course. Amazing. [00:56:59] JG: Just to add, the very cool thing was our lead investor for the A, Natalie Vais - I'd known her from before TigerBeetle was a company, and she's a database engineer. She was at Oracle. Then she was at Google for Cloud Firestore, so a very technical person. We actually met through Jamie Brandon's HYTRADBOI niche database conference. So pretty good alignment, all believing in this future of transaction processing. [00:57:29] GV: Yes. I mean, I think it's a really great callout. Obviously, not all businesses need or want funding. But if that is the route that you have decided you'd like to take, then finding alignment is almost the holy grail - that the investor shares the vision and the product at just as deep a level as you do. It sounds like you found that, which is more than deserved, I think, is the best way to describe it, given the amount of time and dedication you've clearly put into this. Yes. [00:57:56] JG: No. Well, thanks. I think it just comes back to that little beetle that's done all the work. We kind of just discovered it, Don and I, on that payment switch. [00:58:05] GV: Amazing. Joran, thank you so much for coming on today. I don't like to have favorites, but this has probably been a favorite conversation so far of 2024. I really appreciate you coming on, and I hope we get to meet in the future perhaps. I think many SE Daily listeners realize, obviously, in most of our episodes, there are two people sitting in two very different countries. Yes. Maybe we'll meet at a niche database conference or something along those lines. [00:58:30] JG: I can't wait, Gregor. It's been awesome. Thanks so much. It's been a favorite of mine, too. [END]