Decentralized Objects with Martin Kleppman

The Internet was designed as a decentralized system.

Theoretically, if Alice wants to send an email to Bob, she can set up an email client on her computer and send that email to Bob’s email server on his computer. In reality, very few people run their own email servers. We all send our emails to centralized services like Gmail, and connect to those centralized services using our own client—a browser on our laptop or a mobile application on our smart phone.

Gmail is popular because nobody wants to run their own email server—it’s too much work. With Gmail, our emails our centralized, but centralization comes with convenience.

Similar centralization happened with online payments.

If Alice wants to send $5 to Bob, she needs to go through centralized banking infrastructure. Alice tells her bank to send $5 from her bank account to Bob’s bank account. This is not how it works in the physical world. if Alice wants to pay cash to Bob, she doesn’t have to go and meet him at a physical bank. She just takes out a $5 bill from her wallet and hands it to him.

The invention of Bitcoin proved that digital wallets and peer-to-peer payments are possible. But running your own wallet is like running your own email server. It is inconvenient, and so we trade decentralization for convenience once again. We use services like Coinbase, where users buy and sell cryptocurrencies in a centralized provider.

There are people in the cryptocurrency community who hate the idea of Coinbase. These people keep their cryptocurrency spread out on their own hardware wallets. Some of these people also run their own email servers.

Are these people just adding unnecessary inconvenience to their lives for no reason? No. These are smart, successful people. They don’t like to waste time. So what are they doing running their own email servers?

Distributed systems theory teaches the risk of centralized computer systems. If you have a single server that all your communication has to be routed through, your computer network will stop functioning if that server dies.

Today, civilization is reliant on centralized computer systems. This is fundamentally dangerous. The 2008 financial crisis proved how risky it is to centralize our money in the hands of a few people. The Equifax breach proved how risky it is to centralize our identity in the hands of a few people.

What happens if Dropbox runs out of money and has to shut down? What happens if all of the data centers at Amazon Web Services get wiped simultaneously? What happens if Coinbase gets hacked and every user at Coinbase loses all their money?

We have seen centralized systems collapse. The people who are running their own email servers are not crazy. Even if Gmail disappears tomorrow, they will still have access to their emails. With the example of email, we see that deploying and managing a decentralized system is possible.

Decentralization is a desirable feature of computer systems. So how do we make more of our applications decentralized?

The cypherpunks were working for decades to make decentralized money a reality. Satoshi Nakamoto invented the blockchain, and we now have a computer science construct that enables decentralized money. The blockchain also enables many other decentralized applications.

By solving a specific problem, Satoshi came up with a general solution. This is how progress often happens in computer science. In order to fix a system, we create a new tool. That tool can be applied to other systems that we don’t anticipate.

The blockchain is a tool that solves one set of problems in distributed systems. Conflict-free replicated data types are another type of tool.

Conflict-free replicated data types (or CRDTs for short) are objects that can be mutated by multiple users at the same time without creating data corruption. The most common example of a conflict-free replicated data type is the shopping cart.

Let’s say Alice and Bob share an account on an ecommerce web site. Alice is building a house, and she wants to buy some tools online. Alice has a shopping cart with a hammer in it. Bob logs into the ecommerce web site from a different computer at the same time Alice is logged in. Bob just wants to buy a tuxedo—he doesn’t know why Alice left a hammer in the shopping cart, so he clicks a button to remove all the items from the shopping cart. At the exact same moment, Alice clicks from her computer to add a drill to the shopping cart.

The server receives both requests: Bob wants to delete all items in the shopping cart, Alice wants to add a drill to the shopping cart. Both requests occurred at the exact same time, but we have to decide how to process them in some order. This is a situation known as a conflict.

Which request should execute first? Should the resulting shopping cart be empty? Should the shopping cart only have a drill in it? In either case, Alice or Bob is going to be disappointed—there is no way to avoid that. But we need some way to resolve the conflict deterministically. We do not want to have to send a message to both Alice and Bob that says “sorry, our shopping cart cannot handle your request. Please try again later.”

We need the shopping cart to be a conflict-free shopping cart—and today’s episode is about the different techniques that can be used for conflict resolution.

The shopping cart is a simple example where user collaboration leads to conflicts. Imagine all the other ways you collaborate with other users: chat systems like Slack, social networks like Facebook, document systems like Google Docs.

One way to resolve a conflict is through a technique called operational transform. Operational transform requires all the operations in the distributed system to be funneled through a centralized server. When a conflict occurs, the centralized server detects the problem and figures out how to resolve it.

Google Docs uses operational transform to resolve the frequent conflicts that occur when two users are sharing a text document. But operational transform only works if you have a centralized server.

An alternative solution is conflict-free replicated data types, which maintain each user’s replica of the data in a format that allows the client copies to resolve conflicts in a peer-to-peer fashion—without a centralized server.

Last example: Alice and Bob are now collaborating on a document that uses a CRDT data structure under the hood. Whenever they send their local changes to each other, any conflicts that occur can be resolved directly on the client. Alice and Bob can collaborate on a document just like they can send emails to each other.

With CRDTs, we can build decentralized, collaborative applications. But CRDTs are hard to use. Just like with blockchain technology, we don’t yet have the simple, elegant abstractions that let inexperienced programmers build peer-to-peer applications without the fear of conflicts.

Martin Kleppman is a distributed systems researcher and the author of Data Intensive Applications. Martin is concerned by the centralization of our computer networks, and he works on CRDT technology in order to make it easier for people to build peer-to-peer applications.

Most of the people who know how to build systems with CRDTs are distributed systems PhDs, database experts, and people working at huge internet companies. How do you make developer-friendly CRDTs? How do you allow random hackers to build peer-to-peer applications that avoid conflicts? Start by making a CRDT out of the most widely used, generalizable data structure in modern application development: the JSON object.

In today’s episode, Martin and I talk about conflict resolution, CRDTs, and decentralized applications. This is Martin’s second time on the show, and his first interview is the most popular episode to date. You can find a link to that episode in the show notes for this episode, or you can find it in the Software Engineering Daily app for iOS and for Android. In other podcast players, you can only access the most recent 100 episodes. With these apps, we are building a new way to consume content about software engineering. They are open-sourced at github.com/softwareengineeringdaily. If you are looking for an open source project to get involved with, we would love to get your help.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


IBM Cloud gives you all the tools you need to build cloud native applications. Use IBM Cloud Container service to easily manage the deployment of your Docker containers. For serverless applications, use IBM Cloud Functions for low cost, event-driven, scalability. If you like to work with a fully managed platform as a service, IBM Cloud Foundry gives you a cloud operating system to control your distributed application. IBM Cloud is built on top of open source tools, and integrates with all the third party services that you need to build, deploy, and manage your application. To start building with IBM today, go to softwareengineeringdaily.com/IBM and sign up for a free Lite account. With the Lite account, you can start building apps for free, and try numerous cloud services with no time restrictions. Check it out at softwareengineeringdaily.com/IBM.


Today’s sponsor is Datadog, a monitoring and analytics platform for cloud-scale infrastructure and applications. Datadog integrates seamlessly with more than 200 technologies, so you can track every layer of your complex microservice architecture, all in one place. Distributed tracing and APM provide end-to-end visibility into requests wherever they go, across hosts, containers, and service boundaries. With rich dashboards, algorithmic alerts, and collaboration tools, Datadog provides your team with the tools they need to quickly troubleshoot and optimize modern applications. See for yourself – start a 14-day free trial today and Datadog will send you a free T-shirt! softwareengineeringdaily.com/datadog


G2i is a talent platform built for engineers by engineers. React Native, React, and mobile: the developers on G2i have expertise in the best tools to build your applications. When I need engineers to help me out with my apps, G2i is the first place I go–especially when I am building with React or React Native. Contract a G2i developer to help you on a short-term basis, or hire a G2i developer full-time. And if you are looking to build cross platform applications in React Native, definitely check out G2i. The G2i platform is a community of React Native, React, and mobile engineers–these developers can become part of YOUR team. If you are looking for developers to build your product, check out g2i.co. You can also send me an email, and I’ll be happy to tell you more about my experience with G2i. Find your React Native, React, and mobile talent by going to g2i.co. Thanks to g2i for helping me ship my products, and thanks for becoming a sponsor of Software Engineering Daily.


Transferwise makes it cheaper and easier to send money to other countries. It’s a simple mission, but it’s important. TransferWise is looking for engineers to join their team. If you are a Java developer, a full-stack engineer, a product manager, or a data analyst, check out transferwise.com/jobs. TransferWise is built by self-sufficient, autonomous teams, and each team picks the problems they want to solve. There’s no micro-management. No one telling you what to do. Find an autonomous, challenging, rewarding job by going to transferwise.com/jobs. TransferWise has several open roles in engineering, and has offices in London, New York, Tampa, Tallin, Cherkassy, Budapest, and Singapore, among other places. Find out more at transferwise.com/jobs.