What Makes a Database a Good Fit to Run in Kubernetes?
As more organizations shift operations to the cloud, the benefits of cloud-native software and infrastructure continue to accrue. One critical task in managing a cloud-based microservices architecture is container orchestration. Kubernetes is the container orchestrator of choice for 78% of the IT leaders surveyed in Red Hat’s “State of Open Source 2020” report. It’s a powerful and flexible tool, but certain issues remain that can make managing storage difficult. To meet this challenge, NewSQL and, more recently, Distributed SQL databases like CockroachDB have sought to reimagine how databases work from the ground up. NewSQL aims to merge the best features of both RDBMS systems and NoSQL systems to create a truly “cloud-native” database. Distributed SQL databases take that a step further for usability by integrating seamlessly with orchestration platforms like Kubernetes while still speaking the language of data: SQL.
By definition, databases are considered stateful applications, while Kubernetes was intended to facilitate the management of stateless applications that connect to external services to store state if necessary. The guarantees Kubernetes offered, such as high availability of services, were premised on the idea that the underlying pods were effectively ephemeral and interchangeable. In the early days of Kubernetes, it was an open question whether it would be convenient, or even possible, to run a database in a cluster.
Kubernetes manages containers dynamically, meaning that individual containers may be brought up or taken down at any time to match the needs of the application as a whole. This works fine for stateless services, but a stateful, persistent service like a database cannot afford to be restarted arbitrarily. In the past, developers often chose to run databases outside Kubernetes, which was problematic because the external database required its own redundant set of infrastructure management tools, such as monitoring and service discovery. Kubernetes has since added several features that help manage storage, such as PersistentVolume and StorageClass objects. Since the 1.5 release, Kubernetes has offered StatefulSets, which ensure that pods retain the same unique ID even if they are rescheduled to another machine (the related DaemonSets instead run a copy of a pod on each node). Connecting these pods to a remote “Persistent Volume” via their unique ID allows the application to maintain state, even when pods are being created and destroyed. Since version 1.14, Kubernetes has supported “Local Persistent Volumes” as well, allowing developers to use the same PersistentVolume APIs to connect to either remote or local storage.
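To make the idea of a stable identity concrete, here is a minimal sketch of the naming scheme StatefulSets use: pods are named after the StatefulSet plus an ordinal, and each pod's volume claim is named after the claim template plus the pod name. The names "cockroachdb" and "datadir" are hypothetical, chosen only for illustration, and the code models the naming rather than talking to a real cluster.

```python
def stateful_pod_names(statefulset: str, replicas: int) -> list:
    """Pod names a StatefulSet assigns: <statefulset>-<ordinal>, starting at 0."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def claim_name(template: str, pod: str) -> str:
    """PersistentVolumeClaim name: <volumeClaimTemplate>-<pod name>."""
    return f"{template}-{pod}"


# "cockroachdb" and "datadir" are hypothetical names used for illustration.
pods = stateful_pod_names("cockroachdb", 3)
claims = {pod: claim_name("datadir", pod) for pod in pods}

# A rescheduled pod keeps its name, so it maps to the same claim as before.
rescheduled = pods[1]
print(rescheduled, "->", claims[rescheduled])
```

Because the claim name is derived from the pod name, a pod that comes back on a different machine can find the same storage it had before.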
The addition of StatefulSets and the ability to connect pods to persistent storage definitively answered whether it was possible to run a database in a cluster. But simply adapting technology designed with bare metal in mind to a cloud environment is a suboptimal solution, and inevitably some of the benefits of the cloud will be compromised. Legacy RDBMS systems still lack certain features that would make them ideal for running in Kubernetes. On the other side of the aisle, the trade-offs made by many NoSQL databases to make them more cloud-friendly mean they have lost a degree of the usability and expressive power that developers may depend on in cloud-native applications (in particular, many have their own query language). In order to understand why some NoSQL databases made this trade-off, it’s important to align on a single definition of “cloud-native.” Here’s the official one from the CNCF (Cloud Native Computing Foundation):
“Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach. These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”
Or, more simply, a cloud-native application is “architected to take full advantage of cloud platforms.” A cloud-native database should make the most of these advantages without sacrificing the features which made RDBMS systems dominant for decades. The only databases that do that well are Distributed SQL databases, which bring together the benefits of NoSQL and the features that make RDBMS systems powerful. Now that we know we can run a database in Kubernetes, what criteria can we use to choose which database to run? As exemplified below, most databases make you choose a trade-off between ease of use and horizontal scalability.
Nate Stewart, the Chief Product Officer at Cockroach Labs, describes “DASH” factors that help align databases with the goals and function of Kubernetes. Quoting at length from his article on DASH:
- Disposability ensures your stateful systems can survive when ephemeral cloud resources cease to exist.
- API Symmetry allows distributed databases to always provide the up-to-date answer, no matter which process is handling the client request.
- Shared-Nothing Properties enable your database to make forward progress without any centralized master or coordinator.
- Horizontal scalability allows the relational database to take advantage of the unlimited and on-demand resources available in the cloud.
The first factor, Disposability, is a central consideration for databases deploying to a Kubernetes cluster for reasons already discussed above. One of the significant value propositions for Kubernetes is the ability to handle outages and failures dynamically, and it is critical that a database running on Kubernetes be robust to unexpected issues.
Traditional RDBMS/SQL systems were designed with a single machine in mind. Databases might “failover” to a backup device, but such processes came with a risk of data loss. By contrast, a significant advantage of NoSQL databases was that they were designed to be highly available, and most NewSQL databases derive their availability strategies from the NoSQL world. As a Distributed SQL database, CockroachDB makes the best of both these worlds.
CockroachDB’s name was inspired by the popular image of the cockroach as the ultimate “survivor.” Disposability and high availability are integral to its design. Its Multi-Active Availability model is based on the Raft consensus algorithm, which requires a majority of the replicas holding a piece of data to agree on a write’s contents to establish consistency (more on that later). As long as a majority agree, the data can be written, meaning that if a minority of nodes are down, the request can still proceed because a quorum can be established. With three replicas, for example, a write needs two of them and survives the loss of one. This makes the system robust to service outages while maintaining consistent data between nodes.
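The majority rule is simple enough to write down. The sketch below is not CockroachDB's implementation, only the quorum arithmetic that consensus protocols like Raft rely on, with hypothetical function names.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def can_commit(replicas: int, unavailable: int) -> bool:
    """A write proceeds only while the reachable replicas still form a majority."""
    return replicas - unavailable >= quorum(replicas)


for n in (3, 5):
    tolerated = n - quorum(n)
    print(f"{n} replicas: quorum={quorum(n)}, tolerated failures={tolerated}")
```

With three replicas the quorum is two, so one failure is survivable; with five, the quorum is three and two failures are survivable.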
The second factor of DASH, API Symmetry, refers to the guarantee of consistent responses regardless of which node receives a request. There is an obvious parallel with the ACID concept of ‘Consistency.’ Consistency requires all database nodes in a cluster to have consistent data. Traditional RDBMS systems maintained Consistency when partitioned by ensuring that a write would be propagated to all replicas of the database before a response was returned to the client, which can cause transaction time to scale up with the number of partitions. To avoid this, NoSQL systems adopted an “eventually consistent” model where a response could be returned to the user before the propagation of data was complete, relaxing the Consistency guarantee. This allowed for scaling with less work “behind the scenes,” but breaks API Symmetry since there will be some lag time where the master and replicas are out of sync.
Distributed SQL systems such as CockroachDB aim to achieve Consistency by increasing the flexibility of database nodes in responding to client requests. If a CockroachDB node receives a request, it will either respond directly (if it has the necessary data) or forward the request behind the scenes to the node that does and relay the result back to the client.
A read request in CockroachDB from the client forwarded from Node 2 to Node 3 to ensure Consistency.
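The routing behavior can be illustrated with a toy model. This is a sketch under simplifying assumptions, not CockroachDB's actual code: the `Node` class and key names are hypothetical, each key lives on exactly one node, and the node that receives the request is assumed to relay the answer back to the client.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}      # keys this node is responsible for
        self.peers = []     # other nodes in the cluster

    def read(self, key):
        """Answer locally if possible, otherwise forward to the owning peer."""
        if key in self.data:
            return self.data[key], self.name
        for peer in self.peers:
            if key in peer.data:
                # Forward behind the scenes, then relay the peer's answer.
                return peer.read(key)
        raise KeyError(key)


n1, n2, n3 = Node("node1"), Node("node2"), Node("node3")
for node in (n1, n2, n3):
    node.peers = [p for p in (n1, n2, n3) if p is not node]
n3.data["balance:alice"] = 100  # only node3 holds this key

# The client may ask any node and gets the same answer.
answers = [node.read("balance:alice") for node in (n1, n2, n3)]
print(answers)
```

Every entry in `answers` carries the same value, and the second element of each pair records which node actually served the data.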
The third factor, Shared-Nothing Properties, dovetails nicely with the high-availability and disposability factors mentioned above. Survivability was built into CockroachDB from the start. CockroachDB nodes do not have a “master” process vulnerable to failure: any node can take any request and communicate with other nodes to solve a problem if necessary. Nodes can even break up a complex query and work on its parts in parallel. The lack of a centralized coordinator means that there is no centralized “failure point” which the other nodes depend on.
The final DASH factor, Horizontal Scalability, is where the contrasts are most evident between SQL, NoSQL, and NewSQL databases. As mentioned above, the ACID guarantees provided by traditional SQL systems imposed a significant amount of “overhead.”
This does not necessarily mean RDBMS were slower than NoSQL databases (in the case of reads, assuming queries were adequately optimized), but preserving the ACID guarantees for write transactions while scaling was difficult. Traditionally, RDBMS would scale “vertically,” which typically meant investing in bigger and faster server hardware.
When hardware limitations made vertical scaling infeasible, “horizontal” scaling could be accomplished by “sharding” the data across several machines, but this tended to push complexity up to the developers who now had to manage several database locations.
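To see why manual sharding pushes complexity onto developers, consider a minimal sketch of application-side shard routing. The shard addresses and function names are hypothetical, and the hash-modulo scheme is just one simple way to pick a shard.

```python
import hashlib

# Hypothetical shard addresses; a sharded application has to maintain this list.
SHARDS = ["db-shard-0:5432", "db-shard-1:5432", "db-shard-2:5432"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def moved_keys(keys, old_shards, new_shards):
    """Keys whose shard changes when the shard list changes."""
    return [k for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)]


keys = [f"user-{i}" for i in range(1000)]
print(shard_for("user-42"))
print(len(moved_keys(keys, SHARDS, SHARDS + ["db-shard-3:5432"])), "of 1000 keys move")
```

Every query now has to pass through `shard_for`, and adding a fourth shard relocates a large share of the keys, which the application team must migrate themselves.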
Elastic cloud infrastructure made horizontal scaling much more straightforward. One of the principal advantages of NoSQL databases, at least in the early days of cloud computing, was that they could scale this way. However, doing so required NoSQL systems to adopt ‘BASE’ principles instead, which relaxed the guarantees of ACID transactions.
As with other NewSQL databases, CockroachDB was designed with scale in mind. Like NoSQL systems, it scales horizontally by adding new machines, but it does so while adhering closely to ACID principles.
DASH properties define a set of best practices for designing a cloud-native database, particularly one running in a dynamic environment like a Kubernetes cluster. Similarly essential is the ability of a database to serve the needs of developers. In the original whitepaper for Google Spanner, the inspiration for CockroachDB (and a project which CockroachDB’s founders worked on), the authors wrote about building data-driven applications:
“…developers of many OLTP applications found it difficult to build these applications without a strong schema system, cross-row transactions, consistent replication, and a powerful query language.”
Despite the drawbacks in the construction of many traditional RDBMS, SQL features such as JOINs and WHERE clauses, in addition to the guarantees offered by ACID transactions, had enough value to keep traditional SQL systems competitive with NoSQL. CockroachDB is wire compatible with PostgreSQL, which was chosen both for its compatibility with a wide range of languages and for its better Object Relational Mapping (ORM) support versus other RDBMS. The ability to use complex, structured queries eliminates one of the significant drawbacks of NoSQL systems. Also, the Isolation guarantees provided in ACID-compliant transactions can be crucial for heavily regulated workflows such as financial ledgers. In our interview with Peter Mattis, a co-founder of CockroachDB, he told us about the original value proposition his team sought to create:
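A ledger transfer shows what these transactional guarantees buy in practice. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; it is not CockroachDB, and against a CockroachDB cluster one would connect with a PostgreSQL driver instead. The table and account names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    """Current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
print(ok, rejected, balances(conn))
```

The second transfer credits one account before the overdraft is detected, yet the rollback leaves no trace of it. That all-or-nothing behavior is what applications lose when a datastore drops transactions.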
“…There are essentially people who are looking at NoSQL systems and being like, ‘Ah! I got this horizontal scalability, but it doesn’t have the functionality applications most applications want. They don’t have transactions. They don’t have indexing. All the niceties you would get from a traditional SQL database.’ The other end of the spectrum, we had the SQL databases, and the SQL databases are like, ‘Oh! We’ll give you all this functionality, but sorry, you’re going to have to do sharding eventually when you get to a certain scale.’ Why wasn’t there some marriage in the middle?”
Given that CockroachDB and Kubernetes share “ideological DNA” and a common origin at Google, it is perhaps no surprise that CockroachDB was built with Kubernetes in mind. Kubernetes is complex, and the additional work of provisioning and managing external databases simply to make an application stateful can be an onerous burden. CockroachDB’s design makes it clear that Kubernetes workflows were taken into account from the beginning. Distributed SQL databases such as CockroachDB unite the flexibility of NoSQL with the expressive power of SQL systems, and their cloud-native features make them an ideal fit for developers building applications on Kubernetes.
For more information on running CockroachDB with Kubernetes, check out the “Kubernetes Bootcamp” series available at Cockroach Labs’ website. The core product is open source, and you can try a free managed cluster of CockroachCloud (with all enterprise features).
To hear more about CockroachDB, check out our interviews with Peter Mattis and Ben Darnell of CockroachDB and Cockroach Labs. For lots of great content about databases and distributed storage, check out our database archives at SoftwareEngineeringDaily.com.