Kubernetes Gotchas: Lessons Learned
Running databases within container clusters seems, at first glance, to be more effort than it’s worth. Why use a system designed for stateless, highly available microservices to manage persistent storage? Historically, production database systems were housed outside of container clusters, often with the help of dynamic storage. However, the data layer can also benefit from scalability and agility, and tools have been developed to make this easier. Distributing a database across a cluster can also improve fault tolerance and performance, since splitting the data into smaller parts can reduce transaction times.
Kubernetes is the leading container orchestration tool, improving operational efficiency and system resilience. It is also flexible and can be extended with a wide range of add-ons. Sharing and persisting the data layer, however, has been challenging. Early on, a pod's data was ephemeral and could not be shared between pods. Kubernetes then introduced persistent volumes, but these had to be provisioned in advance. Storage classes addressed this by enabling dynamic provisioning: a persistent volume claim requests storage on the fly, and a matching volume is created and mounted for the pod. The final piece of the puzzle is StatefulSets. StatefulSets are pod controllers that give each pod a stable identity, including a persistent hostname. Because a pod keeps its identity, it can be reattached to the same persistent volume when it fails and restarts.
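To make the StatefulSet and storage class mechanism concrete, here is a minimal sketch that builds a StatefulSet manifest as a plain Python dictionary and derives the stable pod and claim names Kubernetes generates. The `apps/v1` fields (`serviceName`, `replicas`, `selector`, `volumeClaimTemplates`, `storageClassName`) are standard Kubernetes fields; the names `cockroachdb` and `datadir`, the `standard` storage class, the 10Gi size, the untagged image and the mount path are illustrative assumptions. The point to notice is that pod `cockroachdb-1` always maps back to claim `datadir-cockroachdb-1`, which is what lets its data survive a restart.

```python
import json


def statefulset_manifest(name: str, replicas: int, storage_class: str, size: str) -> dict:
    """Build a minimal apps/v1 StatefulSet manifest with one volume claim template."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives each pod a stable DNS name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": "cockroachdb/cockroach",  # assumption: pin a real version tag in practice
                        "volumeMounts": [{"name": "datadir", "mountPath": "/cockroach/cockroach-data"}],
                    }],
                },
            },
            # Kubernetes creates one PersistentVolumeClaim per pod from this template;
            # the StorageClass named here provisions the backing volume on demand.
            "volumeClaimTemplates": [{
                "metadata": {"name": "datadir"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def stable_names(manifest: dict) -> list:
    """Return (pod, claim) name pairs following the StatefulSet naming convention."""
    sts_name = manifest["metadata"]["name"]
    claim = manifest["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    pairs = []
    for ordinal in range(manifest["spec"]["replicas"]):
        pod = f"{sts_name}-{ordinal}"          # pods are numbered 0..replicas-1
        pairs.append((pod, f"{claim}-{pod}"))  # claim name is <template name>-<pod name>
    return pairs


if __name__ == "__main__":
    sts = statefulset_manifest("cockroachdb", replicas=3, storage_class="standard", size="10Gi")
    print(json.dumps(sts["spec"]["volumeClaimTemplates"][0], indent=2))
    for pod, pvc in stable_names(sts):
        print(pod, "->", pvc)
```

Running it lists `cockroachdb-0 -> datadir-cockroachdb-0` and so on; deleting and recreating a pod does not change these names, so the replacement pod finds its old volume.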
These features make it possible to run databases within pods, but the real automation comes from Kubernetes operators. An operator is a controller that takes high-level directives and carries out the actions needed to manage an application on a user's behalf. Because it continuously monitors and manages the resources it is responsible for, it reduces human error and improves resource management. Operators also help databases survive pod failures while maintaining data integrity. (source)
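The core of an operator is a reconcile loop: observe the cluster, compare it with the desired state, take one corrective step, and repeat until nothing is left to do. The sketch below illustrates that pattern in isolation; `ClusterState`, the action names and the `apply` simulation are hypothetical stand-ins, not the API of any real operator, and a real implementation would issue Kubernetes API calls instead of updating a dataclass.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical, simplified view of a database cluster."""
    replicas: int  # number of running database pods
    version: str   # database version running on every pod (simplified)


def next_action(desired: ClusterState, observed: ClusterState) -> Optional[str]:
    """Pick one step that moves the observed state toward the desired state."""
    if observed.replicas < desired.replicas:
        return "add-node"
    if observed.replicas > desired.replicas:
        # A real database operator would decommission the node first so its data moves elsewhere.
        return "remove-node"
    if observed.version != desired.version:
        return "upgrade"
    return None  # converged: nothing to do


def apply(action: str, observed: ClusterState, desired: ClusterState) -> ClusterState:
    """Simulate the effect of an action on the cluster."""
    if action == "add-node":
        return replace(observed, replicas=observed.replicas + 1)
    if action == "remove-node":
        return replace(observed, replicas=observed.replicas - 1)
    if action == "upgrade":
        return replace(observed, version=desired.version)
    raise ValueError(f"unknown action: {action}")


def reconcile(desired: ClusterState, observed: ClusterState, max_steps: int = 100) -> List[str]:
    """Repeat observe -> compare -> act until the cluster matches the desired state."""
    history = []
    for _ in range(max_steps):
        action = next_action(desired, observed)
        if action is None:
            break
        history.append(action)
        observed = apply(action, observed, desired)
    return history


if __name__ == "__main__":
    # Placeholder version strings; the loop only compares them for equality.
    print(reconcile(ClusterState(5, "v21.1"), ClusterState(3, "v20.2")))
```

Taking one step per iteration and re-reading the state each time is what makes the loop self-correcting: if a pod crashes mid-way, the next comparison simply notices the gap again.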
With all these Kubernetes extensions in place, the question becomes which database technology is most suitable. Distributed databases are the natural fit for Kubernetes. Unlike many distributed NoSQL databases, NewSQL databases support ACID transactions. If you need strong data consistency or the familiarity of SQL and are willing to compromise on availability, CockroachDB is an excellent option. CockroachDB was designed to run in cloud-native environments such as Kubernetes and, as the name suggests, aims to be as resilient as possible. As an added benefit, it brings SQL to the cloud.
CockroachDB stores everything as key-value pairs and translates SQL queries into key-value operations; each SQL row can be thought of as a key-value pair. This key-value space is partitioned into contiguous ranges, which are replicated and distributed among nodes. When a range reaches its size limit, it splits into two contiguous ranges, and the new ranges can be rebalanced onto nodes with spare capacity, using cluster information shared via the gossip protocol. Among the replicas of a given range, one replica, the leaseholder, receives and coordinates requests. For a write request, a majority of the range's replicas must agree to the change, using the Raft consensus algorithm. (source)
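The range mechanism can be illustrated with a toy model: a sorted key space divided into contiguous ranges, where a lookup finds the range whose start key precedes the target key, and a range that grows past a threshold splits at its median key. This is a simplified sketch, not CockroachDB's implementation: `RangeMap` is hypothetical, `max_keys` stands in for CockroachDB's byte-size threshold, and the `/users/0001`-style keys only loosely mimic its table/index/primary-key layout. Replication, leaseholders and rebalancing are left out.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class RangeMap:
    """Toy key space split into contiguous ranges [start_i, start_{i+1})."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys                 # stand-in for a real byte-size limit
        self.starts: List[str] = [""]            # sorted start keys; "" is the minimum key
        self.data: List[Dict[str, str]] = [{}]   # key-value pairs held by each range

    def _index(self, key: str) -> int:
        # The owning range is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key: str, value: str) -> None:
        i = self._index(key)
        self.data[i][key] = value
        if len(self.data[i]) > self.max_keys:
            self._split(i)

    def get(self, key: str) -> Optional[str]:
        return self.data[self._index(key)].get(key)

    def _split(self, i: int) -> None:
        # Split at the median key so both halves remain contiguous.
        keys = sorted(self.data[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.data[i].items() if k < mid}
        right = {k: v for k, v in self.data[i].items() if k >= mid}
        self.data[i] = left
        self.starts.insert(i + 1, mid)
        self.data.insert(i + 1, right)

    def ranges(self) -> List[Tuple[str, List[str]]]:
        return [(start, sorted(d)) for start, d in zip(self.starts, self.data)]


if __name__ == "__main__":
    rm = RangeMap(max_keys=4)
    for i in range(10):
        rm.put(f"/users/{i:04d}", f"row{i}")  # zero-padding keeps numeric order lexicographic
    for start, keys in rm.ranges():
        print(repr(start), keys)
```

With ten sequential inserts and a limit of four keys, the key space ends up in four contiguous ranges starting at `""`, `/users/0002`, `/users/0004` and `/users/0006`.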
This architecture results in a few benefits. The most important are high availability and resilience. Replicating data across servers improves fault tolerance, and the consensus algorithm keeps data consistent: the cluster can keep operating, without losing data integrity, while a minority of a range's replicas are down. Raft also handles the failure of a leader: if the leader of a range fails, the remaining replicas elect a new one. When nodes stay down for a long period, the replicas they held are re-created on healthy nodes from the surviving replicas of each range. CockroachDB is also highly scalable. To scale horizontally, you simply add nodes to the cluster and CockroachDB handles the complexity. For reads, range partitioning can make queries more efficient by reducing the time taken for joins and range scans, at the cost of slower insertion times (source). Lastly, geo-partitioning is a distinctive CockroachDB feature. Because each node's location can be recorded in the database, data can be pinned to particular locations. Geo-partitioning reduces latency and helps systems comply with location-based laws and regulations.
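The claim that a range keeps working while only a minority of its replicas are down follows from simple majority arithmetic, sketched below. The function names are illustrative; the only assumption is the standard Raft rule that a write commits once a majority of the group has acknowledged it.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a Raft group with the given number of replicas."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority remains to commit writes."""
    return replicas - quorum(replicas)


def can_commit(replicas: int, acks: int) -> bool:
    """A write commits once a majority of replicas (leader included) acknowledge it."""
    return acks >= quorum(replicas)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"{n} replicas: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Three replicas tolerate one failure and five tolerate two, while four still tolerate only one. This is why odd replication factors are the usual choice: an extra even replica adds cost without adding fault tolerance.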
Data distribution following node failure
The Operator is the recommended way to deploy and manage CockroachDB on Kubernetes, but you can also use Helm or StatefulSets directly. The Operator is still in beta (general availability is expected in early summer) and a few features are still in the pipeline, but it can already configure resource requests, scale horizontally and perform rolling upgrades.
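A rolling upgrade replaces pods one at a time and only moves on once the upgraded pod is healthy, so a majority of each range's replicas stays available throughout. The sketch below follows the order Kubernetes uses for StatefulSet rolling updates (highest ordinal first); it is not the Operator's actual procedure, and the `upgrade` and `healthy` callbacks are hypothetical placeholders for restarting a pod with a new image and checking its readiness.

```python
from typing import Callable, List


def ordinal(pod: str) -> int:
    """Extract the StatefulSet ordinal from a pod name such as 'cockroachdb-2'."""
    return int(pod.rsplit("-", 1)[1])


def rolling_upgrade(pods: List[str],
                    upgrade: Callable[[str], None],
                    healthy: Callable[[str], bool]) -> List[str]:
    """Upgrade pods one at a time, highest ordinal first, halting on the first unhealthy pod."""
    done = []
    for pod in sorted(pods, key=ordinal, reverse=True):
        upgrade(pod)  # hypothetical: restart the pod with the new version
        if not healthy(pod):
            # Stop rather than take down a second replica while one is already unhealthy.
            raise RuntimeError(f"{pod} unhealthy after upgrade; halting rollout")
        done.append(pod)
    return done


if __name__ == "__main__":
    log: List[str] = []
    order = rolling_upgrade(["cockroachdb-0", "cockroachdb-1", "cockroachdb-2"],
                            upgrade=log.append, healthy=lambda pod: True)
    print(order)
```

Halting on the first failed health check is the important design choice: with three replicas, continuing the rollout while one pod is unhealthy could leave a range without a majority.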
The CockroachDB Operator automates repetitive deployment tasks. It builds on a Helm chart and adds automation for tasks that are difficult to do with plain Kubernetes. As Keith McClellan explains, Helm is limited to applying predefined configurations and keeping pods in line with the configuration file, whereas the Operator can act on the current state of the cluster. This means that metrics such as node disk space or CPU usage can be incorporated into scaling rules. The Operator also automates other tasks, for example seamless upgrades and setting up security certificates. Until the Operator supports multi-region clusters, some manual work is required to configure networking so that all nodes can communicate with each other.
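To show what a state-driven scaling rule means in contrast to a fixed configuration, here is a hypothetical policy that adds a node when average disk or CPU usage crosses a threshold. The `NodeMetrics` type, the thresholds (80% disk, 75% CPU) and the `max_nodes` cap are illustrative assumptions, not settings of the CockroachDB Operator. The rule only scales up, since removing a database node safely requires decommissioning it first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeMetrics:
    """Hypothetical per-node observations, expressed as fractions from 0.0 to 1.0."""
    disk_used_fraction: float
    cpu_utilization: float


def desired_node_count(current: int, metrics: List[NodeMetrics],
                       disk_high: float = 0.8, cpu_high: float = 0.75,
                       max_nodes: int = 9) -> int:
    """Return the node count a state-driven controller would ask for next.

    A static configuration always yields the same count; this rule reacts to
    what is observed in the cluster, adding one node at a time up to max_nodes.
    """
    if not metrics:
        return current  # no observations: leave the cluster alone
    avg_disk = sum(m.disk_used_fraction for m in metrics) / len(metrics)
    avg_cpu = sum(m.cpu_utilization for m in metrics) / len(metrics)
    if avg_disk > disk_high or avg_cpu > cpu_high:
        return min(current + 1, max_nodes)
    return current


if __name__ == "__main__":
    busy = [NodeMetrics(0.9, 0.4)] * 3
    print(desired_node_count(3, busy))
```

Adding one node per evaluation, rather than jumping straight to a computed target, gives the database time to rebalance ranges onto the new node before the rule looks at the metrics again.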
For more information on this topic, see The Cockroach Hour webinar on the CockroachDB website. In addition, Cockroach Labs has sponsored the free distribution of two chapters of Kubernetes Best Practices.
For a more detailed discussion on setting up multi-region clusters, have a look at this post.