What’s a service mesh and why should I care?
In one sentence, a service mesh is a pattern in which every service in a system is paired with a proxy, and a set of management components configures those proxies. Common service mesh implementations, like Istio, use userspace proxies such as Envoy. At the time of writing, service meshes that use userspace proxies, as opposed to kernel-space proxies, are well suited to routing traffic between services because userspace proxies are HTTP aware. If you’re interested in learning more about service proxying, an interview with the creator of Envoy, Matt Klein, can be found here.
In other words, these service meshes can make policy decisions based on HTTP headers. The management components have one high-level purpose: to provide configuration for every proxy in the mesh. Admittedly, Figure 1 vastly simplifies the many details behind a service mesh; its purpose is to give an idea of how defined behavior flows through a service mesh.
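To make “HTTP aware” concrete, here is a minimal Python sketch of the kind of decision a userspace proxy can make: choosing a destination by inspecting request headers. The rule format, the `x-canary` header, and the `reviews-v1`/`reviews-v2` destinations are hypothetical, and this is not Envoy’s or Istio’s configuration API. In a real mesh, rules like these would be supplied by the management components rather than hard-coded.

```python
from dataclasses import dataclass, field
from typing import Mapping


@dataclass
class RouteRule:
    """Hypothetical L7 rule: if every header in `match` is present with the
    given value, send the request to `destination`."""
    match: Mapping[str, str]
    destination: str


@dataclass
class HttpAwareProxy:
    """Toy model of a userspace proxy that can read HTTP headers."""
    rules: list[RouteRule] = field(default_factory=list)
    default_destination: str = "reviews-v1"

    def route(self, headers: Mapping[str, str]) -> str:
        # HTTP header names are case-insensitive, so normalize them first.
        normalized = {name.lower(): value for name, value in headers.items()}
        for rule in self.rules:  # first matching rule wins
            if all(normalized.get(k.lower()) == v for k, v in rule.match.items()):
                return rule.destination
        return self.default_destination


# In a mesh, this rule list would come from the management components.
proxy = HttpAwareProxy(rules=[RouteRule({"x-canary": "true"}, "reviews-v2")])
print(proxy.route({"X-Canary": "true"}))    # reviews-v2
print(proxy.route({"User-Agent": "curl"}))  # reviews-v1
```

A proxy that only sees TCP connections could not express this rule, because the header lives inside the HTTP request.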
These management components lift routing logic out of individual services and into common infrastructure shared by all services in a system. Moving routing logic out of microservices frees developers from implementing it inside each microservice, and less time spent on service-to-service routing logic means more time for developing features and writing business logic.
At an organizational level, this separation of business and routing logic can increase developer productivity. It lets an organization’s developers specialize in one of the two segments a service mesh establishes: microservices and supporting infrastructure. One group of developers builds microservices, while another handles the service mesh infrastructure, including routing. Centralizing routing in this way is also likely to yield a more consistent routing pattern across the entire system.
Service Mesh Architecture
A service mesh is composed of two planes: a data plane and a control plane. The data plane is composed of all the services in a system, as well as each service’s accompanying userspace proxy. The diagram below depicts services and proxies in separate containers deployed side by side; this is one of the more popular architectural implementations. The proxies in this implementation are known as sidecar proxies, because each proxy runs alongside its service instance, for example as a second container in the same Kubernetes pod.
In many service mesh implementations, a container orchestration framework is used to manage data plane containers. Kubernetes is arguably the most popular container orchestration framework and is often used to control the containers in a service mesh’s data plane. A container orchestration framework adds other capabilities to the data plane that a service mesh needs in order to be useful. For example, an orchestration framework enables the data plane to perform service discovery, load balancing, and routing. It is possible to implement these features without an orchestration framework, but doing so is considerably more difficult.
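The sidecar idea can also be sketched in Python. In the toy model below, which is an assumption for illustration only, the business service never calls its peers directly: it hands every outbound request to its proxy, and the proxy applies cross-cutting behavior (here, simple retries) on its behalf. In a real mesh the sidecar is a separate process and traffic is typically redirected to it transparently at the network level; the in-process classes and the `payments` service name are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical: model an upstream service as a function from request to response.
Handler = Callable[[str], str]


class SidecarProxy:
    """Toy sidecar: owns cross-cutting behavior such as retries."""

    def __init__(self, upstreams: dict[str, Handler], max_attempts: int = 3):
        self.upstreams = upstreams
        self.max_attempts = max_attempts

    def call(self, service: str, request: str) -> str:
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            try:
                return self.upstreams[service](request)
            except ConnectionError as err:  # retry only transient failures
                last_error = err
        raise RuntimeError(
            f"{service} failed after {self.max_attempts} attempts"
        ) from last_error


class CheckoutService:
    """Business logic only: no retry or routing code lives here."""

    def __init__(self, proxy: SidecarProxy):
        self.proxy = proxy

    def checkout(self, cart_id: str) -> str:
        return self.proxy.call("payments", f"charge {cart_id}")


def make_flaky(failures: int) -> Handler:
    """Hypothetical upstream that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    def handler(request: str) -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionError("transient failure")
        return f"ok: {request}"

    return handler


proxy = SidecarProxy({"payments": make_flaky(failures=2)})
print(CheckoutService(proxy).checkout("cart-42"))  # ok: charge cart-42
```

Because `CheckoutService` contains no retry logic, changing the retry policy means changing only the proxy, which is the separation of business and routing logic described earlier.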
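Service discovery and load balancing are among the capabilities most often borrowed from the orchestrator. A minimal Python sketch, assuming a simple in-memory registry (in Kubernetes, the orchestrator itself tracks which endpoints back each Service), might look like this. `ServiceRegistry` and the addresses are hypothetical, and round-robin is only one of several possible load-balancing strategies.

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """Toy service discovery: maps a service name to its live endpoints."""

    def __init__(self) -> None:
        self._endpoints = defaultdict(list)  # service name -> ["host:port", ...]
        self._rotations = {}                 # service name -> round-robin iterator

    def register(self, service: str, address: str) -> None:
        self._endpoints[service].append(address)
        self._rotations.pop(service, None)  # endpoint set changed: restart rotation

    def deregister(self, service: str, address: str) -> None:
        self._endpoints[service].remove(address)
        self._rotations.pop(service, None)

    def pick(self, service: str) -> str:
        """Resolve a service name to one endpoint using round-robin."""
        endpoints = self._endpoints.get(service)
        if not endpoints:
            raise LookupError(f"no endpoints registered for {service!r}")
        if service not in self._rotations:
            # Cycle over a snapshot so later registrations don't mutate it.
            self._rotations[service] = itertools.cycle(list(endpoints))
        return next(self._rotations[service])


registry = ServiceRegistry()
registry.register("reviews", "10.0.0.1:9080")
registry.register("reviews", "10.0.0.2:9080")
print([registry.pick("reviews") for _ in range(3)])
# ['10.0.0.1:9080', '10.0.0.2:9080', '10.0.0.1:9080']
```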
It should be stressed that routing occurs in the data plane, but the configuration of this routing is injected by management components. These management components form a service mesh’s control plane. As mentioned earlier, a control plane’s top priority is to define behavior for the data plane. This is precisely how routing logic is elevated out of the data plane. In addition to routing logic, the control plane provides the policies for all of the data plane’s other significant responsibilities.
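The split between a control plane that defines behavior and a data plane that executes it can be sketched in a few lines of Python. In Envoy-based meshes such as Istio, proxies receive this configuration from the control plane over Envoy’s xDS APIs; the sketch below replaces that protocol with direct method calls and a version counter, so `ControlPlane`, `Proxy`, and the route format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Data plane: routes traffic using whatever config it was last given."""
    name: str
    config_version: int = 0
    routes: dict[str, str] = field(default_factory=dict)

    def apply(self, version: int, routes: dict[str, str]) -> None:
        # Ignore stale pushes so an older config can't overwrite a newer one.
        if version > self.config_version:
            self.config_version = version
            self.routes = dict(routes)

    def route(self, host: str) -> str:
        return self.routes.get(host, host)  # no rule: send to the host as-is


@dataclass
class ControlPlane:
    """Control plane: holds the desired policy and pushes it to every proxy."""
    proxies: list[Proxy] = field(default_factory=list)
    version: int = 0
    routes: dict[str, str] = field(default_factory=dict)

    def connect(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)
        proxy.apply(self.version, self.routes)  # late joiners get current config

    def set_route(self, host: str, destination: str) -> None:
        self.version += 1
        self.routes[host] = destination
        for proxy in self.proxies:
            proxy.apply(self.version, self.routes)


control_plane = ControlPlane()
checkout_sidecar = Proxy("checkout-sidecar")
cart_sidecar = Proxy("cart-sidecar")
control_plane.connect(checkout_sidecar)
control_plane.connect(cart_sidecar)

control_plane.set_route("reviews", "reviews-v2")  # one change, every proxy updated
print(checkout_sidecar.route("reviews"), cart_sidecar.route("reviews"))  # reviews-v2 reviews-v2
```

Routing decisions still happen in each `Proxy`; the control plane only decides what those decisions should be.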
One may wonder how a control plane receives the configuration it provides to the data plane. As alluded to in Figure 1, a human provides these policies by working with some sort of interface. This interface usually takes the form of a CLI, a web UI, or another GUI tool. Depending on how well the control plane is abstracted from this interface, configuring a service mesh can be more or less difficult.
If you’re asking, “How is this different from an API gateway?” you’re not alone. Below is a table that contrasts API gateways and service meshes.
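As a rough illustration of that human-facing interface, here is a Python sketch of a CLI front end that validates a policy before it would be handed to a control plane. The `meshctl` tool name, the `split` subcommand, and the weighted traffic-split policy shape are hypothetical and do not mirror any particular mesh’s CLI; the point is that the interface, rather than the operator, translates intent into a well-formed policy.

```python
import argparse


def parse_policy(argv: list[str]) -> dict:
    """Turn a command such as `split reviews v1=90 v2=10` into a policy
    document that a (hypothetical) control plane could accept."""
    parser = argparse.ArgumentParser(prog="meshctl")  # hypothetical tool name
    sub = parser.add_subparsers(dest="command", required=True)
    split = sub.add_parser("split", help="weighted traffic split for a service")
    split.add_argument("service")
    split.add_argument("weights", nargs="+", help="subset=percent pairs")
    args = parser.parse_args(argv)

    weights = {}
    for pair in args.weights:
        subset, _, percent = pair.partition("=")
        if not subset or not percent.isdigit():
            parser.error(f"expected subset=percent, got {pair!r}")  # exits
        weights[subset] = int(percent)
    if sum(weights.values()) != 100:
        parser.error("weights must add up to 100")  # exits
    return {"service": args.service, "weights": weights}


if __name__ == "__main__":
    print(parse_policy(["split", "reviews", "v1=90", "v2=10"]))
```

Running it prints `{'service': 'reviews', 'weights': {'v1': 90, 'v2': 10}}`; malformed input such as `v1=50` alone is rejected by argparse’s error handling, which exits with a usage message instead of forwarding a broken policy.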
Figure 3; Source link
What are my options?
Istio uses Envoy, a distributed proxy originally created at Lyft. Istio supports Kubernetes and is being developed to support other environments. Configuring Istio’s control plane is especially complicated, and Istio is considered more difficult to integrate than other service meshes; in exchange, it is one of the most featureful. Istio provides statistics, traces, and logs for cluster ingress and egress, as well as automatic load balancing for many common types of traffic. It’s worth noting that GKE offers support for installing Istio on clusters. If you’re moving to the cloud and working with Kubernetes, it may be worth investigating.
Tetrate is a SaaS business built on Istio. Tetrate’s CEO, Varun Talwar, was interviewed on Software Engineering Daily. In the interview, Varun discusses Istio, the service mesh market, and his experiences at Google, where he helped found the Istio project.
Linkerd uses a Rust proxy called linkerd-proxy. It is only a viable solution if you’re working with Kubernetes. It is one of the simpler service meshes and a great place to start if you’re looking for a relatively quick integration process. Note that Linkerd trades flexibility for ease of use; it’s not as flexible as other service meshes. Though it is considered simpler than other service meshes, it has a solid reputation. At the time of writing, Linkerd is backed by the Cloud Native Computing Foundation (CNCF), making it the only service mesh backed by an independent foundation. The CNCF hosts a number of popular open source projects, including Kubernetes.
Buoyant is the company responsible for building Linkerd. The CEO of Buoyant, William Morgan, came on Software Engineering Daily and discussed building a service mesh business.
Consul has an extension called Connect that provides service mesh functionality. It’s important to note that, unlike Istio, Consul Connect does not use Envoy as its default proxy. Consul Connect supports Envoy, but you’ll have to do the heavy lifting involved in swapping out the proxy. On the architectural side, Consul Connect includes both the control plane and the data plane in its binary. This seems like a small advantage over other service meshes, in that there are no separate control plane services that could become a bottleneck. In addition, its control plane configuration is simpler than that of other service meshes, as it uses an interface HashiCorp calls a “service access graph”. That said, Consul Connect does not provide rate limiting, tracing, or metrics collection.
Consul’s engineering lead, Paul Banks, discussed the landscape, history, and future of service meshes in this interview on Software Engineering Daily.
Kong provides a service mesh control plane called Kuma. Kuma was announced in September 2019 with the intent of addressing the limitations of earlier service meshes, such as their limited accessibility to organizations with existing infrastructure. It’s built on Envoy and works with both Kubernetes and VMs. Support for VMs makes Kuma more appealing to organizations with existing workloads that haven’t been migrated to Kubernetes. Kuma can work with an organization’s legacy infrastructure while allowing the organization to modernize its tech stack by migrating to Kubernetes. It is designed to have a shallow learning curve, but it also provides policies for configuring lower-level details of the data plane.
Kong’s CTO, Marco Palladino, was interviewed on Software Engineering Daily and discussed Kong’s architecture, as well as his vision for how their service mesh will continue to develop.
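Consul describes this access graph in terms of intentions: allow or deny rules from a source service to a destination service, where `*` matches any service. The Python sketch below is a simplified model of that idea, assuming that the most specific matching rule wins and that traffic is denied when nothing matches. Consul’s real precedence rules, namespaces, and defaults (which depend on its ACL configuration) differ, and none of these names are Consul APIs.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass(frozen=True)
class Intention:
    """One edge in a simplified access graph: may `source` call `destination`?"""
    source: str
    destination: str
    allow: bool

    def matches(self, source: str, destination: str) -> bool:
        return (self.source in (source, WILDCARD)
                and self.destination in (destination, WILDCARD))

    def specificity(self) -> int:
        # Assumption: exact names outrank wildcards, and an exact destination
        # counts for more than an exact source.
        return (2 if self.destination != WILDCARD else 0) + \
               (1 if self.source != WILDCARD else 0)


def is_allowed(intentions: list[Intention], source: str, destination: str,
               default_allow: bool = False) -> bool:
    """Apply the most specific matching intention, else the default."""
    matching = [i for i in intentions if i.matches(source, destination)]
    if not matching:
        return default_allow
    return max(matching, key=Intention.specificity).allow


graph = [
    Intention("*", "db", allow=False),   # nobody may reach the database...
    Intention("web", "db", allow=True),  # ...except the web tier
]
print(is_allowed(graph, "web", "db"))     # True
print(is_allowed(graph, "batch", "db"))   # False
print(is_allowed(graph, "web", "cache"))  # False (no rule: default deny)
```

Expressing policy as edges between named services, rather than as proxy-level routing rules, is what keeps this style of configuration comparatively simple.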
Other Service Meshes
- Gloo: “…a cloud-native API Gateway and Ingress Controller built on Envoy Proxy…”
- App Mesh: AWS’s service mesh for your AWS compute infrastructure
- Aspen Mesh: “…the Easy-To-Use, Enterprise-Ready Distribution of Istio…”
Again, why are any of these options relevant? It comes back to the essence of what a service mesh is: software that helps manage routing between, and gather data from, microservices.
Microservices continue to gain popularity. This popularity brings increased complexity to inter-service communication. Like Kubernetes, service meshes are here to stay because they address this increased complexity. But, unlike the container orchestration domain, there is not yet an obvious choice when it comes to selecting a service mesh. The problems service meshes address may not permit the rise of a ubiquitous service mesh.
Consider this extract from a Google Cloud whitepaper: “And much in the same way we brought Kubernetes into the world, we wanted to make this exciting technology [Istio] available to as many users as possible.” Google designed and backed Kubernetes. Now, it’s the most popular container orchestration solution. Google was one of several partnering companies to launch Istio in 2017, and beta availability of Istio on GKE was announced in December 2018. This further supports the notion of a lasting service mesh market. Service meshes may not be as mainstream as microservices, yet. But, it seems they have the potential to be.
Why are service meshes gaining popularity?
- Microservices continue to become increasingly popular.
- An industry standard was set when Kubernetes won the container orchestration wars.
- Cloud providers continue leveling up and gaining popularity.
Figure 4; Google Trends interest over time graph of Microservices from March 15, 2013 to December 15, 2019
Microservices have been getting more press over the past decade. A skeptic need only reference Google Trends to be convinced of this claim (see Figure 4). Organizations began epic rewrites of enormous swaths of code. Many attempted to cross the chasm that lay between their monolith and their desired end state: a system composed entirely of microservices.
Alas, many who embarked on this trek perished along the way. That said, many completed the journey successfully, including a number of tech giants like Netflix and Twitter. Today, microservices continue to gain popularity and are far more common than they once were. The rise of microservices created, and continues to create, new problems to solve and optimizations to make. One such problem was the management of these services.
The rise of microservices fueled the need for a container orchestration framework. An industry standard was set when Kubernetes won the container orchestration wars. This gave developers common paradigms to think about container orchestration, as well as common vocabulary to discuss it. Additionally, cloud providers continued to become more featureful and sophisticated, as well as cheaper and easier to use. More developers and organizations began moving to the cloud.
With more organizations using Kubernetes in the cloud to control microservices, the need for a higher-level tool to manage behavior of services became significantly more apparent than in years past. Service meshes address this need.
So, do you actually need to care about service meshes? This is an important question to answer, especially if you’re working in a platform role. Of course, if you’re interested in service meshes, then feel free to take the plunge. But, if you don’t want to go down the rabbit hole, here is a decision tree that will make life easier.
For what it’s worth, service meshes don’t seem like they’re a fad; they are here to stay. That said, you don’t need to use one. It’s simply another tool for solving organizational and software development-related problems. Like anything else, if you want to understand what the service mesh hype is all about, try it out yourself.
We will continue to see tools built to address problems stemming from the rise of microservices, Kubernetes, and cloud providers. The service mesh is one such tool that has been built to address these problems; it is not a silver bullet.