Cloud with Eric Brewer

Google’s strategy for cloud computing is centered around providing open source runtime systems and a high quality developer experience.

Google is highly differentiated in its approach to the cloud. To understand Google’s strategy, it is useful to first set the context for the current competitive environment of cloud providers. (If you want to skip straight to the interview with Eric Brewer, begin at 30:33.)

Amazon Web Services (AWS) popularized cloud computing in 2006. AWS was first to market, which has given the company a large competitive advantage over the rest of the industry. It took Google several years to realize how big the public cloud market was, and how good the economics of running a cloud provider could be. Microsoft also realized the opportunity several years after AWS was started.

The multi-year lead that AWS had on getting to market has given the company tremendous leverage. Because AWS is the most widely trusted, accepted default in the market, AWS is able to deepen that relationship more and more over time, despite the fact that the proprietary APIs of AWS create a level of lock-in that bears some resemblance to the lock-in of Microsoft Windows or Oracle’s relational database systems.

The brief history of software gives us examples of what is supposed to happen next.

When a large company is operating a proprietary developer platform, the open source software ecosystem reflexively comes out with an alternative, open source solution that is better than the proprietary system, right? We saw this with the Linux project, which channeled the developer resentment of the proprietary Windows operating system software into the development of the best server operating system to date: Linux.

The difference between Microsoft Windows in the 90s and AWS today is that, for the most part, developers do not resent AWS. AWS keeps its prices low, and embodies a spirit of innovation despite the fact that AWS is partly built around a repeated process of taking open source projects, packaging them up into cloud services, and integrating them closely with other AWS tools, making it harder to move away from AWS.

In its pioneering offerings of managed services, AWS made it easier to set up tools that had previously been impossible for less-experienced developers to operate. Distributed systems became approachable. Companies like Netflix, Lyft, and Heroku were built on the simplified operational model of AWS.

AWS also innovates outside of this business model of repackaging open source projects.

When AWS pioneered the function-as-a-service model, they created an easy way to scale stateless computation. They presented the developer world with an entirely new compute model: event-driven programming, with services communicating to each other via functional glue code running in cheap, transient containers.

There is strong criticism of the event-driven, AWS Lambda-bound “serverless” compute model. And those critics make a valid point: AWS Lambda is a proprietary API.

To build your app around an event-driven architecture glued together with Lambda functions is to lock yourself tightly to Amazon infrastructure. But the critics of this model do not seem to be the ones who are actually locked in. Critics of Amazon’s proprietary strategy tend to be those with a strong incentive to be critical.

Another common criticism of AWS comes from commercial open source companies such as Redis Labs and Elastic, which are vendors of the Redis open source in-memory data system and the open source Elasticsearch search and retrieval system, respectively. These vendors argue that AWS has violated the spirit of open source by repackaging open source software as profitable services and failing to give back commensurate contributions to the open source ecosystem.

AWS has repackaged Redis and Elasticsearch as Amazon ElastiCache and Amazon Elasticsearch Service, respectively.

These vendors frame their relationship to AWS as zero sum, and are turning to licensing changes as their strategy of choice for creating a new defensible moat against AWS. In various Internet forums, the indignant commercial open source software vendors and AWS have both made their cases to the developer community.

If you are a developer who just wants to build your software, these arguments are an unsettling distraction. In addition to worrying about fault tolerance and durability guarantees and cost management, you now have to worry about licensing.

And as you observe these circumstances, with open source vendors accusing cloud providers of improper behavior, you might begin to wonder: what actually are the rules of open source? By what set of commandments are we judging who is good and who is bad?

And who is to judge what is fair or unfair activity by Amazon, which was the first company to prove to all of us just how influential cloud computing could be?

This is the competitive landscape of the cloud circa April 2019, when I attended Google Cloud Next, Google’s annual cloud developer conference.

Throughout the conference, the optimism and playful spirit of Google was present everywhere. There were colorful shapes and cartoonish diagrams that looked like building blocks made for children. In the expo hall, there were food pillars from which free candy and salted snacks could be grabbed all day long.

On one side of the expo hall, a demonstration from IKEA showed how the company was partnering with Google to run its furniture business more like a software company. Stretching across the expo hall in a carpeted distance of multiple football fields, the various vendors showed off their latest wares in a bazaar of multicloud, hybrid cloud, and Kubernetes-compliant sales pitches.

Over the last decade, large enterprises have opened up their wallets more and more, realizing just how valuable cloud infrastructure can be for their businesses. The more they spend, the more efficient their workers become, and the faster their businesses can improve. For modern enterprise, the cloud is the closest thing to magic that we have.

Knowledge workers are becoming unshackled from the painful, slow pace of early technological corporatism. The cloud can empower all of us–and rather than replacing our jobs with machines, the cloud will allow us to better express our humanity within our work.

The twin forces of consumer mobile computing on the client side and cloud computing on the server side are compounding faster than we can measure. Five years ago, a common analogy was to compare the smartphone to a “remote control for your life”. Today the smartphone feels more like a magic wand than a remote control. We can summon tremendous forces from the magical cloud by merely speaking the right command into our wand-like smartphone.

Someday even the world of Harry Potter may look antiquated relative to our reality. Why carry around a wand when you can use your Airpods to cast your spells via hands-free voice command?

The cloud’s magical effects on corporate enterprise operations are exciting to watch. I go to a lot of conferences, and I prefer to walk around the expo hall rather than to go to any sessions. At the expo hall, you see the distance between vendor hype and the realities of the enterprise. From my vantage point, that distance gets shorter every year: enterprises truly are adopting DevOps, continuous delivery, machine learning, and data platforms.

From insurance companies to banks to farming companies to furniture outlets: the “digital transformation” is actually happening. This is great for consumers, and great for employees at the large corporations that are being digitally transformed.

What is it like to work at a large, 100 year old enterprise that is going through digital transformation? It is certainly much more appealing than it was a decade ago.

The servile, bureaucratic, rent-seeking hierarchies of the 80s, 90s, and early 2000s are slowly being refurbished into enterprises where bottom-up innovation, individualism, and artistic energy are assets rather than liabilities. We are moving out of the age of boring cubicle industrialism towards a vision in which work is truly creative and fulfilling for every member of the corporation.

And Google evangelizes this ideal more strongly than any other company in the world.

With the tremendous cash flows from its advertising business, Google reinvests heavily into its employee base. Many employees at Google have worked there for a decade and show no intention of leaving, having found not just a job but, more importantly, a culture and an engineering environment that are unrivaled.

Speaking subjectively: my sense is that Google’s internal engineering tools are the best in the world. For many years, Google has been the strongest magnet for talent in the realm of Internet infrastructure and machine learning. In terms of sophistication, the rest of the software industry has been playing catch-up to Google since the days of the MapReduce paper.

Speaking of the MapReduce paper: even back in 2004, Google was willing to share its findings with the outside world. Google was tactical about which innovations it would open up about, but it does seem that Google has truly tried to embody some of the publishing philosophies of academia.

Google’s early investments in an academic-like environment have borne considerable fruit. The curiosity and cross-institutional collaboration enabled by Google’s willingness to speak openly about its research have made Google a refuge for academics, including today’s guest Eric Brewer.

Beyond its considerable contributions as a publisher of computer science theory, Google has also become the model software engineering practitioner.

Google has scaled its internal tools to make its thousands of developers highly productive. Given its success with its internal developers, it should come as no surprise that Google has strong opinions about the best way to build services for external developers.

I have often wondered what it is actually like to work within Google as an engineer. I have worked at Amazon, and seen their internal tooling. At Amazon, the level of tooling gave engineers tremendous leverage. But from talking to more experienced engineers, my sense is that nobody holds a candle to the internal tools of Google.

Drishti CTO Krish Chaudhury was a recent guest on this podcast. He spent nearly a decade at Google working on computer vision projects. Today he is building his own computer vision company. When I asked him if we yet had “Google Infrastructure for Everyone”, he sighed deeply and wistfully before answering with an unequivocal “no”.

This hints at what Google wants to do with its cloud. Google does not think of Google Cloud as commodity cloud computing services. The vision for Google Cloud is to be the premier cloud, the deluxe set of services and hardware that provides “Google Infrastructure For Everyone”.

This is a moonshot, and in order to accomplish it, Google will need to forgo certain short term opportunities for cash grabs. But it is undoubtedly a more fiscally wise strategy for Google to optimize for growth in the 2029 Q3 cloud market rather than that of 2019 Q3.

From a conglomerate standpoint, Google already has a cash cow. Google has won today’s war, and it only makes sense to focus on tomorrow’s.

In Google’s approach to the cloud, we see a cultural distinction between Amazon and Google. The cultures of Amazon and Google are partly deliberate, but partly driven by the nature of their businesses’ respective revenue streams.

Amazon built its e-commerce business in a low-margin, highly competitive environment. Amazon won its customers over through a slow process of building trust. In its delivery of physical goods, Amazon formed close relationships with customers. Amazon would repeatedly listen to these customers and use that feedback to improve its products. This “flywheel” of iterative improvement is the core engine that keeps Amazon improving incrementally.

Amazon also has made far-flung, ambitious investments–but the size and scope of Amazon’s moonshots were constrained for many years by its lack of any singularly large cash cow. Fire Phone notwithstanding, Amazon’s moonshots have often been thrifty asymmetric bets, requiring minimal upfront investment but presenting huge potential payoff.

Google, on the other hand, has been in a much more luxurious financial position for most of its life.

Google has a high margin advertising business that accounts for ~84% of its revenues, subsidizing everything else in Google (and Alphabet). Because it has such a big cash cow in a totally different area of the business, Google can afford to take an extremely long-term approach to its vision for the cloud. For Google, the goal is not to maximize the profit margins of the cloud over the next two years. Google can afford to think of cloud profitability in increments of decades.

In the business of cloud computing, Google has turned the weakness of being a late mover into a wide set of strengths.

The AWS console presents its users with a sprawling array of possibilities. Google Cloud has a lower surface area. Google is more opinionated about the right way to do things–and it is easier for Google to build in an opinionated fashion because there are fewer legacy customers and edge cases to support. AWS supports the majority of the market, so it is in a position where it must keep those customers happy in order to hold onto its moat.

So what are Google’s strong opinions about the way that a cloud should operate?

Google’s espoused vision is that of the “open cloud”: a cloud environment where organizations could easily move workloads from one cloud provider to another.

If we take the purest, most aspirational interpretation of “open cloud”, the full stack would be open source. Identity and access management systems would be portable as well, and cloud providers would work together to reduce the switching costs between each other, even in cases of data gravity.

As virtuous as the idea of the “open cloud” sounds, it is also strategically convenient for Google. Since it lags behind Amazon and Microsoft in terms of adoption, a gradual shift towards a widely standardized open cloud would theoretically make it easier for Google to recover market share as the cloud market matures.

Whatever Google’s true motives are, the “open cloud” strategy has been tremendously bountiful to the developer community.

By open sourcing Kubernetes and pouring resources into it, Google brought an end to the painful, wasteful container orchestration wars. In its donation of Kubernetes to the Cloud Native Computing Foundation (to which it is also a heavy financial donor), Google created an ostensibly open, positive sum environment for the rival cloud providers to congregate productively.

In the area of machine learning, Google open sourced TensorFlow and invested heavily into tutorials, documentation, YouTube videos, and other resources. Google built JavaScript libraries and auditability and visualization tools. Google has marshalled an entire ecosystem around TensorFlow.

Some of Google’s commercial open source efforts have had less favorable results.

The Istio service mesh project seems to have been promoted with the same playbook that Kubernetes and TensorFlow followed, but with a less usable tool. In Istio, we see Google’s expertise in marketing perhaps taken too far.

Why is Istio a problematic case study?

Because, despite the fact that there are multiple open source service mesh products on the market with significant production usage, Istio has managed to make so much noise about itself that it is convincing the market that the battle for service mesh superiority is a foregone conclusion–in spite of widespread reports that Istio is currently difficult to operate and not ready for production workloads.

From the blog posts to the KubeCon programming to the expo hall hype, the cloud native developer community has been hammered with the same messaging about service mesh: Istio is the best, don’t bother with the rest.

Perhaps Google is just doing us all a favor with Istio. Maybe it really will be the best service mesh, and it’s just not quite there yet. Maybe Google is just ironing out the kinks, and the marketing roadmap happened to proceed at a faster pace than the engineering roadmap.

Or maybe I have been talking too much to the other service mesh companies. Honestly I don’t really know how these meshes trade off against each other, though I have certainly asked previous guests for comparisons.

And it’s so early in the huge space that has been bucketed within the category of “service mesh” that maybe Google is acting with best intentions and trying to get out ahead of another container orchestration war.

Speaking as a developer who prefers to work on application abstractions far above the world of security policy and load balancing and canarying, I would personally be fine with Google running its open source marketing playbook with Istio, winning everyone over, and saving our energy for more productive deliberations.

I am fine with Google picking winners where it deems such actions appropriate. But this gets at a tension of the “open cloud”.

What kind of openness are we really talking about here? If I can’t see the strategic roadmap of Google Cloud or the backroom conversations, is it still open?

When the open source repos remain in Google’s control, is that open source, or marketing? When Google tinkers with an operating system like Fuchsia in the open, but we are left to speculate as to why they are working on it, or what purpose it serves, is that open source or marketing?

If Google is such an open cloud, why not open source a detailed ten year strategic roadmap, and pass the buck to Amazon to do the same?

In any case, I’m not passing judgment on whether Google has done something morally wrong by pushing Istio so hard. I just think it was strategically unwise. With Istio, Google made it a little too obvious just how much narrative control it has over the public developer market.

Perhaps this narrative control should come as no surprise. Of all the major clouds, Google is the most well-versed in open source. From its contributions to Linux to its maintenance of a complicated Android ecosystem, Google knows how to play its cards in the game of open source diplomacy.

In battle, a classic strategy for competing with a rival that has an advantage is to force that rival onto territory that you are more familiar with. There are many historical cases where a small army was able to defeat a large army due to its ability to maneuver the battle front to a favorable environment.

Through its open cloud strategy, this is exactly what Google is doing. Google is open sourcing the best way that it knows how to build and run infrastructure software. This is happening slowly but steadily. To some extent, the other major providers including Amazon will have no choice but to follow Google into a battleground that was built by Google.

And as developers, we will get to reap all the rewards of this competition. Our infrastructure will become more standardized, more fault tolerant, cheaper, better designed, and easier to use. And who knows, maybe someday we will actually be able to easily move workloads from one cloud to another.

To the extent that I am a software engineering journalist, I feel inclined to scrutinize all of the cloud providers. But to the extent that I am an engineer and a business person, I feel only admiration and love for the cloud providers. Cloud computing has brought the cost of starting an Internet business down to nearly zero.

Cloud computing has opened up my eyes to a world of creative possibilities that knows no boundaries. For that, I will always be a fan of all of the rival cloud companies, because they have all played a role in creating the current software landscape.

Eric Brewer is a Google Fellow and VP of Infrastructure. He is well-known for his work on the CAP theorem, a distributed systems concept that formalized the tradeoffs between consistency, availability, and partition tolerance in a distributed system.

At Google, Eric is as much a strategist and product creator as he is a theoretician. He has worked on database systems such as Spanner, machine learning systems such as TensorFlow, and container orchestration systems such as Kubernetes and GKE.

Eric joins the show to talk about Google’s philosophy as a cloud provider, and how his understanding of distributed systems has evolved since joining the company.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.


Sponsors

Datadog unites metrics, traces, and logs in one platform so you can get full visibility into your infrastructure and applications. Check out new features like Trace Search & Analytics for rapid insights into high-cardinality data, and Watchdog, an auto-detection engine that alerts you to performance anomalies across your applications. Datadog makes it easy for teams to monitor every layer of their stack in one place, but don’t take our word for it—start a free trial today & Datadog will send you a T-shirt! softwareengineeringdaily.com/datadog

The 2019 Velocity program in San Jose (June 10-13) will cover everything from Kubernetes and site reliability engineering to observability and performance to give you a comprehensive understanding of applications and services—and stay on top of the rapidly changing cloud landscape. Get 20% off of most passes to Velocity when you use code “SE20” during registration at velocityconf.com/sedaily

With MongoDB Atlas, you can take advantage of MongoDB’s flexible document data model as a fully automated cloud service. MongoDB Atlas handles all the costly database operations and admin tasks that you’d rather not spend time on, like security, high availability, data recovery, monitoring, and elastic scaling. Try MongoDB Atlas for free today! Visit mongdb.com/se to learn more.

GoCD is a continuous delivery tool created by ThoughtWorks. It’s great to see the continued progress on GoCD with the new Kubernetes integrations–and you can check it out for yourself at gocd.org/sedaily.
