A year ago, Andreessen Horowitz (a16z) published one of my favorite articles on cloud. The Cost of Cloud: A Trillion Dollar Paradox tells a tale all too familiar for those of us who have lived the cloud revolution: Public cloud starts cheap, becomes merely “cost-effective,” then is tolerated because of its utility. And then one day you realize that cloud hosting is consuming massive amounts of your budget.
A16z has absolutely nailed their diagnosis: cloud cost can spiral out of control. But their primary proposal seems to be that the solution is to return from the cloud to the data centers of yore. We think that there is a better way, and we’re eagerly building part of the solution.
Andreessen Horowitz’s Diagnosis
The central thesis of the “Cost of Cloud” article is that cloud spend for established companies is high – far too high. And this is a result of falling for the early siren’s song of cloud, and then growing inextricably intertwined with the services offered. After a certain point, cloud is both too expensive to use, and too enmeshed in the organization to abandon. Near the top of the article, the author hits the nail on the head:
[I]t’s becoming evident that while cloud clearly delivers on its promise early on in a company’s journey, the pressure it puts on margins can start to outweigh the benefits, as a company scales and growth slows. Because this shift happens later in a company’s life, it is difficult to reverse as it’s a result of years of development focused on new features, and not infrastructure optimization.
And when they put numbers to these observations, the results are stunning:
[A]cross 50 of the top public software companies currently utilizing cloud infrastructure, an estimated $100B of market value is being lost among them due to cloud impact on margins — relative to running the infrastructure themselves.
And to put the icing on the cake, they drive the point home in one pithy assessment:
If you’re operating at scale, the cost of cloud can at least double your infrastructure bill.
The article goes on to detail the case of Dropbox, which saved $75M in two years by what a16z refers to as “repatriating.” Then the article turns to an analysis of what the impact of cloud spend is on the industry as a whole, concluding that it is indeed a trillion dollar problem.
Repatriating the Servers
Throughout the article, a16z talks about “repatriating” servers. At the end, as they suggest remedies for the trillion dollar cloud spend problem, they advise both planning for repatriation from the outset and repatriating incrementally (if it is not possible to simply repatriate all at once).
What is this “repatriation” thing? As a16z tells it, Dropbox returned to the pre-cloud model of hosting their own racked servers in data centers. They found it was cheaper to operate their own equipment than to continue hosting in the cloud. In short, they migrated back from the cloud. They brought the servers home. They repatriated.
Lest anyone accuse me of an ungenerous approach to the text, a16z does clearly state that the goal of the article is not to present a concerted case for repatriation. It is just one option among a few (they also suggest KPIs attached to cloud spend and ruthless optimization). But over the last several months, I have had several conversations that go like this:
Me: Have you read the a16z article about cloud spend?
Other: Yes! We’re thinking seriously about repatriating.
Had this happened only once or twice, I would not be concerned. But it has happened often enough to get me really wondering why repatriation seems like a good idea. To me, the very idea of repatriation feels like a return to the 1990s, a dangerous move in a time of staff shortages and hardware supply chain issues.
It seems clear, though, that repatriation is rooted entirely in one simple assertion: Operating our own hardware is cheaper than running in the virtualized cloud.
But the devil may be in the details, for it’s not about the hardware, but the software decisions we make. Cloud spend is high, in many cases, because we are addicted to expensive practices for running our software.
Is it possible that if we gave up on these practices, if we altered our ways, we could tame cloud cost?
Today’s Expensive Practices are Rooted in Overconsumption
Let’s focus for a bit on cloud compute. Here, we’re talking about virtual machines, containers, and that layer of higher level services that provide a thin veneer on those technologies. And we can start by making an unpopular observation about everyone’s favorite cloud technology.
Kubernetes is ridiculously expensive to operate.
Why Kubernetes is expensive
Rewind to 2015, and the story for Kubernetes went something like this: Because a container orchestrator can more efficiently pack containers onto a single node (virtual or physical machine), you can save money by squeezing every last bit of compute performance out of your nodes.
In other words, Kubernetes looked like it was going to save us from the “trillion-dollar paradox” to which a16z points. After all, VMs were prima facie inefficient, requiring the gigs and gigs of a full operating system to run just one puny service. Kubernetes, built on the veiled mythos of Google’s Borg, was the cost savior.
How did we get from there to a scenario in which there are multiple products offering cost savings for running Kubernetes? In our experience (we are the creators of Helm, Brigade, SMI, Draft, and many other Kubernetes technologies), even if you start with a modest three node cluster of large virtual machines, you’ll exhaust the cluster resources once you have a service mesh installed. And that’s before you start your first microservice!
Kubernetes does not, it turns out, render microservices cheap to operate. Especially not when each main container is accompanied by various “sidecars” and where multiple supporting services require their own heavy-weight containers running on (at least) each node. Kubernetes does these supporting services no favors when it requires each operator (the K8S design pattern, not the person) and controller to abide by Kubernetes’ stateless paradigm, yet requires each such service to implement that behavior on its own. State reconciliation is a burdensome process: lots of listeners must attach to a fire hose of data and pick apart each event to see which ones are materially important.
Kubernetes is expensive. And it’s expensive because of its design. Kubernetes is a platform rooted in overconsumption of compute.
Why containers are expensive
But containers themselves are also costlier than most people realize. To understand why, we need only consider the core design principle of a container. It is designed to run one main process, almost always a server. The container starts, bringing up a single-purpose server, and that server runs until the container is destroyed.
A server container works because it is always running. When traffic is directed to the container runtime (Docker, Moby, Containerd or whatever), the runtime directs the traffic to the server running inside the container, which then handles the request. Yet even when there is no traffic, the container runtime, the container itself, and the server all continue to run. They consume memory, CPU, network ports, and other system resources.
This places a limit on how many containers can be run on a host. And when you add up the high memory/CPU consumption of the runtime, the overhead of the container itself, and then the resources consumed by the server, it is clear why that limit is so low.
Again, with Kubernetes and service meshes, running one server may require running several containers (the main server plus all of the necessary sidecars). And each sidecar container is also consuming resources, whether idle or active. And for the most part (though not always), the simple pragmatics of operations require that three or more instances of any given server be running in the same cluster.
Say we have one microservice with three sidecars, and we want to run the recommended three instances. That’s twelve containers always running. This is not particularly efficient.
It is therefore easy to see how cloud cost can balloon. Adding a single service to Kubernetes can more than triple the resource cost of that self-same service.
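The container count in the example above is simple arithmetic, and it can help to see it written down. The sketch below is only an illustration: the function name and parameters are made up for this post and do not come from Kubernetes or any service mesh.

```python
def always_on_containers(services: int, sidecars_per_service: int, replicas: int) -> int:
    """Count the containers that stay resident whether or not there is traffic.

    Hypothetical helper for illustration; the inputs are assumptions you supply.
    """
    containers_per_replica = 1 + sidecars_per_service  # main server plus its sidecars
    return services * containers_per_replica * replicas


# The example from the text: one microservice, three sidecars, three instances.
print(always_on_containers(services=1, sidecars_per_service=3, replicas=3))  # prints 12
```

Note that this counts containers only. How much memory and CPU each one consumes depends on the workload and is not modeled here.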
Reduce the Load by Changing the Compute Profile
Repatriation suggests that the proper solution is to take all of that stuff out of the cloud. Run the cluster locally on your own hardware. That will be cheaper. Andreessen Horowitz cite both Dropbox and Datadog as examples.
Perhaps the cost is lower.
My first reaction to the a16z post was to assume that this calculus surely skipped over capital expenses (e.g. buying servers) or operational expenses associated with hiring a larger operations team, but according to the article:
Thomas Dullien, former Google engineer and co-founder of cloud computing optimization company Optimyze, estimates that repatriating $100M of annual public cloud spend can translate to roughly less than half that amount in all-in annual total cost of ownership (TCO) — from server racks, real estate, and cooling to network and engineering costs.
That covers both capital and operational expenses, at least in the obvious cases.
We could argue that there are other costs: the hidden cost of re-architecting software, the opportunity cost, or even the environmental cost. But it’s probably not worth the digression. We all understand what it would mean to shoulder the operational burden of an on-prem solution. While larger organizations might see this as a worthwhile investment, most of us will not. We are being asked to give up on the APIs, the ready-at-hand tools, the workflows, and the integrations… in exchange for swapping hard drives in and out of bays and managing our own on-prem “enterprise edition” (read: manage-it-yourself) versions of cloud services we once loved.
A second solution, then, is to lighten the load on the cloud side. In the article, a16z suggests ruthlessly optimizing or incentivizing cost reduction. We can get more specific. Our question is: can we cut the cost of compute?
The next wave of cloud compute
At Fermyon, we believe a major part of the solution is to change the way we write microservices. In so doing, we can jettison overconsumption.
- The virtual machine model has a hypervisor that then runs multiple long-running copies of an entire operating system.
- The container model has a container runtime that runs multiple long-running containers with their servers.
- In contrast, the Spin framework that Fermyon is currently building takes a different approach: The Spin environment executes short-running instances of services.
Each of the three platforms (VMs, containers, and Spin) requires running a host environment. That part of the equation doesn’t change. The change comes in what those host environments do.
While VMs and containers run workloads for days and months at a time, Spin runs its workloads for milliseconds. In part, it does this by moving the network listener out of the guest instance and into the host. That is, Spin itself listens for inbound traffic and then starts a component instance just in time to handle the inbound request.
Because Spin works this way, it also removes the need to run multiple copies of the same service. Simply running Spin environments on multiple hosts in a cluster is sufficient to handle scaling from zero up to thousands based strictly on demand. That is, when an app has no traffic, it is not running at all. When it’s under load, it executes enough instances to respond to the load. Then the millisecond the load is gone, the app is back down to zero instances.
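To make the shape of that model concrete, here is a toy sketch of a host that owns the routing and creates a handler instance only for the lifetime of a request. It is not Spin's API or implementation: `ToyHost`, `register`, and `handle` are hypothetical names invented for this illustration, and a real host would listen on a socket and run sandboxed components rather than call Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Handler = Callable[[str], str]


@dataclass
class ToyHost:
    """Hypothetical host: it does the listening, guests run only per request."""

    routes: Dict[str, Callable[[], Handler]] = field(default_factory=dict)
    live_instances: int = 0   # instances running right now
    total_started: int = 0    # instances ever started

    def register(self, path: str, factory: Callable[[], Handler]) -> None:
        # Store a factory, not a running server: nothing is started yet.
        self.routes[path] = factory

    def handle(self, path: str, body: str) -> str:
        factory = self.routes[path]
        self.live_instances += 1
        self.total_started += 1
        try:
            instance = factory()      # start an instance just in time
            return instance(body)     # let it handle this one request
        finally:
            self.live_instances -= 1  # the instance is gone once it returns


host = ToyHost()
host.register("/hello", lambda: (lambda body: f"hello, {body}"))
print(host.live_instances)               # prints 0 (no traffic, nothing running)
print(host.handle("/hello", "world"))    # prints hello, world
print(host.live_instances)               # prints 0 (back to zero after the request)
```

The point of the sketch is the bookkeeping: between requests, the count of live instances is zero, and the only long-running piece is the host itself.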
Does this new approach save money?
Does this translate to savings? Our early evidence suggests it does. The Fermyon.com website uses Nomad as a scheduler (not Kubernetes). We run three Nomad workers on only small VM instances (AWS t2.small to be exact). And even in our highest traffic moments, we don’t see the cluster break a sweat.
Of course, the exact configuration of a cluster will be determined largely by the type, task, and number of services that are run. But it appears that the density Spin can achieve is one to two orders of magnitude higher than a containerized version running in Kubernetes. We have eliminated the constant overhead of idle services. We have reduced the overhead of the runtime. And because services are only executed on demand, we’ve changed the calculus for packing workloads onto a worker. This is what we call the next wave of cloud compute.
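One way to reason about "the constant overhead of idle services" is to compare how long instances are resident under each model for the same traffic. The sketch below is a back-of-the-envelope model, not a measurement: the traffic trace, the replica and sidecar counts, and the per-request handler time are all hypothetical inputs, and the function names are invented for this post.

```python
from typing import Sequence


def always_on_instance_seconds(
    trace: Sequence[int], replicas: int, sidecars_per_replica: int
) -> float:
    """Resident container-seconds when every container runs for the whole trace."""
    containers = replicas * (1 + sidecars_per_replica)
    return float(containers * len(trace))  # one entry in the trace is one second


def on_demand_instance_seconds(trace: Sequence[int], handler_seconds: float) -> float:
    """Resident instance-seconds when an instance exists only while handling a request."""
    return sum(requests * handler_seconds for requests in trace)


# Hypothetical trace: requests arriving in each of four consecutive seconds.
trace = [0, 0, 10, 0]
always_on = always_on_instance_seconds(trace, replicas=3, sidecars_per_replica=3)
on_demand = on_demand_instance_seconds(trace, handler_seconds=0.005)  # assumed 5 ms per request
print(always_on, on_demand)
```

The model ignores the host's own footprint, which both approaches pay, and it says nothing about real workloads. It only shows why the always-on figure is independent of traffic while the on-demand figure follows it.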
Our conclusion is that perhaps optimizing for compute performance is a better cost-cutting measure, and a better operational measure, than repatriating. And rethinking microservices is the beginning of the journey toward the optimized cloud.
Importantly, not all compute services can be rewritten in this pattern. Stateful services (where important bits of information must remain in memory over time) are not a good fit. In Containers vs. WebAssembly we pointed out cases where Docker containers are a better fit. So we must honestly concede that while the lion’s share of stateless microservices are a fit for Spin, other services are not.
Rethinking Microservices is Not Enough to Solve the Trillion Dollar Paradox
I have focused on compute services in this article. But I would be flat-out wrong to claim that compute alone is behind a16z’s trillion dollar paradox. Here in this conclusion, I will point to other unsolved pieces of the puzzle.
- Bandwidth charges are still high, and reducing compute will do little to ameliorate this.
- Storage cost has historically been cheap, but large repositories of data will add up. And again, nothing in Spin addresses this.
- Additional hosted services (databases, event hubs, and so on) each come with their own markup.
Controlling cost on those axes is critical. And solving these problems will entail rethinking how we build databases, CI infrastructure, and storage services.
We’ll all get there. It will take time, but we’ll get there. Cloud as it is today might not be the future, but neither is repatriation and a return to the operational practices of the 90s.