Tuesday, April 21, 2009

McKinsey and Cloud Computing

McKinsey has created a tempest in a teapot by denouncing the economics of both in-the-cloud clouds such as Amazon EC2 and behind-the-firewall clouds for large enterprises.  At a high level I think their analysis is actually pretty good, but the conclusions are misleading due to a semantic twist.  They use Amazon EC2 as a model, and their conclusions go something like this:

  1. Amazon EC2 virtual CPU cycles are more expensive than real, in-house CPU cycles
  2. You waste 90% of those in-house CPU cycles
  3. You'll waste almost as many of those virtual cloud CPU cycles, only they cost more, so they are a bad deal
  4. You stand a decent shot at saving some of those real CPU cycles through virtualization, so you should aggressively virtualize your datacenter
  5. You're too inept to deliver a flexible cloud behind-the-firewall, so don't even try

I'll let you ponder which of the above statements is misleading while I address some related topics.

The goals of cloud computing are as old as computing itself.  They are:

  1. Reduce the time it takes to deploy a new application
  2. Reduce the marginal cost of deploying a new application over "standard" methods
  3. Reduce the marginal increase to recurring costs caused by deploying a new application over "standard" methods

Back in the days of yore, when programmers were real men, the solution to this was time sharing.  Computers were expensive and therefore should be run at as high a utilization as possible.  While making people stand in line and wait to run their batch jobs was a pleasing ego trip for the data center operators, the machines still wasted CPU time while performing slow I/O operations, and waiting in line generally made users unhappy.  Thus time sharing was born, and with it, in a quite real sense, the first cloud computing environments, because in many cases a large institution would purchase and host the infrastructure and then lease it out to smaller institutions or individuals.

The problem here is that the marginal cost equations end up looking like a stair-step function.  If you had a new application, and your enterprise / institution had excess mainframe capacity, then the marginal cost of letting you run your application was near zero.  But if there was no spare capacity - meaning the mainframe was being efficiently utilized - then the marginal cost was high because either someone else had to be booted off or you needed an additional mainframe.
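
If you want to see that stair step in miniature, here's a toy Python sketch; the capacity and price below are invented round numbers, not anyone's real figures:

  # Illustrative stair-step cost model: the marginal cost of one more
  # application is near zero until existing capacity runs out, then it
  # jumps by the price of a whole new machine.  All numbers are made up.
  MACHINE_CAPACITY = 10    # applications one machine can host (assumption)
  MACHINE_COST = 500000    # cost of buying one more machine (assumption)

  def total_cost(num_applications):
      machines_needed = -(-num_applications // MACHINE_CAPACITY)  # ceiling division
      return machines_needed * MACHINE_COST

  def marginal_cost(num_applications):
      return total_cost(num_applications) - total_cost(num_applications - 1)

  # marginal_cost(7)  -> 0       (spare capacity on an existing machine)
  # marginal_cost(11) -> 500000  (the 11th application forces a new purchase)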

Now fast-forward a couple of decades to the PC revolution.  Somewhere along the way the cost curves for computers and people crossed, so it became appropriate to let the computer sit idle waiting for input from a user rather than having a user sit idle while waiting for a computer.  Now you could have lots of computers with lots of applications running on each one (initially only one application at a time, but still, the computer could run any number of them).  This smoothed out the non-recurring marginal cost curve, but as PCs proliferated it drove up recurring costs through sheer volume.

Unfortunately this had problems.  Many applications didn't work well without centralized backends, and some users still needed more compute power than could reasonably be mustered on the desktop.  So the new PCs were connected to mainframes, minicomputers, and eventually servers.  Thus client-server computing was born, along with increasingly confusing IT economics.  PCs were cheap, and constantly becoming cheaper, but backend hardware remained expensive.  The marginal non-recurring cost became completely dependent on the nature of the application, and recurring costs simply began to climb with no end in sight.

Now fast forward a little more.  Microsoft releases a "server" operating system that runs on souped-up PCs and convinces a whole bunch of bean counters that they can solve their remaining marginal non-recurring cost problems with Wintel servers that don't cost much more than PCs.  No more expensive servers.  No more having to divide the cost of a single piece of hardware across several projects.  Now if you want to add an application you can just add an inexpensive new Wintel server.  By this time the recurring cost equation had already become a jumbled mess, and the number of servers was still dwarfed by the PC on every desk, so there was no reining in the ever-increasing recurring costs.  This problem was then further exacerbated by Linux giving the Unix holdouts access to the same cheap hardware.

Thus began the era of one or more physical servers per application, which is where we are today, with McKinsey's suggestion for addressing it: virtualization behind the firewall.  The problem with this suggestion is that, for a large enterprise, it isn't really that different from the in-the-cloud solution that they denounce as uneconomical.  One way is outsourcing a virtualized infrastructure to Amazon or a similar provider; the other is outsourcing it to their existing IT provider (ok, not all large enterprises outsource their IT, but a whole lot do).

Virtualization, in the cloud or otherwise, isn't the solution because it doesn't address the root cause of the problem - proliferation of (virtual) servers and the various pieces of infrastructure software that run on them, such as web servers and databases.  Hardware is cheap.  Software is often expensive.  System administrators are always expensive.  Virtualization attacks the most minor portion of the equation.
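
A back-of-the-envelope version of that equation, with every dollar figure being a made-up placeholder purely to show which term virtualization actually touches:

  # Rough cost per deployed application.  All figures below are invented
  # placeholders, not benchmarks; only the relative shape matters.
  hardware = 3000      # commodity server, amortized (assumption)
  software = 15000     # OS, web server, database, middleware licenses (assumption)
  sysadmin = 25000     # share of administrator time per managed instance (assumption)

  one_app_per_server = hardware + software + sysadmin
  # A VM eliminates most of the hardware term, but each VM still drags
  # along its own OS instance, software stack, and administration.
  one_app_per_vm = software + sysadmin

  savings = one_app_per_server - one_app_per_vm   # only the smallest slice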

Virtualization is the right concept applied to the wrong level of the application stack.  Applications need to be protected from one another, but if they are built in anything resembling a reasonable way (that's a big caveat, because many aren't) then they don't need the full protection of running in a separate OS instance.  There's even a long-standing commercially viable market for such a thing: shared web hosting.

It may not be very enterprisey, but shared web site/application hosting can easily be had for about $5 per month.  The cost quickly goes up as you add capabilities, but still - companies are making money by charging arbitrary people $5 per month to let them run arbitrary code on servers shared by countless other customers running arbitrary code.  How many enterprise IT organizations can offer a similar service at even an order-of-magnitude greater cost?

Not many, if any.  Yet do we see suggestions pointing out that Apache, IIS, Oracle, SQL Server, and countless other pieces of infrastructure can relatively easily be configured to let several applications share compute resources and expensive software licenses?  Nope.  They suggest you take your current mess and virtualize it behind the firewall instead of virtualizing it outside the firewall.
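
For the non-enterprisey crowd, here's roughly what that kind of sharing looks like at the application level - a toy Python/WSGI dispatcher (standard library only, application names invented) standing in for what Apache virtual hosts or a shared database instance do at production scale:

  # Toy example of several applications sharing one server process,
  # rather than each application getting its own (virtual) server.
  from wsgiref.simple_server import make_server

  def billing_app(environ, start_response):      # hypothetical application #1
      start_response('200 OK', [('Content-Type', 'text/plain')])
      return [b'billing report\n']

  def inventory_app(environ, start_response):    # hypothetical application #2
      start_response('200 OK', [('Content-Type', 'text/plain')])
      return [b'inventory levels\n']

  APPS = {'/billing': billing_app, '/inventory': inventory_app}

  def dispatcher(environ, start_response):
      # Route by URL path: one process, one port, many applications.
      app = APPS.get(environ.get('PATH_INFO', ''))
      if app is None:
          start_response('404 Not Found', [('Content-Type', 'text/plain')])
          return [b'not found\n']
      return app(environ, start_response)

  if __name__ == '__main__':
      make_server('', 8000, dispatcher).serve_forever()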
