The world needs only five computers


C|Net hosts an interesting interview with Greg Papadopoulos, CTO of Sun, in which he lays out the Sun argument: only a small number of companies in the world will achieve the scale and efficiency of datacenter operations needed to compete, and everybody else will get squeezed out. I think this basic premise is correct; running a huge datacenter is certainly hard and expensive. Google has done better than any of its competitors at making the complements of good software (CPU time, disk space, bandwidth and programmer hours) relatively cheap and easy to deploy. That superior execution is paying dividends.

Other software companies must look at that scale and infrastructure cost and be struck by a combination of jealousy and fear. Microsoft certainly is, and it has started investing seriously in datacenter scale.

While Google, Microsoft, Salesforce.com, Sun, and Amazon are all attempting to build the common platform that everybody writes their apps against, Amazon’s EC2 effort has some uniquely interesting attributes not mentioned in the article. Salesforce.com and Sun both make you write code to their proprietary APIs (and Google and Microsoft won’t run your code at all), whereas Amazon sells access to its grid by the “CPU Hour”, and the “API” is just a bootable Linux disk image. As a potential consumer of these services, I find this immensely attractive. Much as open source software gives me the at-least-conceptual threat of taking my existing software to another vendor, Amazon’s design gives me confidence that if they ever get too pricey or just go away entirely, I can always throw my own boxes into a datacenter and run the machines myself. Who wants to bet that the Sun API or the Salesforce API will still be around in five years, or that they won’t jack up the prices? What’s in it for Amazon? Amazon gets to achieve scale on the backs of other people’s companies, driving down prices for its own needs.

Of course, a raw Linux disk image is a pretty primitive construct on which to start building massively parallel and reliable systems. Somebody has to write the code that decides when to spool up new CPUs, that splits and joins work the way Google’s MapReduce does, and so on. This seems like complementary software that Amazon should write to spur adoption of EC2.
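
To make that gap concrete, here is a rough sketch in Python of the two pieces of glue mentioned above: a rule for deciding how many CPUs to spool up, and a split/join step in the spirit of MapReduce. None of this is Amazon's or Google's actual API. Every function name, the "clear the backlog in about an hour" rule, and the default numbers are my own assumptions for illustration, and the per-chunk work runs in a local loop where a real system would hand each chunk to its own machine.

```python
import math
from collections import defaultdict


def instances_needed(pending_jobs, jobs_per_instance_hour, max_instances):
    """Hypothetical scaling rule: enough machines to clear the backlog in about an hour."""
    if pending_jobs <= 0:
        return 0
    wanted = math.ceil(pending_jobs / jobs_per_instance_hour)
    return min(wanted, max_instances)  # never exceed the budget cap


def split(items, n_chunks):
    """Split the work into at most n_chunks roughly equal pieces."""
    if not items or n_chunks <= 0:
        return []
    n_chunks = min(n_chunks, len(items))
    size = math.ceil(len(items) / n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """The 'map' step: count words in one chunk of lines."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word.lower()] += 1
    return dict(counts)


def join(partials):
    """The 'reduce' step: merge the per-chunk counts into one result."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


def run(lines, jobs_per_instance_hour=2, max_instances=20):
    """Size the fleet, split the work, process each chunk, join the results."""
    n = instances_needed(len(lines), jobs_per_instance_hour, max_instances)
    chunks = split(lines, n)
    # In a real deployment each chunk would go to its own booted image;
    # here the chunks are simply processed one after another.
    partials = [count_words(chunk) for chunk in chunks]
    return join(partials)
```

Calling run(["a b a", "b c", "A"]) sizes the job at two machines, counts each chunk separately, and merges the counts. The hard parts a real version would have to add, such as booting images, shipping data to them, and retrying failed chunks, are exactly the complementary software I am wishing for.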