As I’ve been saying for a while, our customers – more specifically, a segment* of our customers – face a diversity of tough challenges. What does the CIO in midtown Manhattan do when she runs out of roof space or power? How does an aid agency deliver basic connectivity to 5,000 relief workers in a tsunami-stricken metropolis? What does an oil company do when it wants to move high performance analytics onto an offshore platform or supertanker? And what does a large web services company do when it wants to cookie-cutter its infrastructure next to a hydroelectric plant for cheap power – within weeks, not years?
None of these are easy problems to solve – especially one computer at a time. They’re more commonplace than you’d think across the globe. And now you know the motivation behind our asking a simple question, “what would the perfect datacenter look like?”
Improving upon its predecessor, the traditional datacenter, it’d have to be more space- and power-efficient. Very high performance, and designed for machines, not for people with plush offices. It’d have to be available within weeks, not years. And portable, so customers could deploy it anywhere – in a disaster area, or next to a hydro generator.
But let’s start with the most basic question. How big would it be?
In the world of vertically scaled, or symmetric multiprocessing, systems, pools of CPUs share access to a common set of memory. But a given system has a physical and logical limit to its size: it can be no bigger than the private network used to connect all its disparate internal elements.
But the future of the web is clearly moving toward horizontal, or grid, computing. In a grid, a conventional network connects collections of smaller*, general purpose elements (like Sun’s Niagara or Galaxy systems). The question “what’s the biggest grid?” has no obvious answer – a grid can be as big as you want. Just look at TACC, where they’re building the largest supercomputer on the planet out of general purpose elements.
So a while back, we asked a few talented systems engineers a simple question: is there an optimum size for a horizontally scaled system? Interestingly enough, the answer wasn’t rooted in the Solaris scheduler or a PhD thesis. It was rooted in the environmental realities faced by the customers I cited at the outset. And perhaps more interestingly, in your local shipyard.
The biggest thing we could build would ultimately be the biggest thing we could transport around the world – which turned out to be a standardized shipping container. Why? Because the world’s transportation infrastructure has been optimized for doing exactly this – moving containers on rails, roads and at sea. Sure, we could move things that were bigger (see image), but that wasn’t exactly a general purpose system.
So the question at hand became, “how big a computer can you build inside a shipping container?” And that’s where the systems engineering started.
First, why are servers oriented in racks and cooled front to back by fans? To maximize convenience for the humans who need to interact with them. But if you want to run a “fail in place” datacenter, human interaction is the last thing you want. So we turned the racks 90 degrees, and created a vastly more efficient airflow across multiple racks. And why not partially cool with water in addition to air? If you burn your hand, do you wave it in the air, or dunk it in a bowl of ice water? The latter: water is a vastly more efficient coolant.
A non-trivial portion of an average datacenter’s operating expense is the power required to chill arbitrarily spaced, very hot computing platforms – vector the air, augment with a water chiller, and cooling expense plummets. As does your impact on the environment. Did I mention the eco in eco-responsible stands for economics? For many companies, power is second only to payroll in datacenter expenses. (Yes, the power bill is that big.)
And that’s how we started to go after power efficiency.
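To make the economics concrete, here’s a back-of-the-envelope sketch. Every number in it – the IT load, the electricity rate, and the cooling-overhead multipliers – is an illustrative assumption I’ve picked for the example, not a Sun specification or measurement; the point is only the shape of the math: your bill scales with the overhead you pay on top of the compute itself.

```python
# Back-of-the-envelope: what cooling overhead costs per year.
# ALL figures are illustrative assumptions, not Sun specs or measurements.

HOURS_PER_YEAR = 8760
RATE = 0.10        # assumed electricity price, $/kWh
IT_LOAD_KW = 100   # assumed IT (compute) load of one container, kW

def annual_power_cost(it_load_kw, overhead, rate=RATE):
    """Yearly electricity bill: IT load times an overhead multiplier
    (1.0 = compute only; 2.0 = an extra watt of cooling per watt of compute)."""
    return it_load_kw * overhead * HOURS_PER_YEAR * rate

# Assumed overheads: widely spaced, air-chilled racks vs. close-coupled
# water cooling with vectored airflow.
conventional = annual_power_cost(IT_LOAD_KW, overhead=2.0)
container    = annual_power_cost(IT_LOAD_KW, overhead=1.3)

print(f"conventional: ${conventional:,.0f}/yr")
print(f"container:    ${container:,.0f}/yr")
print(f"savings:      ${conventional - container:,.0f}/yr")
```

With these assumed inputs, the same 100 kW of compute costs tens of thousands of dollars less per year to run – which is why, for many companies, attacking the cooling multiplier matters almost as much as attacking the server count.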
Second, if you can generate power for less than the power company charges you, why not do so – put a generator next to the chiller in a sister container, and you’ve got access to nearly limitless cheap power. (Heck, you could run it on bio-diesel.)
And if power rates or workload requirements change and you want to relocate your container – good news, the world’s transportation infrastructure is at your disposal. Trains, trucks, ships, even heavy lift helicopters. You can place them on offshore oil rigs. In disaster areas. In remote locations without infrastructure. To wherever they’re most needed.
Finally, in most datacenters I visit, I see more floor tiles than computers. Why? Because operators run out of power capacity long before they fill up their datacenters – leading them to waste a tremendous amount of very expensive real estate by spacing racks far apart. In a container, we go in the opposite direction: with plenty of power and chilling, we pack systems at a multiple of conventional density and scrimp on space instead. And the container can run anywhere – in the basement, the parking garage, or on a rooftop. Where utilities, not people, belong.
With a ton of progress behind us, and enough customer interaction to know we’re on to something, we’ve unveiled our alpha unit and gone public with the direction. We’ve done a lot of detail work as well, integrating the container’s security systems into enterprise security systems. It knows where it is via GPS (you can locate one via Google Maps, if that’s your bent). Sensors know if the container’s been opened or moved. We’ve even done basic drop tests (one, accidentally) to deal with transportation hazards – the racks inside can handle an 8g impact! And we’ve explored camouflage options, too (you really don’t want a big Sun logo screaming “steal me, I’m full of RAM!” on customer units).
Every customer we’ve disclosed it to has had a different set of concerns or challenges. None, in my mind, is insurmountable. But we don’t have all the answers, of course; that’s why we’ll be working with key partners and integrators (one customer wanted the container to detonate if it was breached – er… perfectly doable, just not something Sun would do).
At a top level, we know there is no one hammer for all nails.
But in this instance, there might be one blackbox for all of network computing.
Specs and details to come – and in the interim, here are some great photos and usage scenarios (I especially like the Mars Rover companion – that was Greg’s idea).
* more on this later.