Note: The below is a straight off-the-top-of-my-head rant I dashed off to my editor at a technology journal I occasionally write for. I'm looking for feedback to tighten it up. Feel free to tear it apart!
When it comes to data center metrics, the one most often talked about is square footage. Nobody ever announces that they’ve built a facility with Y tons of cooling or Z megawatts; the first metric quoted is X square feet. Talk to any data center manager, however, and they’ll tell you that floor space is practically irrelevant these days. It only matters to the real estate people. What matters to the rest of us is power and cooling: watts per square foot. How much space you have available is nowhere near as important as what you can actually do with it.
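A quick back-of-envelope sketch makes the point; the facility sizes and loads below are invented purely for illustration:

    # Back-of-envelope power density; all numbers here are made up for illustration.
    def watts_per_sq_ft(it_load_kw, floor_sq_ft):
        """Average electrical density: total IT load spread over the raised floor."""
        return (it_load_kw * 1000) / floor_sq_ft

    # Two hypothetical facilities with the exact same footprint:
    legacy = watts_per_sq_ft(it_load_kw=500, floor_sq_ft=10_000)    # ~50 W/sq ft
    modern = watts_per_sq_ft(it_load_kw=2_000, floor_sq_ft=10_000)  # ~200 W/sq ft
    print(f"Same square footage, {modern / legacy:.0f}x the usable capacity.")

Same “X square feet” headline, wildly different facilities.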
If you look at your datacenter with a fresh eye, where is the waste really happening?
Since liquid-cooled servers are at the far right-hand side of the bell curve, achieving electrical density for the majority of us is usually a matter of effectively moving air. So what is REALLY preventing the air from moving in your data center? I won’t rehash the raised floor vs. solid floor debate (since we all know that solid floors are better), but even I know that the perforated tiles or the overhead ductwork are not the REAL constraint. A lot of folks have poured a lot of energy into containment: hot aisle containment systems, cold aisle containment systems, and even in-row supplemental cooling systems.
In reality, however, all of these solutions address the environment around the servers, not the servers themselves, which are, after all, the source of all the heat. Why attack symptoms? Let’s go after the problem directly: the server.
First of all, the whole concept of a “rack unit” needs to be discarded. I’ve ranted before on the absurdity of 1U servers, and how they actually decrease datacenter density when deployed as they are currently built. I’d like to take this a step further and just get rid of the whole idea of a server case. Wrapping a computer in a steel and plastic box, a constrained space, a bottleneck for efficient airflow, is patently absurd. It was a good idea in the days of 66 MHz CPUs and hard drives bigger than your head, but in today’s reality of multi-core power hogs burning like magnesium flares it is just asking for trouble. Trouble is what we’ve got right now. Trouble in the form of hot little boxes, be they 1U or blade servers. They are just too much heat in too constrained spaces. Virtualization won’t solve this problem. If anything it will just make it worse by driving up the utilization of the individual CPUs, making them run hotter more of the time. Virtualization might lower the power bills of the users inside the server, but it won’t really change anything for the facility that surrounds the servers in question. The watts-per-square-foot impact won’t be as big as we hoped, and we’ll still be faced with cooling a hot box within a constrained space.
So here is my challenge to the server manufacturers: Think outside of the case.
This isn’t a new idea really, nor is it mine. We’ve all seen how Google has abandoned cases for their servers. Conventional wisdom says that only a monolithic deployment such as a Google datacenter can really make use of this innovation. Baloney. How often does anyone deploy single servers anymore? Hardly ever. If server manufacturers would think outside of the case, they could design and sell servers in 10- or 20-rack-unit enclosures. They could even sell entire racks. By shedding cases altogether, both server cases and blade chassis, they could build servers that are dense, electrically simple, easy to maintain, and, most importantly, easy to cool. The front could carry the I/O ports, fans, and drives: big fans for quiet efficiency. The backs could be left open, with power down one side and network connections down the other. Minimize the enclosure itself to as little as possible… think of Colin Chapman’s famous directive about building a better race car: “Just add lightness.” The case of a server should serve one purpose only: to anchor it to the rack. Everything else is a superfluous obstruction of airflow. There’s no need for steel; plenty of lighter-weight materials can do the job with less mass.
Go look in your datacenter with this new eye and envision all those server cases and chassis removed. No more artificial restriction of airflow. Your racks also weigh less than half of what they do today. You could pack twice the computing horsepower into the same amount of space and cool it more effectively than what you have installed.
Ten years from now we’ll look back at servers of this era and ask ourselves “what were we thinking??” The case as we know it will vanish from the data center, much like the horse and buggy a century before. We’ll be so much better without them.
The guy who runs SwiftCo uses a metal rack with semi-cased microtower units. He achieves greater density and manages cooling more effectively, because he can cool more with less. It looks ghetto, but he gets closer to the dream. He has rows and rows of these things, which he calls BigIP, a massive storage array.
That being said, it’s hard for companies to design to those specs when the pricing doesn’t match. Many large-scale installations charge per rack unit, and as long as people charge per rack unit, you’re going to keep seeing a lot of ridiculous 1U installations. Once a company builds a price model around responsible power usage and space, you’ll see the manufacturers follow the money accordingly.
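To make that incentive concrete, here’s a rough sketch; the per-rack-unit and per-kW rates below are invented, and only the shape of the comparison matters:

    # Hypothetical billing rates, invented purely to illustrate the incentive problem.
    PER_RU_MONTHLY = 75.0    # $ per rack unit per month (space-based billing)
    PER_KW_MONTHLY = 250.0   # $ per kW per month (power-based billing)

    def monthly_bills(rack_units, draw_kw):
        """Return (space-billed, power-billed) monthly cost for one deployment."""
        return rack_units * PER_RU_MONTHLY, draw_kw * PER_KW_MONTHLY

    # A rack crammed with forty 1U servers at roughly 350 W apiece:
    by_space, by_power = monthly_bills(rack_units=40, draw_kw=40 * 0.35)
    print(f"Billed by space: ${by_space:,.0f}/mo  Billed by power: ${by_power:,.0f}/mo")

Under space-based billing the heat from that rack is the operator’s problem; price the power instead and the 1U cram job suddenly has to justify itself.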
Until then, you have a bunch of lowest bidder clods buying lowest bidder equipment. 🙁
I have heard of their Redundant Array of Box Fans. 😉
The only people left charging per rack unit are datacenter tenants, NOT datacenter operators! Those of us who actually run our own datacenter facilities charge based on power usage… period.
Thanks for the input Nick!
First I have to say this: Ewww, Dells. Yuck yuck. I have the pleasure right now of (thankfully) removing every single PE2550 through PE2850 that has ever touched the racks of our cabinets, and every single time I pull one it feels fantastic to throw it into a gigantic bin, with a smile on my face.
Back on topic. You’re right that solutions like that have been done, but Google themselves have gone back to racks full of 1U servers. I think that in order for this to work, CPU manufacturers have to design more thermally friendly CPUs. While lighter material can replace steel, something still has to direct the air over the succubus also known as a CPU, something that two efficient 120 mm fans can’t seem to cool but eight whiny, loud 60 mm fans can handle. HP and Sun have sold entire cabinets of servers that tend to generate less heat than 32 1U servers while providing an equivalent or greater amount of computing power, but at the same time (like Sun’s M8000 or M9000) they are ridiculously expensive, and 32 1U machines are, for now, still cheaper for the customer to buy.
Hopefully my spiel made sense (it made sense in my head, but if it doesn’t, feel free to delete my comment).
I only noticed two typos; other than that, it sounds good to me:
“They are just too mush heat in too constrained spaces.” – MUSH -> much
“We’ve all see how Google has abandoned cases ” – all SEE -> seen
Recently we deployed several (12) high-density chassis in a single location on the East Coast, and the cost for power -greatly- outweighed the rack cost. In fact, Telehouse in NYC recently revamped their power cost structure and is charging tenants nearly 3x the 2005 rate for power consumption (in executive reviews, I referred to it as the cost of cooling). There was such a deviation that we ended up with a serious COGS issue. It’s worth noting that the cost of rack space actually decreased by 10%. This falls into the “look at this hand… don’t notice the other hand stealing your wallet” category.
Once I started calling it “the cost of cooling” with my execs, they seemed to understand it better than “the cost of power.” I don’t know why this triggered a different thought process, but we are now engineering with lower BTU output in mind. I personally would like to see more virtual machines in datacenter space, as companies simply need to utilize their computing time more effectively.
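For what it’s worth, the translation between “cost of power” and “cost of cooling” is basically just a constant (every watt of IT load is roughly 3.412 BTU/hr of heat to reject); the chassis size below is invented:

    BTU_PER_HR_PER_WATT = 3.412  # essentially all IT power ends up as heat the cooling plant must reject

    def cooling_load_btu_hr(it_load_watts):
        """Heat rejection required for a given IT electrical load."""
        return it_load_watts * BTU_PER_HR_PER_WATT

    # A hypothetical 12 kW high-density chassis:
    print(f"{cooling_load_btu_hr(12_000):,.0f} BTU/hr")  # ~41,000 BTU/hr, a bit over 3 tons of cooling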
Hey, my first post — not really helping you edit, but just random thoughts on a Tuesday.
-TD
Thanks for the typo spotting Rog!
Good points, Tim. As to the “stealing your wallet” thought, I think you’re wrong. This is an issue of free markets, supply and demand, and price adjustments helping an industry recover from five years of depressed prices. There was a glut of DC space in 2001 that pushed prices way too low. Remember when buying a rack at Exodus cost $6000 a month? Yeah, those were the “good old days” of 1999/2000, and if you look at it objectively those were sustainable, profitable numbers for an industry that was, and STILL IS, exceedingly capital-intensive. It costs well over $1500 per square foot to build a datacenter and hundreds of thousands of dollars a month to operate it. When the market crashed, prices crashed right along with it. Exodus, and many others, died before they could even begin to see ROI on all the facilities they had built. Fast forward to 2005 and rack prices are hovering between $300 and $1000 per month… all the while power requirements are going UP, UP, UP.
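To put rough numbers on how upside-down that is (the build cost and rack price come from above; the floor allocation per rack is purely an assumption):

    # Back-of-envelope payback math; the per-rack floor allocation is a guess, not a measurement.
    BUILD_COST_PER_SQ_FT = 1500.0   # from above
    SQ_FT_PER_RACK = 25.0           # assumed: rack footprint plus its share of aisles and plant
    RACK_PRICE_PER_MONTH = 650.0    # midpoint of the $300-$1000 range above

    capital_per_rack = BUILD_COST_PER_SQ_FT * SQ_FT_PER_RACK
    months_to_recover = capital_per_rack / RACK_PRICE_PER_MONTH
    print(f"${capital_per_rack:,.0f} of construction per rack; "
          f"{months_to_recover:.0f} months of rack revenue just to pay it off")

That’s nearly five years of revenue before a dime of the monthly operating cost is even covered.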
So the industry has been able to raise prices, but really only through the sleight of hand you mention… by adding on surcharges for power/cooling. Even with this adjustment I would argue that the industry is priced too low. At these prices you can’t build new facilities to meet the demand, and the gear made today is just too power-hungry to run sustainably in the facilities that exist.
–chuck
I do remember the pricing for racks at Exodus. I agree that I may have gotten used to the extremely low prices, given the excess space available and the need for companies to sell it, so your point is well taken. I was recently in the AT&T facility in Lynnwood and was awestruck at how “empty” the facility appeared, even as the facility manager was telling me they were nearing capacity. Those of us who are converting to chassis and virtual machines are consistently fighting the “old school” notion that we are paying for space, rather than the cooling we are really paying for in each facility.
To slightly switch gears, I have re-read your post and I wonder — could it be that everything old is new again? You seem to be describing mainframe construction, and if I throw in the ‘virtual machines’ discussion, aren’t we coming full circle, just more efficient?
-td
Yeah, the “mainframe” concept did pop up in the back of my head too… however, mainframes (outside of a Cray) never had the kind of CPU density you see in today’s server installs.
If you have a moment to chat I’d love to get some intel on the facility up north you mention. We frequently get shopped against them 😉
Drop me a note if you have a sec. email is my initials (cg) at forest dot net.
Well, I’m way outside my field here, but it seems to me that in terms of cooling, server engineers should turn to Porsche, Volkswagen, or even Tatra for some design clues.
Tatra. Lol.
Actually, Chuck, I’m kind of fancying a Tatra. There’s something massively eccentric about an air-cooled, eight-cylinder, rear-engined car. A Panhard Le Tigre also appeals. Finding mechanics for either species might pose a problem, though…!
I’ve never actually seen a Tatra in the flesh. Legendary machine though.
I’ve seen a few on trips to Hungary and maybe a couple at car shows in the UK. None in the US. But they are kind of cool. Czech Cold War bullet proof engineering. Take a look at this:
http://www.seriouswheels.com/pics-1960-1969/1966-Tatra-Type-T2-603-fa-lr.jpg
and this:
http://commons.wikimedia.org/wiki/Image:Tatra_603.jpg
and this too (love the dorsal fin LOL):
http://jalopnik.com/cars/tatra/tatra+land-usa-306439.php
Carrera Panamericana anyone…?