Thinking Outside The Case

Nice Rack!

Note: The below is a straight off-the-top-of-my-head rant I dashed off to my editor at a technology journal I occasionally write for. I'm looking for feedback to tighten it up. Feel free to tear it apart!

When it comes to data center metrics, the one most often talked about is square footage. Nobody ever announces that they’ve built a facility with Y tons of cooling or Z megawatts. The first metric quoted is always X square feet. Talk to any data center manager, however, and they’ll tell you that floor space is completely irrelevant these days; it only matters to the real estate people. All that matters to the rest of us is power and cooling: watts per square foot. How much space you have available is nowhere near as important as what you can actually do with it.
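
For anyone who wants to put a number on it, the arithmetic is trivial. Here is a quick back-of-envelope sketch in Python; the floor area and IT load are made-up figures for illustration, not numbers from any real facility:

    # Back-of-envelope power density; the inputs are illustrative assumptions,
    # not figures from any real facility.
    floor_area_sqft = 10_000   # raised-floor area
    it_load_kw = 1_200         # critical IT load

    watts_per_sqft = it_load_kw * 1_000 / floor_area_sqft
    print(f"{watts_per_sqft:.0f} W/sq ft")   # -> 120 W/sq ft

Swap in your own facility's load and floor area, and the resulting watts per square foot will tell you far more about what you can actually deploy than the square footage ever will.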

If you look at your datacenter with a fresh eye, where is the waste really happening?

Since liquid-cooled servers sit at the far right-hand side of the bell curve, achieving electrical density for the majority of us is usually a matter of moving air effectively. So what is REALLY preventing the air from moving in your data center? I won’t rehash the raised floor vs. solid floor debate (since we all know that solid floors are better), but even I know that neither the perforated tiles nor the overhead ductwork is the REAL constraint. A lot of folks have poured a lot of energy into containment: hot aisle containment systems, cold aisle containment systems, and even in-row supplemental cooling systems.

In reality, however, all of these solutions address the environment around the servers, not the servers themselves, which are, after all, the source of all the heat. Why attack symptoms? Let’s go after the problem directly: the server.

First of all, the whole concept of a “rack unit” needs to be discarded. I’ve ranted before on the absurdity of 1U servers and how, as currently built, they actually decrease datacenter density when deployed. I’d like to take this a step further and get rid of the whole idea of a server case. Wrapping a computer in a steel and plastic box, a constrained space, a bottleneck for efficient airflow, is patently absurd. It was a good idea in the day of 66 MHz CPUs and hard drives that were bigger than your head, but in today’s reality of multi-core power hogs burning like magnesium flares it is just asking for trouble. Trouble is what we’ve got right now: hot little boxes, be they 1U or blade servers, packing too much heat into too constrained a space.

Virtualization won’t solve this problem. If anything, it will make it worse by driving up the utilization of individual CPUs, making them run hotter more of the time. Virtualization might lower the power bills of the users inside the server, but it won’t really change anything for the facility that surrounds the servers in question. The watts-per-square-foot impact won’t be as big as we hoped, and we’ll still be faced with cooling a hot box in a constrained space.

So here is my challenge to the server manufacturers: think outside of the case.

This isn’t a new idea really, nor is it mine. We’ve all seen how Google has abandoned cases for its servers. Conventional wisdom says that only a monolithic deployment such as a Google datacenter can really make use of this innovation. Baloney. How often does anyone deploy a single server anymore? Hardly ever. If server manufacturers would think outside of the case, they could design and sell servers in enclosures at the 10 or 20 rack unit scale. They could even sell entire racks. By shedding cases altogether, both server cases and blade chassis, they could create dense, electrically simple, easy-to-maintain, and, most importantly, easy-to-cool servers. The front could carry the I/O ports, fans, and drives: big fans, for quiet efficiency. The backs could be left open, with electrical down one side and network connections down the other. Minimize the case itself to as little as possible… think of Colin Chapman‘s famous directive about building a better race car: “Just add lightness.” The case of a server should serve one purpose only: to anchor it to the rack. Everything else is a superfluous obstruction of airflow. There is no need for steel, as plenty of lighter-weight materials can do the job with less mass.

Go look at your datacenter with this new eye and envision all those server cases and chassis removed. No more artificial restriction of airflow. Your racks would also weigh less than half of what they do today. You could pack twice the computing horsepower into the same amount of space and cool it more effectively than what you have installed now.

Ten years from now we’ll look back at servers of this era and ask ourselves, “What were we thinking?” Cases as we know them will vanish from the data center, much like the horse and buggy a century before. We’ll be so much better off without them.

How many geeks does it take…

…to move a 6800lb (~3000kg) UPS?

Thanks to the amazing Hilman Rollers, only four and a half.

This is our new UPS at digital.forest. It is an MGE EPS 7000, which is a very cool unit. As purchased it is a 300 kVA system, but as we grow we can scale it up to 500 kVA. The battery cabinets arrive tomorrow. You can read about the UPS arrival on my blog at work.

Published, again.

I wrote a lengthy bit about communication as a key to surviving an IT disaster, which in many ways was a written version of the session I delivered at the MacIT conference at Macworld Expo last month. I tackle the stereotype of geeks as poor communicators and lay out a strategy for getting IT departments into the communication habit. The stunning revelation that led me down this road is a conclusion I came to while discussing an outage with a “layperson”… that is, a user of technology rather than a maintainer of it. To him, awareness was more important than downtime. Downtime didn’t bother him so much, so long as he was kept informed of what was going on, why, and when things would be back up. Forewarning would be even better. His downtime came about during a datacenter migration. A light bulb went off over my head, as I had successfully pulled off more than one datacenter migration within the past few years. Did everything go perfectly? Of course not, but the difference was that I put a huge emphasis on communication with our customers way before, before, during, and after the moves. I’m not some IT genius by any stretch of the imagination, and I’m not the first to use this tool effectively. It just seems that most IT professionals forget this critical part of their management strategy.

Anyway, for the terminally curious, the series is linked below. My editor wisely split it into two parts.

Part One

Part Two

Pay No Attention To The Man Behind The Curtain!

The server that hosts a lot of my images is down. It is my own personal box, which is almost as “vintage” as the cars it displays. It is a wonder that it works at all, to be honest… clean living in a clean room, I guess.

It had a minor disk issue earlier this morning and I’ve fixed it, but now I’m making a backup before I bring it back online. “Have patients!” …said the Mad Doctor! 😉

Update: OK, as of 11:45 Pacific Standard Time the image server is back online.

C’mon Apple… keep it coming.

A small step for Apple…

OK, so maybe somebody at Apple has a clue or knows how to listen. They announced a new rev of the Xserve today. I won’t bother to talk about the stuff everyone focuses on (CPU horsepower and whatnot; I have friends and customers you can turn to for the skinny on what’s happening inside the new box). I’ll stick to the subject of all my usual rantings about servers and server design: the case. This is because I don’t manage servers, as in “what goes on inside the server”; I manage datacenters, namely what happens OUTSIDE the server once it is racked and operating.

The momentous cause of my small celebration today? Apple put a USB port on the FRONT of the Xserve. Whoo hoo!

Mind you, this is only a very small step away from “style” and towards “substance” (and, ironically, “usability”), but it IS progress and I have to give Apple credit for that.

As I have said before, to be truly useful in the environments it was designed for, the Xserve should have all “user” ports on the front, namely USB and video, and all “system” ports on the back, namely power, network, Fibre Channel, etc. If it connects to another system or to the datacenter infrastructure, it goes on the back. If it interacts with a user, it goes on the front.

Datacenters are laid out in hot aisles and cold aisles, where the hot exhaust sides of servers are isolated from the cold intake sides. This allows for optimum cooling and airflow. In ideal datacenter environments, the hot aisles are contained and the heat is given a specific path for removal. If users constantly need access to the back sides of racks (or, more accurately, the hot aisles), then those aisles cannot be easily contained. Putting user-required ports on the back of a server is counterproductive.

Of course, that isn’t my biggest complaint about the Xserve’s design. That remains the completely absurd overall length of the box, which still comes in at 30″ (76.2 cm), so long that it completely obliterates any density advantage a 1U server supposedly buys you.
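
To put a rough number on that, here is a quick sketch; the cabinet widths and depths below are assumptions for illustration, not measurements of any particular rack or of the Xserve itself:

    # Floor area consumed per rack unit for two hypothetical cabinet depths.
    # All dimensions are illustrative assumptions, not vendor specs.
    RACK_UNITS = 42            # usable U per cabinet
    CABINET_WIDTH_IN = 24      # nominal cabinet width

    def sqft_per_u(cabinet_depth_in: float) -> float:
        """Floor area (sq ft) per rack unit, ignoring aisle space."""
        return (CABINET_WIDTH_IN * cabinet_depth_in / 144) / RACK_UNITS

    shallow = sqft_per_u(36)   # cabinet that comfortably fits a shorter server
    deep = sqft_per_u(48)      # cabinet deep enough for a 30" box plus cabling
    print(f"{shallow:.3f} vs {deep:.3f} sq ft per U "
          f"({(deep / shallow - 1) * 100:.0f}% more floor per U)")

Under those assumed dimensions, the deeper cabinet burns roughly a third more floor per rack unit, before you even account for the extra rear clearance a longer box demands.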

I know I’ll get video ports on the front panel long before Apple pulls its head out of its butt on the case length of 1U boxes, though.

Thanks guys.