Russian Roulette …with bombs.

How one software author’s unwise decision ruined my week.

Apologies for the long one, but it explains why I haven’t written anything else this week.

Monday evening, as I was getting ready to take my youngest son out for what was likely to be his last Halloween trick-or-treating with his friends (he’ll be turning 12 in a little over a month, and 11 seems to be about the age when “kid stuff” starts losing its appeal), somebody pulled a trick on me that ruined my week.

Some history first though: We use a very nice mail server package called “Communigate Pro” by what used to be named “Stalker Software.” Communigate Pro (aka “CGP”) has a reputation for being fast, stable, and scalable. For the most part this has been true for us. We have had some issues with it though over the past four and a half years. We run CGP on several servers, since CGP has been used by several of the web hosting companies we have acquired over the years. The copy of it we bought for ourselves though has been the one that has caused us problems. It runs great for 50 weeks of the year, but for a week in August or September, and a week in December or January, it completely sucks rocks. The only way I can describe it is that interacting with CGP becomes like talking to a starfish.

I watched a show once that nicely illustrated at least one definition of the word “relativity.” It showed how nature has made metabolism something of a clock, and that each species operates on a relative clock speed based on its metabolism. If you time-lapse film slow-metabolism creatures like starfish, and then speed the footage up to “match” our metabolic rate… the starfish look very active… zipping about the ocean floor, preying on urchins and other shellfish. Amazing really. Same goes in the other direction: slow down film of a hummingbird and it starts looking like any other bird. I guess to a hummingbird, a human being looks like a starfish.

Well, for two weeks out of the year our CGP mail server’s metabolism slows to that of a starfish. It works, just at a truly GLACIAL pace. The server and operating system are fine (load is low, the machine is responsive at the console, shell commands are fine, go figure.) This is obviously frustrating – for both us and our clients. The fact that it comes back like clockwork at certain times of the year is very odd. We eliminated all external causes (traffic, spam, etc.) and Stalker support spent hours and hours trying to figure out what was wrong. The only suggestion they could ever come up with was “put a faster filesystem under it.” This problem appeared in whatever version of CGP we ran, and I’m pretty sure that we tried them all, starting with 4.0.X, all the way up to 4.2.X (and this week, 4.3.X… but we’ll cover that later), but they all had that odd metabolism time shift appear twice a year.

Putting a faster file system under it usually cleared up the problem. As did switching platforms. We started on FreeBSD, moved to OS X (better threading), then up to OS X Server (on an Xserve); but we also jumped through all sorts of filesystem and bus technology switches: from IDE, to SCSI, to various RAID setups, to eventually a 2Gb/s FibreChannel RAID array. Last summer when the starfish returned, on a whim (well, not a whim really, more a blind rage and fit of frustration, since I wasn’t going to sink any more capital into filesystem improvements!!! Especially since they were seemingly NOT improving the situation!) I told my senior sysadmin to move the CGP directories to the internal IDE drive of the Xserve. Presto! The starfish vanished.

The server was back to its responsive, stable state. While I was happy about that, since our clients weren’t angry at us, I was LIVID because all those tens of thousands of dollars we’d spent on hardware were a placebo cure for a real software problem. Stalker (now calling themselves “Communigate Systems”… aka CGS) had no explanation for this, and just sort of slunk away.

There is another significant wrinkle to this story, which explains why I was unable and unwilling to ride Stalker/CGS harder and force the issue into some sort of resolution. In November of 2004, CGS née Stalker made significant changes to their software licensing model, and jacked their prices up to well over 5.5X their previous levels. Needless to say it was a shock to their customers. Prior to this date, their software was “expensive” but a relatively good value. (IIRC we paid between $8000 and $16,000 for our CGP licenses in 2000 and 2001.) Up until 2004 the core customers for Stalker were Service Providers such as ourselves. CGP had become something of a darling in the industry press for being a solid performer and a far better value than absurdly over-done and outrageously expensive “Messaging Platforms” such as Lotus Notes and Microsoft Exchange. I guess this attention went to the head of Stalker/CGS’ CEO and founder Vladimir Butenko, and he began transforming CGP into one of those over-done and outrageously expensive “Messaging Platforms”. Hey, in some ways I can’t blame the guy… his core market – ISPs – had gone from niche-market players to a total commodity market with NOBODY making very much money, if any. Just beyond his grasp, and seemingly within reach, was a cash-rich “Enterprise Market” with some dominant players showing real weakness. The astounding thing is the way he decided to get there: by actively pissing off his current customers and seeding them with confusion, fear, and doubt. The existing customers, all ISPs, schools, and small businesses, were angry. Stalker/CGS left no option for a “mail only” (no calendaring, groupware, MAPI support, VOIP support, SIP/PBX functionality, etc.) version, and any continued use, other than of the VERSION YOU ORIGINALLY BOUGHT, would cost you a hefty sum in support and maintenance fees – 18% of the purchase price, which under the new pricing worked out to about what you paid for the software originally! So it was like having to buy your software again every year. Customers were livid, and the Sturm und Drang on Stalker’s support mailing list was out of control. Stalker’s CEO, Vladimir Butenko, defended these new policies with characteristically twisted Russian logic and denial. I don’t know how to say “tough shit” in Russian, but that is what he did, albeit in far more diplomatic terms.

What he didn’t tell anyone at the time was that he ensured compliance with his new licensing scheme and inflated prices by inserting a “time bomb” into Communigate Pro. If your server thought it wasn’t properly licensed, it would cease to run at midnight UTC on some arbitrary date, and then, if re-launched, would shut itself down every 15 or 20 minutes thereafter. No warning. No coherent error code. No reason why. Bang. Boom. Off. Dead.

This was done without any announcement or warning. To add insult to injury, none of us customers had any idea which versions of Communigate Pro had the timebomb code in them, or what the detonation dates were. It was truly “Russian Roulette.”

Up until 2005, the standard refrain from Stalker Tech Support for any issue was “Please upgrade to the latest version of Communigate Pro.” The support and sales staff frequently touted the benefit of “free upgrades” of their software. You got your value and return on your initial investment by always being able to stay current and get your bug fixes. We had changed versions via upgrade countless times, as we obviously had at least ONE big ugly bug, which unfortunately was never fixed. I don’t recall what version of CGP we were running when the license change was announced, but I do know that in February of 2005, when the first of what I now assume are going to be many CGP timebombs exploded, we were running a version we apparently weren’t licensed for… despite the fact that we had probably upgraded to it two months before while troubleshooting our latest visit from the Communigate Pro Starfish. CGP servers around the globe all blew up at midnight UTC on February 1st, 2005, including one of ours. Predictably, the CGP support mailing lists, newsgroups, etc. also exploded with angry, frustrated customers. I called the guy at Stalker we originally bought the software from and asked him flat out, “OK, tell me exactly what version of CGP we are allowed to run so that this timebomb won’t affect us again.” He gave me a version number, Bill, my senior sysadmin, downgraded us to it on February 1st, and life went on.

Later in 2005 our CGP Starfish returned, and that is when we tried the “move to the internal IDE disk” trick, which worked. I had not paid Stalker that hefty price for support and maintenance (or, as they ironically call it in their emails to me, “S&M”), so I was in no position to demand that they admit this “starfish mode” bug exists and fix it. I was stuck at the version we were running in perpetuity. Such is the Kafka-esque world of software licensing. Instead I directed my staff to start evaluating alternatives to Communigate Pro. I didn’t want to be the victim of extortion, paying for the development of features for “Enterprise Customers” that we would NEVER use. Here is a great example: I was on the phone with a guy from Stalker/CGS and he was telling me how great their PBX/SIP/VOIP system was. I asked him, “How do our customers call us if the mail server goes down?” I was answered by a very long silence… followed eventually by “Hmmm… never thought of that.” SMTP/POP/IMAP/Webmail… that is ALL I need, thank you. So we looked at the expanding pool of products that were filling the void being left by CGP as it ascended to “Enterprise” status. We had narrowed the field to a small handful by last week.

Then we lost at Russian Roulette again.

At 4pm PST on October 31st, which is midnight UTC, three of the four Communigate Pro servers at our facility exploded. Their timebombs went off and they all shut themselves down. My wife had to fill in for me as the Halloween driver (we live in a rural area, so I had planned on taking my son and a few of his friends into town for trick-or-treating.) I spent the night hunched over my keyboard and on my VOIP phone (thankfully we don’t use Communigate Pro for our VOIP needs!) to my office, dealing with the crisis. Based on past events, we very quickly came to the conclusion that it was the infamous Communigate Pro Time Bomb, and not some other issue, since it happened at precisely the same time on more than one server, and we were not the only ones it was happening to. (Stalker’s mailing list, which is viewable on the web, was also exploding with angry customers.) To get us through the night we rolled the clocks back on the CGP servers and restarted them. In the morning we started the work of figuring out how to deal with this. I emailed Stalker trying to find out why, when they had told us that THIS version was OK for us, it had still timebombed. I posted, and replied to others’ postings, on the CGP mailing list, but my account was in “moderated” mode, and the moderator was obviously not paying attention (easy to do, as that is a significant weakness of the CGP LIST module.) Vladimir Butenko appeared on the list, once again with his twisted Russian logic, saying essentially ‘there is no timebomb, and besides, you must be stealing my software since your server stopped working.’ Not exactly a confidence- or trust-building exercise in customer relations there, Vlad.

After careful reading of the CGP website, I finally decided that our only course of action was to downgrade to version 4.1.8, which seems to be the last of the “free upgrades” and should run on our license key obtained in 2000. Bill figured he could downgrade the software, and restart the CGP service without causing much disruption to our clients. 4.1.8 went on, we restarted, and suddenly, without warning…

The Starfish Returns!

Our mail server software is once again moving at the speed of a quaalude-soaked starfish taking a leisurely creep over the ocean floor. It is 7 weeks early, but the starfish is back… with a vengeance!

Great. Just what we need. A software vendor extorting us on one side, and clients angry at us for under-performing software on the other. My loyalty is with my clients, not the bastard holding the gun to my head, or the timebomb on my server as the case may be. I rallied the staff and rolled out a plan: we’d build a new server from scratch, install a fresh OS and a new copy of CGP 4.1.8 on it, move the data over, and cut over the IP address. Based on our past experience, this should outwit the Starfish!

Thankfully a customer had just decommissioned a very nice dual-CPU/dual-core Intel server with a built-in Ultra-SCSI RAID system, and we made him an offer on it that he accepted. The only problem with it was that the drives inside were low-capacity. Thankfully we have stacks of Sun StorEdge arrays in our backup system that were sitting idle, so we ripped six 36GB LVD Ultra-SCSI drives out of one and packed them into the server, installed FreeBSD on it, and started rsync over a crossover cable between it and our production mail server. Oddly enough this went pretty fast; despite CGP being in “Starfish Mode,” the OS and filesystem were thankfully quite responsive. System load went from 0.10 to 0.34 on the production server while we were syncing… while talking to the Starfish was unbearably slow. For example, CGP’s web UI would take 15 minutes to go from page to page.
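For the curious, there was nothing fancy about the sync itself: an initial bulk pass, then repeated delta passes, so the final pass during the cutover window is short. Roughly like this (the paths and the crossover-cable address are made up for illustration, and I’m assuming CGP’s default base directory; a sketch of the pattern, not Bill’s actual script):

#!/usr/bin/env python
# Sketch of the sync pattern, not our actual script.
# SRC assumes CGP's default base directory; DEST is the new box on the
# other end of the crossover cable (address made up).
import subprocess

SRC = "/var/CommuniGate/"                      # trailing slash: copy the contents of the dir
DEST = "root@192.168.1.2:/var/CommuniGate/"    # hypothetical address of the fresh server

def sync():
    # -a preserves ownership, permissions, and timestamps; --delete keeps
    # the copy exact; -e ssh runs the transfer over the private link.
    subprocess.check_call(["rsync", "-a", "--delete", "-e", "ssh", SRC, DEST])

sync()   # bulk pass while the old server keeps running
sync()   # delta pass: only what changed since the first pass
# ...then stop CGP, run one last sync(), and bring the new box up on the old IP.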

We cut over to the fresh box at around midnight on Tuesday/Wednesday, and things seemed ‘OK’… instead of talking to a starfish, it felt like talking to a sleepy dog. Movement was perceptible, but not exactly as swift as we had hoped. In past experience “starfish mode” would improve to reasonable performance in the wee hours of the night, when the server was under a lighter mail load. Since I was staying in my office and had nothing else to do, I vented about the situation to my online friends and discussed it by phone with Russ Pagenkopf, the guy I run the Mac-Mgrs list with… a list ironically running on a Stalker-donated copy of CGP, which, quite ironically, had also timebombed! Russ and I decided to stop running CGP on the Mac-Mgrs list server as soon as possible, and once he had it running again I posted to the list about that. I also answered people from the CGP list who were angry about what was going on with us, and some of them relayed to that list what I had said, both to them and on Mac-Mgrs. The PR backlash against Stalker/CGS was gaining momentum. I think I managed to get about 3 hours of sleep that night.

Sure enough, come Wednesday morning east-coast business hours, our main server was back to moving like a starfish. I left my staff to handle the angry clients while I swallowed my anger and called Stalker/CGS for tech support. I didn’t expect much, but luck was on my side and by chance a director-level employee answered the phone. (When our tech support queue gets busy, I pick up the phone too!) I explained our situation with CGP 4.1.8 doing this “glacial slowdown” thing (I haven’t called it “starfish mode” with anyone at Stalker/CGS to date.) I asked him if my long-time contact was there, and he said, “yes, he just walked into the office,” so I told him to catch up with him, since my long-time contact knew the full history of this almost-five-year-old problem and I didn’t have the energy to retell it. After a few hours of troubleshooting (it took me 55 minutes just to get into the UI to change a password so Stalker support could access the server), I got a call from them. Three people, all director-level folks at Stalker/CGS, were on the phone making me an offer. They would give us a 90-day license for CGP 4.3.9, to let us load that one up and see if it would fix the “Starfish Mode” bug. I was too exhausted to say anything but “it is worth a try”…. They promised me quotes for extending the 90-day license within a day.

License keys in hand, I woke up Bill, our over-worked and under-slept senior sysadmin, and had him install the 4.3.9 version on our creeping starfish of a server and restart…. It seemed OK for about 30 seconds, then immediately tailspun right back down into starfish mode once again. It is obvious that whatever this bug is, it has never been adequately addressed by Stalker’s coders and remains embedded deep within the current version, and probably in upcoming ones as well. The Stalker support guys were stumped, and fell back into random-mode troubleshooting again, suggesting courses of action that were either impossible to perform on such a slow-moving system, or things they had suggested in the past – which we knew would not work.

I had a plan. It was a total “hail mary” play, but similar stunts had worked for us in the past with the Starfish. Nuke the box we had been running the mail server on just days before… before the software timebomb exploded. Do a fresh install of this CGP upgrade on it, move the data over, and cut over again. This may sound like exactly what we had just tried, because it basically was. Meanwhile I talked to MY director-level guy and said, wherever we are with the proposed new mail system roll-out, hit the gas pedal and get ready to install and ramp it up ASAP! He brought me POs for gear and software, and I signed them. I wrote an apology to our clients about the situation and posted it to our website. Then I grabbed my laptop and left my office for the first time in almost three days to get some fresh air and food. I took the laptop because open wireless networks seem to be everywhere now, so if they needed me at the office I could probably get on AIM or whatnot easily.

Bill finished the install and rsync work, and we cut over to the “old” mailserver around 5 PM PST on Wednesday and….

It worked. The starfish was back in hibernation once again, and the server was behaving “normally.”

I finished up some client communications, and basically passed out on my office couch a few hours later. I slept 12 hours straight.

So, at the moment I have 90 days to get a better mail system rolled out and running. I think we can get that done. We’ll probably build a fresh, old CGP 4.1.8 system for any clients that can’t or won’t move to the new system, so we’ll stay in compliance with Stalker/CGS’ loony license scheme and perpetually avoid the Russian Roulette of the software timebombs present in CGP 4.2.X and who knows what subsequent versions. We’ll probably NEVER get a satisfactory answer about the causes of, or real cures for, Communigate Pro’s “Starfish Mode”… but here is my hope:

Someday, it will return. Not to *our* server, but to one of those “Enterprise Customers” that Stalker/CGS so desperately wants to trade their current customers for. Some multi-million-dollar CGP “Messaging Platform” cluster installation. They’ll have hundreds of thousands of dollars invested in hardware, and of course CGP software. Their mighty cluster will slow to an inexplicable crawl. They’ll spend massive amounts of time, and eventually money, trying to cure it. Vladimir will log into it and tell them “put a faster filesystem under it,” so they’ll blow wads and wads of cash on exotic SAN architectures or the like. VP-level guys like me will lose sleep, and in-the-trenches guys will lose even more, wrestling with a starfish. Then some geek in the organization will be google-surfing phrases like “CGP slow” or “glacial communigate” and stumble upon this blog entry from who knows how many years past. He’ll pass it up the chain, and somebody will gather up the guts to call me. I’ll chuckle and say, “You spent HOW much money to buy this software from these idiots? What, are you NUTS?”

There, I just saved you the phone call.

Slashdot | Price of Power in a Data Center

Interesting read, on a subject I know pretty well. We will likely have to institute power surcharges for our colocation customers soon.

I liked this interesting tidbit from the comments section:
Also the street price for a 20A circuit in a datacenter is $200-$300, while the cost of a megabit is $100 or less. So a rack of servers that requires two power circuits and pushes 3Mbps (not an unusual scenario) costs twice as much in power than in bandwidth.
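Plugging the quoted street prices into that scenario backs the claim up. A quick back-of-the-envelope check (the numbers come straight from that comment, nothing of ours):

# Back-of-the-envelope check of that comment's claim.
power_per_circuit = (200, 300)   # $/month for a 20A circuit (quoted range)
bandwidth_per_mbit = 100         # $/month per Mbps ("or less")
circuits, mbps = 2, 3

power_low = circuits * power_per_circuit[0]      # $400/month
power_high = circuits * power_per_circuit[1]     # $600/month
bandwidth = mbps * bandwidth_per_mbit            # $300/month

print("power: $%d-%d   bandwidth: $%d" % (power_low, power_high, bandwidth))
# Power runs roughly 1.3x to 2x the bandwidth bill for that rack.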

I’ll write more about this subject soon.

Keychain Access Hurdle Cleared!

I have a quote from one of my staff in my email .sig file, and it is likely in this blog’s “random quote” database as well (just keep hitting “refresh”). It goes like this:

There’s only so much stupidity you can compensate for;
there comes a point where you compensate for so much
stupidity that it starts to cause problems for the
people who actually think in a normal way.

-Bill, digital.forest tech support

“Bill” in this case is Bill Dickson, my highly valued sysadmin and a true treasure. You can see his blog (WRD) listed in my blogroll, though I swear he is going for a world record for NOT updating his blog. He’s close to a year now.

Anyway, that rather insightful quote sums up what is going on with me and my keychain. I vented on a couple of mailing lists and was informed (by some well-informed people both inside and outside of Apple) that Apple originally designed the Keychain system to work just as I was using it: independent of the login password, and flexible enough to allow people to use multiple keychains however they wished.

The user community apparently bitched and complained a LOT to Apple that they didn’t like the login and keychain passwords being independent of each other, and that a change of the login password SHOULD also change the keychain password. I guess the majority of MacOS X users DO keep their login and keychain passwords the same. Me? I think that is stupid. I guess a lot of software engineers inside Apple thought it was stupid too. It became something of a fight between engineering and marketing (isn’t it always?) and engineering finally lost with 10.4.

So Apple caved and compensated for their customers’ stupidity, and ended up burning not-stupid people like me in the process.

Oh well. If you are a software engineer at Apple, the phrase “Asshat at Apple” I used in yesterday’s rant was NOT directed at you. But feel free to assume it refers to the people who forced you to make that change in the default behavior.

Speaking of forcing… I got my password back. It seems the “please select a longer password” dialog box is a placebo. If you just keep force-feeding the Keychain Access utility your unapproved password, it will accept it. Go figure.

I’m happy as a clam.

Whisky Tango Foxtrot Apple Keychain Login Sync

On a whim, in MacOS X 10.4, because I was tired of my old login passwd, I changed it. No biggie, right?

I was presented with a dialog, basically saying “Your keychain password has also been changed.” …huh?

Bahhhh!!! No! I didn’t want that!

Grrr… so I go stomping off to the Utilities folder for the Keychain Access app and find, buried in a preference somewhere, a CHECKED box saying “synchronize with login password” or something similar.

WTF??

I NEVER checked that box. I have always… always had different login and keychain(s) passwords. I would never dream of making them the same. What asshat in Cupertino decided to make this choice for me?? And HOW can I change my default keychain password back to what I want it to be (perhaps I missed the obvious in my blind rage?)

I REALLY want to kill somebody at Apple right now. What a bonehead thing. No warning beforehand; it just changed something very personal and important. Of course I have UNchecked that box now, but it is too late. I’d like to change my muscle-memory-embedded keychain passwd back to what it should be. If I missed some obvious design change, announcement, READ ME file, or clearly marked option that led to this situation, feel free to point it out to me. Otherwise, bring me the head of the idiot who dreamed up this stupid default action in MacOS X “Tiger”.

Maybe this is “normal” now that I think of it… I may have always changed my password at the CLI prior to this… I think I better go eat dinner and calm down.

Update: After I calmed down a bit, I went back into Keychain Access and found the obvious menu choice to change the password for the keychain. No need to clue me in about this. BUT, let me tell you about keychains, and how I use them. I have a login password. I have my laptop set to require both a username and password at login (IIRC OS X defaults to showing a list of usernames, with only the password left to type.) I also require my login password for waking my laptop from sleep, or to get past the screen saver.

Keychains, to my mind, are completely different from login passwords. The keychain is where you store all your various passwords for all those email accounts, web servers, stupid blogs like this one, etc. I actually use several keychains. I have my default keychain, which is where I store the most frequently used, but in no way terrifically important, passwords: the passwords for this stupid useless blog, or the shared IMAP boxes we use at work to read generic email addresses like “support@forest.net”, “abuse@forest.net”, etc… you know, the ones listed in whois that get more spam than real email. I have several other keychains, and these store progressively more secure data: access passwords for ARD, Timbuktu, SNMP strings, specific personal passwords and data that I prefer to keep secure… and then finally there are some passwords I just won’t keep stored anywhere but my brain: enable passwords for BGP routers and other network devices, root passwords for our DNS and mail servers, etc.

Every keychain has a different password and they get progressively more complex with the level of security required for the keychain data.

My default keychain has had a four-character password. Mind you, it isn’t a word, or even anything logical. It is a random string of 4 characters from 3 different rows of the keyboard. I have been using this password, and simple variations of it, for 15 years. It takes me about a nanosecond to type it. It is so deeply ingrained in my muscle memory that I can bang out those four keystrokes and hit return, even with my clumsy two-finger typing style, in less time than it takes me to type any other 5-character string imaginable. This is WHY I use it for mundane, default keychain access… clickety-click!

But NO. Some Apple asshat, probably the very same asshat who decided to default the login/keychain sync, decided that THEY get to decide what level of security my default keychain should have:

Just fsckin bite me!

If I feel the need to hide my lame, low-security slashdot login behind a 4-character password, then LET ME. I am an adult. I understand that somebody could steal my laptop, run a crack against my default keychain, and probably break it in a few minutes. Big deal. So they’ll have access to some stuff that I have already decided is low-security, which is why I prefer to keep access to it EASY rather than hard. If I want to be an idiot, LET me be an idiot. Please. I’m fine with the risk concerning this particular data. Sheesh!
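If you want to put rough numbers on that (and these are hand-waving assumptions about the character set and the guess rate, not a real benchmark of Keychain’s crypto), it looks something like this:

# Rough brute-force math; the charset and guess rate are assumptions.
charset = 95            # printable ASCII characters
rate = 1000000.0        # guesses per second (made up, generous to the attacker)

for length in (4, 7):
    keyspace = charset ** length
    seconds = keyspace / rate
    print("%d chars: %.1e combinations, worst case %.0f seconds" % (length, keyspace, seconds))
# 4 chars: ~8.1e7 combinations, ~81 seconds -- "a few minutes," like I said. Big deal.
# 7 chars: ~7.0e13 combinations, ~7e7 seconds -- a couple of years at that rate.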

So if anyone knows some clever way around this stupid limitation, please let me know. I have no problem doing it on a command line; been there, done that since Bush the Elder was borking up the economy.

Until then, I will be tripping over my fingers typing three extra characters to get to slashdot, spam, and this dumb blog. sigh.

Rising costs, fixed prices.

Pondering pricing in a post-economic downturn era.

One of the services we offer at digital.forest is data backup. We have four backup servers that run scripts to back up data from “clients” (that is, servers owned by clients of digital.forest, running the “client software” for the backup server. Got it?) As with all our services, we haven’t changed the price of backup in well over 4 years.

Right before the “dot com crash” (which wasn’t a “dot com crash”, but I’ll explain my views on that some other time) we actually performed a large-scale review of all of our offerings and what the competition was charging – and started adjusting our prices accordingly. At that time, we were a small-scale operation with a set of niche offerings. The only price hike we managed to complete prior to the economy’s turn was on FileMaker hosting. We are the largest FileMaker database hosting operation in existence. At one time it was a growing business, but between 2000 and 2004 it seriously stagnated… more due to FileMaker Inc. (FMI) taking far too long to rev FMP 5 than anything else (again, I’ll have to leave my views on FMP and FMI for another post.) We saw a lot of our clients migrate from Lasso- or CDML-based FMP solutions to PHP/MySQL solutions. Hosting FMP databases is a very expensive business to run, since it requires more resources – more software and servers per customer – than just about any Internet database offering I can think of. Odd, considering that it is considered a “low end” database solution. So it made sense to raise our prices, especially since FMI kept raising theirs. If I recall correctly we raised them about 10-15%, but only lost about 1% of our clients due to the price hike. That was an interesting exercise in capitalism. Too bad the economy, other database products, and FMI’s slow work on what eventually became FMP 7 managed to wipe out 40% of our FMP hosting business over the next four years.

Thankfully other offerings filled the gap. Server colocation became a significant part of our business. We had built a pretty nice little datacenter by 2000. It was small, but had almost everything you would expect to find in a large-scale industrial datacenter, just on a small scale. It was basically some converted office space in Bothell, but we had a great backup power system, and multiple fiber lines coming into the building. We were an autonomous network with BGP4 connectivity to several major Internet “backbone” providers (I hate that term, but I’ll use it here for simplicity.)

In 2001/2002, when large colo providers were going down every week, or consolidating datacenters, we went from being viewed as “risky because we’re small” to “safe because we’re small.” Another thing that happened at the same time, and continued well into 2004, is that prices plummeted. Webhosting rates fell by 60% or so, and server colocation fell through the floor to unsustainable rates. I remember in 2000 Exodus charged anywhere from $4000 to $8000 a month for a single rack. We charged $2000, which was “cheap.” Within two years the “big boys” (which in Seattle meant only InterNAP and a few remaining operators) were practically giving rackspace away. I remember losing an 8-rack deal to InterNAP in early 2003 when they lowballed the price to something insane like $250 a rack. It was obvious they were floating on investment capital, had a big, brand-new (but mostly empty) facility to fill, and knew that any revenue was better than no revenue. We have never been big enough to operate like that. Our colo prices have come down though, right along with the rest of the industry. No, you can’t buy a rack from us for $250, but we have gone from being a “value priced” provider to being about the same as everyone else, if not a little high. I’m OK with not being the cheapest, mostly because we offer what so few providers can: personal service. We are a niche player, not a commodity one.

Today we are still here, still growing, and overall doing pretty good. We moved into a new facility (ironically one built by a failed competitor) and now actually do have a top-tier facility in every way. Unfortunately the costs of operation have grown at the same rate as our growth, and we have basically kept our level of profitability all along (if you were to pool our total profit over the past three years you could buy a small Korean sedan.) We at least are marginally profitable, unlike so many in our industry. We’ve done it by taking advantage of every cost savings we could find (in bandwidth, equipment, etc.) and keeping the rising costs (electricity, storage, people, etc.) as under control as we could.

So our prices have either stayed where they were in 2000, or in many cases gone down. One price that has been frozen is data backup. Back in 2000 we charged $30 a month for data backup. Back in 2000 your average web server had maybe 250 megabytes of data, with 20 megs of that changing on a daily basis (usually database dumps.) We were running a VXA tape library with a 15-tape capacity, and our other two backup machines ran single-drive AIT tapes. So at $30 a month we were covering the cost of the tape autoloader, and probably would have made a buck or two per client once the cost of the library was covered. I doubt it ever was, because by 2002 we had to start backing up to hard disks. Why? There just was not enough time in a night to back up to tape anymore. Our backup window kept growing until we were backing up during non-night hours. When our backup software started supporting backups to HDDs we jumped on it and started buying the biggest disks we could (at that time around 100GB) and using them like tape – chew them up and throw them out. When drives got bigger, we bought bigger drives – 120GB, 180GB, 200GB, 250GB. Of course, so did our customers, so we were rarely able to stay ahead of the time/capacity curve.

Apple shipped their XRAID drive array a couple of years ago, and we have purchased a few since to add to our arsenal of backup and storage devices. We sell space on one for clients, but use the others for backup media.

About three months ago I cried “uncle”… Here we are, spending tens of thousands of dollars to maintain a service we are making a few thousand dollars a year on. We’ve fallen into a trap similar to the one our competitors did when they dropped colocation prices in 2002… only in our case the trap was never raising our prices to at least match the cost of the service provided.

We are using close to 6TB of storage, and backups now run 24/7. Any pause for a data restore puts us in a position where we play catch-up for several days. Clients complain about missed backups (your server too slow? sorry, we have to skip you); clients complain about backups happening during business hours (OK, we can put you in the special “nighttime” script, but no guarantee that we can back you up every night); clients complain about the time it takes to back them up (let’s do the math… three 250GB volumes of mostly uncompressed and non-compressible data, over a network at around 250-300 MB per minute… that is almost two days!)
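Since I keep saying “do the math,” here it is spelled out, using the throughput numbers above:

# The "almost two days" math for one big client.
volumes, gb_per_volume = 3, 250
rates_mb_per_min = (250, 300)            # observed network backup throughput

total_mb = volumes * gb_per_volume * 1000    # roughly 750,000 MB
for rate in rates_mb_per_min:
    minutes = total_mb / float(rate)
    print("at %d MB/min: %.0f minutes (~%.1f days)" % (rate, minutes, minutes / 60 / 24))
# at 250 MB/min: 3000 minutes (~2.1 days)
# at 300 MB/min: 2500 minutes (~1.7 days)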

The client who has a small server with a few hundred MB of data? They are still paying a reasonable data backup price at $30 a month. The client with more than 50GB of data (and we have some with over a terabyte)? THEY are getting way more service @ $30 a month than they can imagine, even when we skip them or miss them entirely a few times a week.

It is obvious that we have to implement a pay-for-what-you-use data backup system, and that is what we are about to do next month. It could not come soon enough for me.

The Rains Have Returned

I don’t comment much about weather, but the subject came up in an iChat with a friend on the east coast.

Here in the Pacific Northwest we really only have two seasons, “wet” and “dry”. “Wet” lasts from sometime in September or October, until early July. Dry lasts from early July (usually the 5th or 6th!) until sometime in September or October.

In the 1980s “drought” years, “dry” would sometimes last until November. I remember climbing “Outer Space” on the Snow Creek Wall over a weekend in early November around 1987 or so. It was cold at night but very warm… “hot” even, during the day. Recently our weather has been unsettled, swinging between VERY wet years (in 1999 we didn’t really have a “dry” season until September) and very dry ones (last winter was about as dry as “wet” can be, with hardly any snow in the mountains and very little rain down here). Oddly enough, the past few years’ “wet” started big, with some big October storms… these pictures were taken two years ago today. But then it settled into a “very sparsely moist” pattern rather than our usual full-on “wet”.

Well, the “wet” has returned to the Pacific Northwest. October has been more rainy than clear, and quite chilly as well. We had a brief little “Indian Summer” the past two days: mostly sunny, temps in the high 60s F. Friday I drove the Jag down to the body shop for the bonnet ding to get repaired, and yesterday was spent hacking back the grass while the sun was out (making hay while the sun shines, as it were.) Today, however, reinforces how brief that nice respite was. Rain, mist, fog, temps in the 40s & 50s F.

It will be this way, relentlessly wet, with only high winds and storms to break the monotony, from now until January when the “storm season” ends, and then it will just be plain old rain. If you were a weatherman you could basically say “Rain, mixed with showers, with a rare sun-break, lows in the mid-30s, highs in the mid-50s” from now until March. Sure, we’ll have a few snows sprinkled in there, and of course that one week in January or so when the sun comes out… just to keep us from killing each other. Sometime in April we’ll start to see more sun, and warmer temps, and the Jag will come back out of the barn now and then. Until then, you won’t hear me talk about it other than winter-time projects. (Like my plan to perhaps do something about the radio console once and for all.)

When the “Dry” season returns, I’ll comment about our reward for the crappy weather we put up with around here, until then, I’ll try not to say much about weather.

Stop me somebody…

before I buy this.

It is close by. It is cheap. It is legendary for reliability – one was the world record holder for mileage: 1.9 million miles(!). It is of course a Diesel. Not exactly the 300SD or 300SDL I’ve been looking for, but still. I bet it does 0-60 in at least 30 seconds! (downhill, with a tailwind!)

Update 10/23/05: I managed to show some self-restraint and didn’t bid. The final price wasn’t too bad. I did some reading on the model; it was the first really successful Mercedes-Benz Diesel for export. It has ZERO collectible value though.