Screamin’ deal on an Xserve

A supplier we use has a bunch of Xserve Cluster Nodes; Dual CPU G5, 2 gigs of RAM, 80gb disk… 33% off retail.

Since you hardly ever see Apple gear at more than 10% to 15% off, this is a great buy. We are buying a bunch for ourselves, and if anyone is interested in grabbing one or more, let me know. Send email to: cg at forest dot net.

Back from Macworld Expo

Macworld Expo is like reliving your Freshman year in college, condensed into a week: Lots of Hard Work, Sleep Deprivation, and Binge Drinking.

I’m back home after Macworld Expo, and of course have a cold. Those of us in the “mac community” know this as the “post expo crud”… sigh.

Anyway, I do not consider myself one of those Technology Pundits who feels the need to comment on every move made by Apple, Microsoft, etc. and spread it around the “Blogosphere.” So if you are looking for Yet Another SteveNote rehash, or my view on the Intel based Macintosh you are out of luck. For one thing the famed “Reality Distortion Field” has no effect on me (it actually works backwards). Additionally, I spent about 3 minutes playing with a MacBook Pro (what a dumb name!) so I can’t really give you an honest assessment of its performance beyond “yeah, it worked.”

I attend Expo because for me it is an invaluable opportunity to meet face to face with my peers. I get together with a bunch of people that I converse with via email and iChat during the rest of the year, and an even greater number of people whom I really only see once a year. At Expo. When I was an “apprentice technologist” back in the early 90s, I was lucky enough to attend a series of great conferences, Mactivity, Seybold, Macworld Expo, etc. At those conferences I learned a lot, was motivated by what I saw, and met and became acquainted with a number of very smart people.

So for me, expo is a way to reconnect to those folks, as well as take an opportunity to provide motivation to today’s conference attendees… which I consider a responsibility now that I have survived to become one of technology’s, as my friend Chuq von Rospach says, “old pharts.”

This year I spoke at the MacIT Conference part of Expo. I shared the stage with Shaun Redmond, with whom I have done four previous conference sessions (usually on Network Troubleshooting and/or Security.) Shaun and I work well together in that we are able to communicate well in tandem, never having any pauses, and he plays the great straight man and provides me with perfect setups for punch lines. Shaun & I took on the subject of “Building A Better Datacenter”, which covered the issues surrounding building and maintaining server facilities. Shaun works for a school district in Ontario, Canada, and so he represented the “small” end of the scale, whereas I, even though our datacenter is of a modest (~1000 servers, 5000sq ft.) size by industry standards, represented the “large” end. The audience was small, but I will say they were enthusiastic and a great group to speak to. We used up our 90 minutes, and ended up staying over a half hour longer answering all their questions. I REALLY like it when an audience is into the subject matter and participates like that. I also liked that this is a subject matter that will ‘stick’… meaning that there is no possibility that a software release or a change in technology strategy will alter or dilute the knowledge that Shaun & I taught the audience. I’ve done sessions in the past that were obsolete within a year (or less!) and it is frustrating considering the price that conference attendees pay.

I also meet up with Vendors, Clients, professional associates, make a few dashes about the show floor, and of course attend parties and socialize. Macworld Expo is like reliving your Freshman year in college, condensed into a week: Hard work, sleep deprivation, and binge drinking.

There are annual events that I can’t miss:
* The Mac-Mgrs Night-before-the-Keynote Get Together
Hard to miss since I am the host!
* The A/UX Users Group Dinner
None of us still use A/UX, but we can’t let the long-running joke die
* The ‘Netter’s Dinner at Hunan
this year with the return of John Pugh after an 8 year absence
* The YML “Rock’s Expo” party
a great affair put on by a long-time client of digital.forest, Shawn King & Your Mac Life

Expo still motivates me too. One great benefit of the Conference Faculty pass is that I can sit in on any session. I try and focus on sessions that I can apply to the coming year’s technology goals for digital.forest. Paul Kent from IDG puts on a great technology conference and somehow every year manages to hit a sweet spot of what people need to know. I picked up some great ideas and am looking forward to setting the goals for my group at work this year based on what I learned this week.

Delivery Boy

I shuttled a replacement server up to Vancouver yesterday. Our old DNS server “willow” finally died. Since I live halfway there I drove it up. Everything went well except for two things.

#1: I can’t find my keycard for the Peer1 facility.

No big deal, I call the NOC and a guy comes down to let me in. I walk around from the door I usually go in on the east side of the building down to the loading dock on the north side. Of course I am carrying this 40lb server. Ugh. Not good for my just barely healed back. Then the Peer1 NOC guy locks us out of the loading dock, so we trudge up the loading dock ramps, and around to the SE corner of the building… uphill all the way. My back was really hurting and by the time we got to the elevator inside the building lobby the Peer1 NOC guy must have noted the pain on my face and volunteered to carry the server for a while. We arrive in the datacenter and I’m still in my “work clothes” and a gore-tex jacket. It is HOT in the DC. I hand him the server back (we had swapped again as he unlocked doors) and stripped off the jacket. Thankfully our little server enclosure (a wire mesh “hockey locker”) has an HVAC vent right above it so while I’m working I have cold air blowing on me.

#2: The damn server doesn’t fit in the enclosure!

This trend of making servers 1U high and as long as an aircraft carrier is just completely out of control. This box is a Dell server, and it is about 1″ deeper than the rack it is in. I end up having to stand it on its nose. Plus I have to carve off the RJ-45 cable boot in order to thread the cable into the deeply recessed jack. I guess I’ll talk to Peer1 about exchanging our rack for a different one.

So now my back is hurt again, and our server is mounted vertically.

65E Numberplate

In PostScript (I’ve since made some edits… but this is pretty close) … dump it to your interpreter and you should get something like this.

%!PS-Adobe-3.0
%%Creator: chuck goolsbee
%%Title: (65E)
%%CreationDate: (11/12/05) (8:50 PM)
%%DocumentProcessColors: Black
%%DocumentNeededResources: procset Adobe_packedarray 2.0 0
%%+ procset Adobe_cmykcolor 1.1 0
%%+ procset Adobe_cshow 1.1 0
%%+ procset Adobe_customcolor 1.0 0
%%+ procset Adobe_typography_AI3 1.0 0
%%+ procset Adobe_IllustratorA_AI3 1.0 0
%%BoundingBox: 0.1843 28.6695 1296 305.6695
%AI3_TemplateBox: 306 396 306 396
%AI3_TileBox: 0 0 612 792
%AI3_DocumentPreview: None
%%ColorUsage: Color
%%EndComments
%%BeginProlog
%%IncludeResource: procset Adobe_packedarray 2.0 0
Adobe_packedarray /initialize get exec
%%IncludeResource: procset Adobe_cmykcolor 1.1 0
%%IncludeResource: procset Adobe_cshow 1.1 0
%%IncludeResource: procset Adobe_customcolor 1.0 0
%%IncludeResource: procset Adobe_typography_AI3 1.0 0
%%IncludeResource: procset Adobe_IllustratorA_AI3 1.0 0
%%EndProlog
%%BeginSetup
Adobe_cmykcolor /initialize get exec
Adobe_cshow /initialize get exec
Adobe_customcolor /initialize get exec
Adobe_typography_AI3 /initialize get exec
Adobe_IllustratorA_AI3 /initialize get exec
%%EndSetup
[] 0 d
3.863708 M
1 w
0 j
0 J
0 O
0 R
0 i
0 0 0 1 K
0 0 0 1 k
0 A
u
0.1843 305.6695 m
1296.1843 305.6695 L
1296.1843 28.6695 L
0.1843 28.6695 L
0.1843 305.6695 L
f
648.1843 167.1695 m
F
U
0 0 0 0 k
770.1118 267 m
907.9 267 L
907.9 232 L
806.2198 232 L
806.2198 193 L
872.0808 193 L
872.0808 158.017 L
806.2198 158.017 L
806.2198 101.9744 L
909.922 101.9744 L
909.922 67 L
770.2007 67 L
770.1118 267 L
f
573.3953 267 m
684.8969 267 L
684.8969 232 L
610.6588 232 L
604.8815 207.1265 L
604.8815 207.1265 602.2817 208.8598 622.2133 208.8598 c
642.145 208.8598 686.9189 187.625 686.9189 136.64 c
686.9189 81.4635 636.657 65.4591 616.1472 65.2867 c
581.7724 64.9979 551.5 85.25 543.9312 128.0957 C
577.5616 135.1623 l
582.75 118.375 589.25 100.53 613.8363 100.53 c
633.4811 100.53 650.522 114.6851 651.099 137.5119 C
651.0995 137.6055 651.0998 137.6994 651.0998 137.7935 c
651.0998 158.017 637.2343 173.9054 612.6808 173.9054 c
588.1274 173.9054 561.5519 154.5505 561.5519 154.5505 C
560.4492 153.5916 558.2249 152.2999 558.2249 152.2999 c
558.2249 152.2999 558.6792 155.521 559 157.125 c
559.75 160.875 573.3953 267 573.3953 267 C
f

*u
407.9 138 m
407.9 158.4348 424.4652 175 444.9 175 C
465.3348 175 481.9 158.4348 481.9 138 C
481.9 117.5652 465.3348 101 444.9 101 C
424.4652 101 407.9 117.5652 407.9 138 C
f
444.8999 210.0562 m
484.696 210.0562 516.9562 177.7961 516.9562 138 C
516.9562 98.2039 484.696 65.9438 444.8999 65.9438 C
405.1039 65.9438 372.8437 98.2039 372.8437 138 C
372.8437 144.9249 373.5206 151.9217 375.344 158.26 C
375.344 158.26 379.5 183.5 397 213.5 c
414.5 243.5 436.5 267 436.5 267 C
494 267 L
494 267 475.1373 253.882 462.5 241.5 c
450.125 229.375 440.4013 216.9686 438 213.5 c
436.6707 211.5798 434.9751 209 434.9751 209 C
437.1001 209.875 442.8262 210.0562 444.8999 210.0562 C
f

*U
%%PageTrailer
gsave annotatepage grestore showpage
%%Trailer
Adobe_typography_AI3 /terminate get exec
Adobe_IllustratorA_AI3 /terminate get exec
Adobe_customcolor /terminate get exec
Adobe_cshow /terminate get exec
Adobe_cmykcolor /terminate get exec
Adobe_packedarray /terminate get exec
%%EOF

Yes, I am a pathetic geek.

Russian Roulette …with bombs.

How one software author’s unwise decision ruined my week.

Apologies for the long one, but it explains my lack of writing anything else this week.

Monday evening, as I was getting ready to take my youngest son out for what was likely to be his last Halloween (he’ll be turning 12 in a little over a month and 11 seems to be about the time that “kid stuff” starts losing its appeal) “trick or treat” with his friends, somebody pulled a trick on me that ruined my week.

Some history first though: We use a very nice mail server package called “Communigate Pro” by what used to be named “Stalker Software.” Communigate Pro (aka “CGP”) has a reputation for being fast, stable, and scalable. For the most part this has been true for us. We have had some issues with it though over the past four and a half years. We run CGP on several servers, since CGP has been used by several of the web hosting companies we have acquired over the years. The copy of it we bought for ourselves though has been the one that has caused us problems. It runs great for 50 weeks of the year, but for a week in August or September, and a week in December or January, it completely sucks rocks. The only way I can describe it is that interacting with CGP becomes like talking to a starfish.

I watched a show once that well illustrated at least one definition of the word “Relativity.” It showed how nature has made metabolism something of a clock, and that each species operates on a relative clock speed based on its metabolism. If you time-lapse film slow metabolism creatures like starfish, and then adjust the speed up to “match” our metabolic rate… the starfish look very active… zipping about the ocean floor, preying on urchins and other shellfish. Amazing really. Same goes in the other direction: slow down the film of a hummingbird and it starts looking like any other bird. I guess to a Hummingbird, a human being looks like a starfish.

Well, for two weeks out of the year our CGP mail server’s metabolism slows to one of a starfish. It works, just at a truly GLACIAL pace. The Server and Operating system are fine (load is low, machine is responsive at the console, shell commands are fine, go figure.) This is obviously frustrating – for both us and our clients. The fact that it comes back like clockwork at certain times of the year is very odd. We eliminated all external causes (traffic, spam, etc) and Stalker support spent hours and hours trying to figure out what was wrong. The only suggestion they could ever come up with was “put a faster filesystem under it.” This error appeared in whatever version of CGP we ran, and I’m pretty sure that we tried them all, starting with 4.0.X, all the way up to 4.2.X (and this week, 4.3.X… but we’ll cover that later) but they all had that odd metabolism time shift appear twice a year.

Putting a faster file system under it usually cleared up the problem. As did switching platforms. We started on FreeBSD, moved to OS X (better threading), then up to OS X Server (on an Xserve); but also we jumped through all sorts of filesystem and bus technology switches, such as IDE, to SCSI, to various RAID setups, to eventually a 2Gb/s FibreChannel RAID array. Last summer when the starfish returned, on a whim (well, not a whim really, more a blind rage and pique of frustration since I wasn’t going to sink any more capital into filesystem improvements!!! Especially since they were seemingly NOT improving the situation!) I told my senior sysadmin to move the CGP directories to the internal IDE drive of the Xserve. Presto! The starfish vanished.

The server was back to its responsive, stable state. While I was happy with regards to that, since our clients weren’t angry at us, I was LIVID because all those tens of thousands of dollars we’d spent on hardware were a placebo cure for a real software problem. Stalker (now calling themselves “Communigate Systems”… aka CGS) had no explanation for this, and just sort of slunk away.

There is another significant wrinkle to this story, which explains why I was unable and unwilling to ride Stalker/CGS harder and force the issue into some sort of resolution. In November of 2004, CGS née Stalker, made significant changes to their software licensing model, and jacked their prices up well over 5.5X their previous levels. Needless to say it was a shock to their customers. Prior to this date, their software was “expensive” but a relatively good value. (IIRC we paid between $8000 and $16,000 for our CGP licenses in 2000 and 2001.) Up until 2004 the core customers for Stalker were Service Providers such as ourselves. CGP had become something of a darling in the Industry press for being a solid performer and a far better value than absurdly over-done and outrageously expensive “Messaging Platforms” such as Lotus Notes and Microsoft Exchange. I guess this attention went to the head of Stalker/CGS’ CEO and founder Vladimir Butenko, and he began transforming CGP into one of those over-done and outrageously expensive “Messaging Platforms”. Hey, in some ways I can’t blame the guy… his core market – ISPs – had gone from niche-market players to a total commodity market with NOBODY making very much money, if any. Just beyond his grasp, and seemingly within reach was a cash-rich “Enterprise Market” with some dominant players showing real weakness. The astounding thing is the way he decided to get there: by actively pissing off their current customers and seeding them with confusion, fear and doubt. The existing customers, all ISPs, schools, and small businesses were angry. Stalker/CGS left no option for a “mail only” (no calendaring, groupware, MAPI support, VOIP support, SIP/PBX functionality, etc) version, and any continued use, other than the VERSION YOU ORIGINALLY BOUGHT would cost you a hefty sum in support and maintenance fees – 18% of purchase price, which in the new scheme was actually what you paid originally! So it was like having to buy your software again every year. Customers were livid, and the Sturm und Drang on Stalker’s support mailing list was out of control. Stalker’s CEO, Vladimir Butenko defended these new policies with characteristic Russian twisted logic and denial. I don’t know how to say “tough shit” in Russian, but that is what he did, albeit in far more diplomatic terms.
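To put numbers on that "buy it again every year" feeling, here is a rough Python sketch of the arithmetic. It uses only the figures quoted above; treating "well over 5.5X" as exactly 5.5 is an assumption, as are the names `PRICE_MULTIPLIER`, `SUPPORT_RATE` and `annual_support_fee`, which are made up for illustration.

```python
# Back-of-the-envelope check of the new support pricing.
# Assumption: the price increase is taken as exactly 5.5x (the post says "well over").
PRICE_MULTIPLIER = 5.5   # new list price relative to the old one
SUPPORT_RATE = 0.18      # yearly support and maintenance fee, as a share of list price


def annual_support_fee(original_price: float) -> float:
    """Yearly fee under the new scheme for a license bought at original_price."""
    new_list_price = original_price * PRICE_MULTIPLIER
    return new_list_price * SUPPORT_RATE


for paid in (8_000, 16_000):
    fee = annual_support_fee(paid)
    share = fee / paid
    print(f"paid ${paid:,} originally: about ${fee:,.0f} per year "
          f"({share:.0%} of the original price)")
```

Since 0.18 times 5.5 is 0.99, the yearly fee comes out at roughly the original purchase price, and a little more if the real multiplier was above 5.5.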

What he didn’t tell anyone at the time was that he ensured compliance with his new licensing scheme and inflated prices by inserting a “time bomb” into Communigate Pro. If your server thought it wasn’t properly licensed, it would cease to run at midnight UTC on some arbitrary date, and then, if re-launched would shut itself down every 15 or 20 minutes thereafter. No warning. No coherent error code. No reason why. Bang. Boom. Off. Dead.

This was done without any announcement or warning. To add insult to injury, none of us customers had any idea which versions of Communigate Pro had the timebomb code in them or what the dates for explosion were. It was truly “Russian Roulette.”

Up until 2005, the standard refrain from Stalker Tech Support for any issue was “Please Upgrade to the latest version of Communigate Pro.” The support and sales staff frequently touted the benefit of “free upgrades” of their software. You got your value and return on your initial investment by always being able to stay current and get your bug fixes. We had changed versions via upgrade countless times, as we obviously had at least ONE big ugly bug, which unfortunately was never fixed. I don’t recall what version of CGP we were running when the license change was announced, but I know that in February of 2005, when the (first of what I now assume are going to be many) CGP timebombs exploded we were running a version we weren’t apparently licensed for… despite the fact that we probably upgraded to it two months before while troubleshooting our latest visit from the Communigate Pro Starfish Mode. CGP servers around the globe all blew up at midnight UTC on February 1st 2005, including one of ours. Predictably the CGP support mailing lists, newsgroups etc also exploded with angry, frustrated customers. I called the guy at Stalker who we originally bought the software from and asked him flat out, “OK, tell me exactly what version of CGP we are allowed to run so that this timebomb won’t affect us again.” Bill, my senior sysadmin downgraded us to that version on February 1st, and life went on.

Later in 2005 our CGP Starfish returned, and that is when we tried the “move to internal IDE disk” trick which worked. I had not paid Stalker that hefty price for support and maintenance (or as they ironically call it in their emails to me “S&M”) so I was in no position to demand that they admit this “starfish mode” bug exists and fix it. I was stuck at the version we were running in perpetuity. Such is the Kafka-esque world of software licensing. Instead I directed my staff to start evaluating alternatives to Communigate Pro. I didn’t want to be the victim of extortion to pay for the development of features for “Enterprise Customers” that we would NEVER use. Here is a great example: I was on the phone with a guy from Stalker/CGS and he was telling me how great their PBX/SIP/VOIP system was. I asked him “How do our customers call us if the mail server goes down?” I was answered by a very long silence… followed eventually by “Hmmm… never thought of that.” SMTP/POP/IMAP/Webmail… that is ALL I need thank you. So we looked at the expanding pool of products that were filling the void being left by CGP as it ascended to “Enterprise” status. We had narrowed the field to a small handful by last week.

Then we lost at Russian Roulette again.

At 4pm PST on October 31st, which is Midnight UTC, three of the 4 Communigate Pro servers at our facility exploded. Their timebombs went off and they all shut themselves down. My wife had to fill in for me as the Halloween driver (we live in a rural area, so I had planned on taking my son, and a few of his friends into town for trick-or-treating.) I spent the night hunched over my keyboard and on my VOIP phone (thankfully we don’t use Communigate Pro for our VOIP needs!) to my office dealing with the crisis. Based on past events, we very quickly came to the conclusion that it was the infamous Communigate Pro Time Bomb, and not some other issue since it happened at precisely the same time on more than one server, and we were not the only ones it was happening to. (Stalker’s mailing list, which is viewable on the web also was exploding with angry customers.) To get us through the night we rolled the clocks back on the CGP servers, and restarted them. In the morning we started the work of figuring out how to deal with this. I emailed Stalker trying to find out why, when they had told us that THIS version was OK for us, it had still timebombed. I posted, and replied to others’ postings on the CGP mailing list, but my account was in “moderated” mode, and the moderator was obviously not paying attention (easy to do as that is a significant weakness of the CGP LIST module.) Vladimir Butenko appeared on the list, once again in his twisted Russian logic saying essentially ‘there is no timebomb, and besides you must be stealing my software since your server stopped working.’ Not exactly a confidence or trust building exercise in customer relations there Vlad.
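For anyone who wants to double-check the time zone arithmetic behind "4pm PST is Midnight UTC", a minimal Python sketch follows. The year 2005 is inferred from the rest of this post, and the fixed UTC-8 offset assumes Pacific Standard Time was in effect that day (US daylight saving ended on October 30th that year). The names `PST`, `to_utc` and `shutdown_local` are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset for Pacific Standard Time, so the
# example does not depend on a time zone database being installed.
PST = timezone(timedelta(hours=-8), "PST")


def to_utc(local_time: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local_time.astimezone(timezone.utc)


# 4pm Pacific on Halloween; the year is inferred from context.
shutdown_local = datetime(2005, 10, 31, 16, 0, tzinfo=PST)
shutdown_utc = to_utc(shutdown_local)

print(shutdown_local.isoformat())  # 2005-10-31T16:00:00-08:00
print(shutdown_utc.isoformat())    # 2005-11-01T00:00:00+00:00
```

A shutdown landing exactly on the stroke of midnight UTC, on several machines at once, is what pointed to a date check in the software rather than a hardware or network fault.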

After careful reading of the CGP website, I finally decided that our only course of action was to downgrade to version 4.1.8, which seems to be the last of the “free upgrades” and should run on our license key obtained in 2000. Bill figured he could downgrade the software, and restart the CGP service without causing much disruption to our clients. 4.1.8 went on, we restarted, and suddenly, without warning…

The Starfish Returns!

Our mail server software is once again, moving at the speed of a quaalude-soaked starfish taking a leisurely creep over the ocean floor. It is 7 weeks early, but the starfish is back… with a vengeance!

Great. Just what we need. A software vendor extorting us on one side, and clients angry at us for under-performing software on the other. My loyalty is with my clients, not the bastard that is holding the gun to my head, or the timebomb on my server as the case may be. I rally the staff and roll out a plan; we’ll build a new server from scratch, install a fresh OS and a new install of CGP 4.1.8 on it, move the data over to it and cutover the IP address. Based on our past experience, this should outwit the Starfish!

Thankfully a customer had just decommissioned a very nice Dual CPU/Dual Core Intel server with a built-in Ultra-SCSI RAID system, and we made him an offer on it that he accepted. The only problem with it was the drives inside were low-capacity. Thankfully we have stacks of Sun Stor-Edge Arrays in our backup system that were in an idle state, so we ripped out 6 36GB LVD Ultra-SCSI drives from one and packed them in the server, installed FreeBSD on it, and started rsync on a cross-over cable between it and our production mail server. Oddly enough this went pretty fast; despite CGP being in “Starfish Mode,” the OS and filesystem were thankfully quite responsive. System load went from 0.10 to 0.34 on the production server while we were syncing… while talking to the Starfish was unbearably slow. For example CGP’s web UI would take 15 minutes to click from page to page.
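The sync step could look something like the Python sketch below. It is an illustration under stated assumptions, not a record of what we ran: the data directory `/var/CommuniGate`, the crossover-link address `10.0.0.2`, the choice of pushing from the production box, and the helper name `build_rsync_command` are all hypothetical. The sketch only builds and prints the command; it does not execute it.

```python
import shlex


def build_rsync_command(source_dir: str, dest_host: str, dest_dir: str) -> list[str]:
    """Build an rsync invocation that mirrors source_dir onto dest_host.

    -a        archive mode (recursive, keeps permissions, times, ownership)
    --delete  remove files on the destination that no longer exist at the source
    """
    # A trailing slash tells rsync to copy the directory's contents
    # rather than nesting the directory inside dest_dir.
    source = source_dir.rstrip("/") + "/"
    return ["rsync", "-a", "--delete", source, f"{dest_host}:{dest_dir}"]


# Hypothetical path and address, for illustration only.
cmd = build_rsync_command("/var/CommuniGate", "10.0.0.2", "/var/CommuniGate")
print(shlex.join(cmd))
```

Running the same command a second time just before cutover only transfers what changed since the first pass, which helps keep the final switch short.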

We cutover to the fresh box at around midnight on Tuesday/Wednesday, and things seemed ‘OK’… instead of talking to a starfish, it felt like talking to a sleepy dog. Movement was perceptible, but not exactly as swift as we had hoped. In past experience “starfish mode” would improve to reasonable performance in the wee hours of the night when the server was under lesser mail load. Since I was staying in my office and had nothing else to do, I vented about this situation to my online friends, discussed via phone with Russ Pagenkopf, the guy I run the Mac-Mgrs list with… ironically running on a Stalker-donated copy of CGP, which, also quite ironically, had timebombed too! Russ & I decided to cease running CGP on the Mac-Mgrs list server as soon as possible, and once he had it running again I posted to the list about that. I also answered people who were angry on the CGP list about what was going on with us, and some of them relayed to that list what I had said, both to them, and on Mac-Mgrs. The PR backlash at Stalker/CGS was gaining momentum. I think I managed to get about 3 hours of sleep that night.

Sure enough come Wednesday morning east-coast business hours our main server was back to moving like a starfish. I left my staff to handle the angry clients, while I swallowed my anger and called Stalker/CGS for tech support. I didn’t expect much, but luck was on my side and by chance a Director-level employee answered the phone (When our tech support queue gets busy, I pick up the phone too!) I explained our situation with CGP 4.1.8 doing this “glacial slowdown” thing (I haven’t called it “starfish mode” with anyone at Stalker/CGS to date.) I asked him if my long-time contact was there, and he said, “yes, he just walked into the office” so I said to catch up with him since he knew the full history of this almost 5 year old problem and I didn’t have the energy to relate it to him. After a few hours of troubleshooting (it took me 55 minutes just to get to the UI to change a password so Stalker support could access the server) I got a call from them. Three people, all director-level folks at Stalker/CGS were on the phone and making me an offer. They would give us a 90-day License for CGP 4.3.9 to let us load that one up and see if it would fix the “Starfish Mode” bug. I was too exhausted to say anything but “it is worth a try”…. They promised me quotes for extending the 90-day license within a day.

License keys in hand, I woke up Bill, our over-worked and underslept senior sysadmin and had him install the 4.3.9 version on our creeping starfish of a server and restart…. it seemed OK for about 30 seconds, then immediately tailspun back down to starfish mode once again. It is obvious whatever this bug is, it has never been adequately addressed by Stalker’s coders and remains embedded deep within the current version, and probably in upcoming ones as well. The Stalker support guys were stumped, and fell back into random-mode troubleshooting again, suggesting courses of action which were either impossible due to not being able to perform them on such a slow moving system, or stuff they had suggested in the past – which we knew would not work.

I had a plan. It was a total “hail mary” play, but similar stunts had worked for us in the past with the Starfish. Nuke the box we had been running the mailserver on just days before… before the software timebomb exploded. Fresh install of this CGP upgrade, move the data over to it and cutover again. This may sound like what we just tried, and it is. Meanwhile I talked to MY director level guy and said, wherever we are with the proposed new mail system roll-out, hit the gas pedal and get ready to install and ramp it up ASAP! He brought me POs for gear and software, and I signed them. I wrote an apology to our clients about the situation, and posted it to our website. I grabbed my laptop and left my office for the first time in almost three days to get some fresh air, and food. I had the laptop as it seems that open wireless networks are everywhere now, so if they needed me at the office I could probably get on AIM or whatnot easily.

Bill finished the install and rsync work, and we cut over to the “old” mailserver around 5 PM PST on Wednesday and….

It worked. The starfish was back in hibernation once again, and the server was behaving “normally.”

I finished up some client communications, and basically passed out on my office couch a few hours later. I slept 12 hours straight.

So, at the moment I have 90 days to get a better mail system rolled out and running. I think we can get that done. We’ll probably build a fresh, old CGP 4.1.8 system to leave any clients that can’t/won’t move to the new system, so we’ll stay in compliance with Stalker/CGS’ looney license scheme, and perpetually avoid the Russian Roulette with Software Timebombs present in CGP 4.2.X and who knows what subsequent versions. We’ll probably NEVER get a satisfactory answer about the causes, or real cures for Communigate Pro’s “Starfish Mode”… but here is my hope:

Someday, it will return. Not to *our* server, but to one of these “Enterprise Customers” that Stalker/CGS so desperately wants to trade their current customers for. Some multi-million dollar CGP “Messaging Platform” cluster installation. They’ll have hundreds of thousands of dollars invested in hardware, and of course CGP software. Their mighty cluster will slow to an inexplicable crawl. They’ll spend massive amounts of time, and eventually money, trying to cure it. Vladimir will log into it and tell them “Put a faster filesystem under it”, so they’ll blow wads and wads of cash at exotic SAN architectures or the like. VP-level guys like me will lose sleep and in-the-trenches guys will lose even more trying to fix the problem of wrestling with a starfish. Then, some geek in the organization will be google-surfing phrases like “CGP slow” or “glacial communigate” and stumble upon this blog entry from who knows how many years past. He’ll pass it up the chain, and somebody will gather up the guts to call me. I’ll chuckle and say “You spent HOW much money to buy this software from these idiots? What, are you NUTS?”

There, I just saved you the phone call.

Keychain Access Hurdle Cleared!

I have a quote in my email .sig file, and it is likely in this blog’s “random quote” database as well (just keep hitting “refresh”) from one of my staff. It goes like this:

There’s only so much stupidity you can compensate for;
there comes a point where you compensate for so much
stupidity that it starts to cause problems for the
people who actually think in a normal way.

-Bill, digital.forest tech support

“Bill” in this case is Bill Dickson, my highly valued, and true treasure of a sysadmin. You can see his blog (WRD) listed in my blogroll, though I swear he is going for a world’s record for NOT updating his blog. He’s close to a year now.

Anyway, that rather insightful quote sums up what is going on with me and my keychain. I vented on a couple of mailing lists and was informed (by some well-informed people both inside and out of Apple) that Apple originally designed the Keychain system to work just as I was using it: independent of the login password, and flexible enough to allow people to use multiple keychains however they wished.

The user community apparently bitched and complained a LOT to Apple that they didn’t like the fact that they were independent of each other and that a change of the login password SHOULD also change the keychain password. I guess the majority of MacOS X users DO keep their login and keychain passwords the same. Me? I think that is stupid. I guess a lot of software engineers inside Apple thought it was stupid too. It became something of a fight between engineering and marketing (isn’t it always?) and engineering finally lost with 10.4.

So Apple caved and compensated for their customers’ stupidity, and ended up burning not-stupid people like me in the process.

Oh well. If you are a software engineer at Apple, the phrase “Asshat at Apple” I used in yesterday’s rant was NOT directed at you. But feel free to assume it refers to the people who forced you to make that change in the default behavior.

Speaking of forcing… I got my password back. It seems the “please select a longer password” dialog box is a placebo. If you just keep force-feeding the Keychain Access utility your unapproved password, it will accept it. Go figure.

I’m happy as a clam.