Keep The Change

by Scott Kantner, June 23rd, 2010 in Human Factors

Does this sound like your IT shop? Reports from the Uptime Institute consistently show that the majority of reliability and uptime woes aren’t caused by hardware, facilities, or utility failure; they’re caused by humans. And what, pray tell, are those humans doing? They’re changing things, and too much of that change isn’t planned, approved, or documented. Or there is simply too much change going on at one time.

Much like a bomb is meant to explode, technicians are meant to be technical, so it’s a bit unrealistic to assume they give much thought to managing change, much less that they’re fond of doing so. They just want to git ’er done, and in large part we pay them well not only to do that, but to do it right the first time. Hard-core techies, the ones who really know how to make things work, typically aren’t wired for sitting in management meetings. The problem with managing change is that it’s boring. It’s not technical. And explaining highly technical things to non-technical folks in a change management meeting is not always the average techie’s strong suit, nor perhaps the best use of their time. On the contrary, it can be a very frustrating experience for them, which can lead them to the Dark Side: making changes beneath the radar. Effective change management therefore becomes a bit of a balancing act. We need to know what’s going on, but we don’t want to bog everyone down in the process.

In our data center, controlling change is not optional. Reliability demands it, as do the Spanish Inquisition, er, SAS 70 auditors. But we’ve found a way to manage it without terribly burdening our technical staff. Any authorized individual, technical or not, may formally enter a change request in the system; they are simply the person requesting the change. The request is then routed to a technician, who assesses what needs to be done, adds those details to the request, and suggests when it might be done. From there it is passed on to someone in management, who assesses the risk and approves or disapproves it. If a change is of major significance, the request comes before a Change Advisory Board (CAB) for final approval. Technicians, while welcome, are not required to attend CAB meetings. When requests are properly documented, the CAB is almost always able to make a good decision without further involving the technical staff. When the CAB does need more information or defers a request for some reason (e.g. too many changes on one night), the technician in question is notified and the matter is handled outside of a meeting. This saves time, money, and mental fatigue. Since the pain threshold is relatively low, this method also encourages all change activity to actually be run through the proper channels.
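For readers who think better in code than in org charts, the routing described above can be sketched as a small state machine. This is only an illustration under our own assumptions, not the actual system: the class name `ChangeRequest`, the `Status` values, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    REQUESTED = auto()     # entered by any authorized individual
    ASSESSED = auto()      # technician added details and a proposed window
    AWAITING_CAB = auto()  # major change, management approved, CAB pending
    APPROVED = auto()
    REJECTED = auto()
    DEFERRED = auto()      # e.g. too many changes on one night


@dataclass
class ChangeRequest:
    """Hypothetical model of one change request moving through the process."""
    requester: str
    summary: str
    major: bool = False
    status: Status = Status.REQUESTED
    details: str = ""
    proposed_window: str = ""
    history: list = field(default_factory=list)

    def _move(self, new_status, actor):
        # Record every transition so the request documents itself.
        self.history.append((self.status, new_status, actor))
        self.status = new_status

    def assess(self, technician, details, proposed_window):
        """Technician describes the work and suggests when to do it."""
        if self.status is not Status.REQUESTED:
            raise ValueError("only a new request can be assessed")
        self.details = details
        self.proposed_window = proposed_window
        self._move(Status.ASSESSED, technician)

    def management_review(self, manager, approve):
        """Management weighs the risk; major changes go on to the CAB."""
        if self.status is not Status.ASSESSED:
            raise ValueError("request must be assessed before review")
        if not approve:
            self._move(Status.REJECTED, manager)
        elif self.major:
            self._move(Status.AWAITING_CAB, manager)
        else:
            self._move(Status.APPROVED, manager)

    def cab_review(self, decision):
        """CAB gives the final word on major changes."""
        if self.status is not Status.AWAITING_CAB:
            raise ValueError("request is not waiting on the CAB")
        if decision not in (Status.APPROVED, Status.REJECTED, Status.DEFERRED):
            raise ValueError("CAB can only approve, reject, or defer")
        self._move(decision, "CAB")
```

Note that nothing here requires the technician to sit in a meeting: the technician touches the request once, in `assess`, and hears back only if the CAB defers it or needs more information.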

Our process is capable of handling very high rates of change, but that doesn’t mean that we do so. On the contrary, we try to minimize the rate of change, batching changes together where doing so minimizes outages, and spreading them out when the risk is high, to maximize uptime.
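The batch-or-spread idea can be shown with a toy scheduler. It is a rough sketch only: the 1 to 10 risk scores, the threshold of 7, and the function name `plan_windows` are assumptions made for illustration, not how we actually score or schedule changes.

```python
def plan_windows(changes, high_risk_threshold=7):
    """Group changes into maintenance windows.

    `changes` is a list of (name, risk) pairs, with risk on a
    hypothetical 1-10 scale. Low-risk changes share one window so
    they cause a single outage; each high-risk change gets a window
    of its own so a failure is easy to isolate.
    """
    low = [name for name, risk in changes if risk < high_risk_threshold]
    high = [name for name, risk in changes if risk >= high_risk_threshold]

    windows = []
    if low:
        windows.append(low)                 # one batched window
    windows.extend([name] for name in high)  # one window per risky change
    return windows


requests = [
    ("patch DNS server", 2),
    ("core switch firmware", 9),
    ("rotate logs", 1),
    ("SAN upgrade", 8),
]
for number, window in enumerate(plan_windows(requests), start=1):
    print(f"Window {number}: {', '.join(window)}")
```

With these made-up scores, the two routine changes land in a single window and the two risky ones each get their own night.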

Managing change is not fun, and you may be justifiably weary of it. Let us take that burden off your shoulders.