
A Pragmatic View of Downtime Cost

by Scott Kantner, July 17th, 2009 in Data Center, Disaster Recovery

On occasion, our prospective customers will mention they’ve done an extensive study to determine how much a minute of downtime costs their company. Ergo, they are visiting with us to establish either a primary or secondary location as part of a strategy to lose exactly $0. Sometimes I wonder what Gary Coleman would say if he heard their explanations.

Well said, Gary. Terse, but adequate.

There is apparently some very deep magic involved in figuring out the cost of downtime, and no one seems to agree exactly on what the proper incantation should be. A little over a year ago on The Numbers Guy, there was a humorous post about just how ambiguous, and ultimately irrelevant, calculating this number can become. To wit:

One blog headlined a post, “Amazon’s $3.6 Million Outage?,” noting that if projected second-quarter revenue was spread evenly over time, then the site normally would be making $1.8 million per hour. TechSpot.com and the Seattle Post-Intelligencer performed similar calculations with last year’s revenue to estimate that Amazon lost $29,000 per minute; CNET used last quarter’s results to calculate $31,000 per minute. Then the New York Times, last week, reported that “Amazon, by some estimates, lost more than a million dollars an hour in sales.”

Does it really matter who was right?  It’s A Lot Of Money by anyone’s reckoning, that is, if you can believe the numbers in the first place.

What exactly is the value of doing an extensive study, or even a moderately detailed investigation, given the cost of the meeting hours one would burn doing the analysis vs. the quality of data one could actually expect to produce? Often the variables become so complex that gut feel and opinion invariably creep into the equation just to get the math done. This in turn results in a baked-in degree of subjectivity that ends up being the source of debate when the numbers are used to justify a business case later on.

Maybe I’m missing something, but there has to be a better way. Usually the only reason we want to know the cost of downtime is to justify the costs necessary to keep the key parts of the infrastructure highly available. It then logically follows that we need to know which parts of our infrastructure really contribute to the top line such that they are truly worthy of being made highly available. With that in mind, let’s ask a few questions:

Let’s say gross revenue was $20M last year, and we do business 5 days a week, or 260 days/year. On a simple even spread, that’s $53/minute, or if you like, $3,205/hour. Can you make a business decision based on that? No matter how you rig the math (e.g., weighting end-of-month more heavily, etc.), it boils down to crazy numbers that look like this. How can they possibly help you justify a monthly spend on an availability solution? Does it therefore matter precisely how accurate they are? The above calculation is admittedly simple and perhaps even lame, but I would argue that any other more exotic formula does no better.
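
If you want to see just how little precision the even-spread approach buys you, here’s a minimal sketch of the arithmetic. The revenue figure, business days, and hours per day are the illustrative assumptions from the example above, not real data:

```python
# Back-of-envelope "even spread" downtime cost, using the figures above.
# All inputs are illustrative assumptions, not real data.

annual_revenue = 20_000_000   # $20M gross revenue last year
business_days = 260           # 5 days/week, 52 weeks
hours_per_day = 24            # revenue spread evenly across the whole day

revenue_per_hour = annual_revenue / (business_days * hours_per_day)
revenue_per_minute = revenue_per_hour / 60

print(f"Even spread: ${revenue_per_hour:,.0f}/hour, ${revenue_per_minute:,.0f}/minute")
# -> Even spread: $3,205/hour, $53/minute
```

Swap in an 8-hour business day or a heavier end-of-month weighting and the numbers move around, but they remain back-of-envelope guesses either way.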

I think the more useful exercise is to take the time to really understand what your key business systems are (not the discrete elements like servers and routers), and then determine what underlying systems are required to make them go, along with all of the various inter-system dependencies. As obvious as that sounds, our experience has been that folks often do not have this kind of a handle on their infrastructure. Instead of saying to senior management that “our SQL server absolutely has to be up all the time because the business depends on it,” you should be able to say “our SQL server needs to be up because we can’t take orders if it’s down,” or “our SQL server needs to be highly available because we can’t load our trucks when it’s down.” You are much more likely to hear “make it so” on your DR proposal with this approach than if you go in with a story about how much downtime costs the company. Senior management is well aware of the revenue numbers – they don’t need to be reminded, and trying to foist a murky cost-of-downtime justification on them is an iffy, if not perilous, strategy. What they want and need is plain talk on what happens if things break.

Speaking in pragmatic terms the business understands will result in funding for a DR plan that makes sense for the business, though it may not be everything you’d personally like to have. So if you feel you really need synchronous replication rather than asynchronous, you’re going to have to explain in business terms why the substantial extra cost is necessary. Pitch the solution in plain terms. Lay out information meaningful to your leaders and trust them to make a good business decision.

But do influence them to go with a data center that has your operating budget in mind. We can often bring some of the seemingly out-of-reach high availability solutions within your reach.

//spk