Our colo went down. Fire system triggered it (no actual fire was harmed in the triggering, however). On Thanksgiving. When I'd volunteered for on-call seeing as my co-worker had family to attend to and I didn't.
He thanked me afterward.
Our cage was reasonably straightforward to bring up once power was restored. The colo facility as a whole took a few days to bring all systems up; apparently some large storage devices really don't like being Molly-switched.
Yeah. So estimate the cost there at something like $100-250k? I'm willing to accept a pretty low risk to my life for ~15 minutes of searching to save my company $250k. It's a risk on the order of riding a motorcycle from Oakland to San Jose in rush hour, I'd roughly estimate.
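For what it's worth, the back-of-the-envelope version of that trade-off looks something like the sketch below. Every figure in it (the ~$10M value-of-statistical-life number, the ~1 micromort per 6 motorcycle miles, the ~40 mile ride) is a rough assumption of mine, not anything measured:

    # Back-of-the-envelope risk comparison; all figures are rough assumptions.
    SAVINGS_USD = 250_000   # assumed upper end of the outage cost estimate
    VSL_USD = 10_000_000    # assumed value of a statistical life (rough ballpark)
    MICROMORT = 1e-6        # a one-in-a-million chance of death

    RIDE_MILES = 40                      # rough Oakland -> San Jose distance
    MOTORCYCLE_MILES_PER_MICROMORT = 6   # commonly cited ballpark for motorcycling

    break_even_risk = SAVINGS_USD / VSL_USD   # 0.025, i.e. ~25,000 micromorts
    ride_risk = (RIDE_MILES / MOTORCYCLE_MILES_PER_MICROMORT) * MICROMORT  # ~7 micromorts

    print(f"break-even risk: {break_even_risk / MICROMORT:,.0f} micromorts")
    print(f"motorcycle ride: {ride_risk / MICROMORT:.1f} micromorts")

By that (very rough) yardstick, a risk on the order of the motorcycle ride is a few orders of magnitude below the break-even point, so the 15 minutes of searching looks like a comfortable trade.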
"I can totally reach into the back of this gigantic, flesh eating machine to move that widget a little bit to the right. Management might give me a raise for saving them money!" -famous last words of a former factory worker.
"You wouldn't download a new lung, would you?" (RIAA ad in 2033). I lived in a tent 200m downwind of a 24x7 burning trash dump on a former Iraqi and then USAF military base, for some time, so I think I've got particulate risk checked already.
I think I actually hope my odds of eventually getting cancer are ~100% over my lifespan, because it seems to be a natural consequence of living long enough. I also hope that by the time I have cancer of any size, it is something you can treat fairly successfully.
As long as they treat cancer as a profit center, a cure will never surface in the US of A. "Treatment" is a multi-billion (trillion?) dollar business; a cure would reduce that to dust.
Also, you should look up the agony involved; people would rather shoot themselves than take the "treatment".
I don't think the situation was that bad. The one really unforgivable thing was shoddy electrical work in shower trailers (I think ~10 contractors and soldiers were fatally electrocuted while showering in Iraq! I certainly got 230V a couple of times and went through the reporting process, and actually got MPs and a friend from Contracting to turn it into a bigger issue.)
I do that more days than not and I haven't been scraped off the pavement yet. I think most commenters here are overly concerned with the risk because they haven't properly equipped themselves to deal with it. It's much easier to keep yourself out of a body bag when you are aware of your surroundings.
Spot on. We were rudely awoken at 3AM by our alert system after one of our DCs caught fire (Host Europe/123-reg in Nottingham - utter fucking cowboys; we've since moved on from there). The UPS blew and took out the entire power system and generator.
The number of colo issues I've seen triggered by various backup/redundant systems is pretty impressive.
I've seen a redundant mains power system blow (taking down the main PDU), spoiled diesel, a failed generator cutover, a UPS fire, a smoke-detector-triggered shutdown (associated with power management), a really bizarre IPv6 ping / router switch flapping issue, load balancer failures caused by an SSL cipher-implementation bug (which triggered an LB reboot and a ~15s outage at random intervals), and more besides.
Just piling redundancy on your stack doesn't make it more reliable. You've got to engineer it properly, understand the implications, and actually monitor and come to know the actual outcomes. Oh, and watch out for cascade failures.
> Just piling redundancy on your stack doesn't make it more reliable.
Yeah, in a sense it actually makes it less reliable as far as mean-time-between-failures go. As an example, the rate of engine failure in twin-engine planes is greater than for single-engine planes. It's obvious if you think about it: there are now two points of failure instead of one. Why have two-engined planes? Because you can still fly on one engine (pilots: no nitpicking!).
What redundancy does do is let you recover from failure without catastrophe (provided you've set it up properly as per the parent).
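To put some illustrative numbers on that (the per-flight failure probability below is made up, purely to show the shape of the math):

    # Illustrative only: assume each engine fails independently with probability p per flight.
    p = 1e-4  # made-up per-flight engine failure probability

    single_any_failure = p                # one engine, one way to fail
    twin_any_failure = 1 - (1 - p) ** 2   # ~2p: roughly twice as many engine failures
    twin_all_engines_out = p ** 2         # but losing *both* engines is vastly rarer

    print(f"any engine failure, single: {single_any_failure:.2e}")
    print(f"any engine failure, twin:   {twin_any_failure:.2e}")
    print(f"all engines out, twin:      {twin_all_engines_out:.2e}")

More components means more failures, but (assuming they're independent) far fewer total failures.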
> Yeah, in a sense it actually makes it less reliable as far as mean-time-between-failures go.
It depends on what you're protecting against, how you're protecting against it, and how you've deployed those defenses.
Chained defenses, generally, decrease reliability. Parallel defenses generally increase it.
E.g.: putting a router, an LB, a caching proxy, an app server tier, and a database back-end tier (typical Web infrastructure) in series (a chain) introduces complexity and SPOFs to a service. You can duplicate elements at each stage of the infrastructure, but you're still subject to DC-wide outages (I've encountered several of these) and a great deal of complexity and cost, so you might well consider a multi-DC deployment.
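A rough sketch of the series-vs-parallel arithmetic, with made-up per-tier availabilities:

    # Rough availability math for a chained (series) stack vs. duplicating the whole chain.
    # The per-tier availabilities are made-up illustrative numbers.
    from math import prod

    tiers = {"router": 0.9995, "lb": 0.999, "cache": 0.999, "app": 0.998, "db": 0.999}

    # In series, every tier must be up, so availability only ever multiplies downward.
    series_availability = prod(tiers.values())

    # Two independent copies of the chain (e.g. two DCs): down only if both are down.
    parallel_availability = 1 - (1 - series_availability) ** 2

    print(f"single chain: {series_availability:.5f}")
    print(f"two chains:   {parallel_availability:.7f}")

The catch is the word "independent", which is exactly what the shared-bug case below violates.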
Going multi-DC doesn't increase capital requirements by much, and may or may not be more expensive than 2x the build-out in a single DC. It does, though, raise issues of code and architecture complexity.
In several cases, we were experiencing issues that would have persisted despite redundant equipment. E.g.: the load balancer SSL bug we encountered was present on all instances of multiple generations of the product. Providing two LBs would simply have ensured that when the triggering cipher was requested, both LBs failed and rebooted. Something of an end-run around our Maginot line, as it were.
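That's also why the naive redundancy math above breaks down here: with a shared bug, the two failures are perfectly correlated rather than independent. A tiny sketch (the trigger probability is invented):

    # Why a second LB with the same cipher bug wouldn't have helped.
    p_trigger = 0.01  # invented probability that a request hits the bad cipher path

    independent_both_down = p_trigger ** 2   # what naive redundancy math promises
    shared_bug_both_down = p_trigger         # what identical firmware actually delivers

    print(f"independent failures: {independent_both_down:.4f}")
    print(f"shared-bug failures:  {shared_bug_both_down:.4f}")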