
In all fairness the rest of the article is about that


So why spend so much time trying to shift blame to the vendor? They could've just started the article with something like:

> Due to circumstances beyond our control the DC lost all power. We are still working with our vendors to investigate the cause. While such a failure should not have been possible, our systems are supposed to tolerate a complete loss of a DC.


I don't think I read it as charged as you did

Here's what happened, here's what went wrong, here's what we did wrong, here's our plans to avoid it happening again

Seems like a standard post mortem tbh


Because a small handful of decisions probably led to the ClickHouse and Kafka services still being non-redundant at the datacenter level, and together those add up to one mistake. The vendor, on the other hand, made a small handful of separate mistakes. Calling out each one of them was bound to take up more page space.

The order in which they list the mistakes would be a fair point to make though, in my opinion. They hinted at a mistake they made in their summary, but don't actually tell us point blank what it was until after they've told us all the mistakes that their vendor made. I'd argue that was either done to make us feel some empathy for Cloudflare as victims of the vendor's mistakes, which is somewhat misleading, or it was done that way because it was genuinely embarrassing for the author to write and subconsciously they want us to feel some empathy for them anyway. Or some combination of the two. Either way, I'll grant that I would have preferred to hear what went wrong internally before hearing what went wrong externally.


Slightly less than half, and the bottom half, so that people just skimming over it will mostly remember the DC operators' problems, not Cloudflare's own. This is very deliberately manipulative.


It is of course possible they've shuffled things around since this was posted but it seems that the first part addresses their system failings.

The 5th to the 9th paragraphs are Cloudflare's "we buggered up" before they get to the power segment. They then continue with the "this is our fault for not being fully HA" after the power bit.

Each to their own, I'm going to read it as a regular old post mortem on this one.



