Hacker News

> why they decided to make services depend on just one data center

In my experience, no engineers really decided to make services depend on just one data center. It happened because the dependency was overlooked. Or it happened because the dependency was thought to be a "soft dependency" with graceful degradation in case of unavailability, but the graceful-degradation path had a bug. Or it happened because the engineers thought the dependency could fail over to any of multiple data centers, but the failover process had a bug.
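The "soft dependency with a buggy degradation path" case can be sketched in a few lines. This is a hypothetical illustration (the names `fetch_from_dc`, `LocalCache`, and `get_setting` are made up, not from Cloudflare or GCP): the fallback works in every test where the cache happens to be warm, and the cold-cache path only runs for the first time during a real outage.

```python
# Hypothetical sketch of a "soft dependency" whose graceful-degradation
# path has a latent bug. All names are illustrative.

class DCUnavailable(Exception):
    pass

class LocalCache:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Latent bug: raises KeyError on a cold cache instead of
        # returning a safe default.
        return self._data[key]

def fetch_from_dc(key):
    # Simulate the primary data center being down.
    raise DCUnavailable("primary DC is down")

def get_setting(key, cache):
    try:
        return fetch_from_dc(key)
    except DCUnavailable:
        # "Graceful degradation": serve the cached copy instead.
        return cache.get(key)

cache = LocalCache()
cache.put("warm_key", "v1")
print(get_setting("warm_key", cache))   # fallback works: prints v1
try:
    get_setting("cold_key", cache)      # fallback path itself fails
except KeyError:
    print("hard dependency after all")
```

In tests the cache is always warm, so the service looks like it degrades gracefully; in production the unexercised cold-cache branch turns the "soft" dependency into a hard one.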

Reminds me of the time a single GCP data center in Paris briefly brought down the entire Google Cloud Console. Really the same thing.




> In my experience, no engineers really decided to make services depend on just one data center.

Partially true in this case; I can't speak to modern CF (or won't, more so), but a large number of internal services were built around SQL databases and weren't built with any sense of eventual consistency. Usage of read replicas was basically unheard of. Knowing that, and that this was normal, it's a cultural issue rather than an "oops" issue.

Flipping data sources for the whole DC is a sign of what I'm describing; a FAANG would instead run services across multiple DCs rather than relying on a primary/secondary architecture.
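The contrast above can be sketched as a toy read/write router. This is a minimal illustration under assumed names (`Router`, `Replica` are hypothetical, and replication is modeled synchronously for brevity): writes go to a single primary, but reads prefer a per-DC replica, so losing the primary's DC degrades writes without taking down reads.

```python
# Toy sketch of reads served from per-DC replicas while writes go to
# one primary. Hypothetical names; real replication is asynchronous.

class Replica:
    def __init__(self, dc):
        self.dc = dc
        self.data = {}
        self.up = True

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.dc} unavailable")
        return self.data.get(key)

class Router:
    def __init__(self, primary, replicas):
        self.primary = primary      # all writes land here
        self.replicas = replicas    # reads prefer the caller's DC

    def write(self, key, value):
        if not self.primary.up:
            raise ConnectionError("writes unavailable: primary DC down")
        self.primary.data[key] = value
        for r in self.replicas:     # modeled as synchronous for brevity
            r.data[key] = value

    def read(self, key, local_dc):
        # Try the local-DC replica first, then any other replica.
        for r in sorted(self.replicas, key=lambda r: r.dc != local_dc):
            try:
                return r.read(key)
            except ConnectionError:
                continue
        return self.primary.read(key)

primary = Replica("dc1")
router = Router(primary, [Replica("dc2"), Replica("dc3")])
router.write("k", "v")
primary.up = False                       # primary DC goes dark
print(router.read("k", local_dc="dc2"))  # reads survive: prints v
```

With a primary/secondary design, by contrast, that same outage forces the "flip the whole DC" failover the parent comment describes, and every service takes the hit at once.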


Dunno about that, I've read similar internal postmortems at the FAANG I worked at.


Everywhere I've worked requires a DR drill per service, but I've never seen anything where the whole company shuts down a DC at once across all services.

But probably we should. It's an immensely larger coordination problem, but frankly, it's probably the more common failure mode.


The FAANG I worked at did this back in 2016-18, precisely so that what happened to Cloudflare didn't happen to them.





