I’m so confused, isn’t this just like basic highly available infrastructure mixe...

q3k · on Nov 2, 2021

People with HA production experience can easily vibe with points made by Broccoli Man. Yes, these things make a lot of sense if you actually want to get code running reliably, especially at scale (both organizational and user base).

But we must not forget how this can look from the point of view of someone who hasn't had to deal with a page due to an entire datacenter going offline, who's not aware of all the hundreds of small things that can go wrong by doing the 'obvious' thing. I think the video is more of a way to poke fun at the optics of this (and some of the overly arcane stuff involved), rather than at the idea of high availability being useless. At least that's how I've always felt about it, a way to remind SREs to respect their internal users (simplify! automate! document!) and that what makes sense to them might look ridiculous to others.

gliese1337 · on Nov 2, 2021

I am used to it, but

1. It is rare for the details of how to actually accomplish each of those steps to be both documented and the documentation made accessible.

2. If you can describe it that succinctly, it really ought to be automated. If it can't be automated... then you left something out of your instructions, which goes back to point (1).

Spivak · on Nov 2, 2021

Like, the steps to do all of this are automated, but we can't read your mind. All of this basically boils down to submitting a PR against some repo that says "there shall be two instances in these regions, there shall be a database in this cluster, there shall be a bucket with this name, etc." that the SRE team reviews and merges, which triggers an infra deploy.
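A minimal sketch of the "there shall be..." idea, assuming a made-up spec format: the PR edits a declarative description of the desired state, and the deploy it triggers diffs that against what currently exists to produce a plan of actions. `Spec`, `plan`, and the region/cluster/bucket names are hypothetical and only for illustration; real infrastructure-as-code tools have their own formats and reconcilers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical declarative spec, standing in for the file a PR would edit.
# Field names are illustrative, not any real tool's schema.
@dataclass(frozen=True)
class Spec:
    instances: dict[str, int] = field(default_factory=dict)  # region -> replica count
    databases: frozenset[str] = frozenset()  # clusters that should host a database
    buckets: frozenset[str] = frozenset()  # bucket names that should exist


def plan(desired: Spec, actual: Spec) -> list[str]:
    """Diff desired vs. actual state into an ordered list of actions."""
    actions: list[str] = []
    # Scale instances per region up or down to the requested count.
    for region in sorted(set(desired.instances) | set(actual.instances)):
        want = desired.instances.get(region, 0)
        have = actual.instances.get(region, 0)
        if want > have:
            actions.append(f"create {want - have} instance(s) in {region}")
        elif want < have:
            actions.append(f"delete {have - want} instance(s) in {region}")
    # Databases and buckets are set-like: create what's missing, delete extras.
    for cluster in sorted(desired.databases - actual.databases):
        actions.append(f"create database in {cluster}")
    for cluster in sorted(actual.databases - desired.databases):
        actions.append(f"delete database in {cluster}")
    for name in sorted(desired.buckets - actual.buckets):
        actions.append(f"create bucket {name}")
    for name in sorted(actual.buckets - desired.buckets):
        actions.append(f"delete bucket {name}")
    return actions


if __name__ == "__main__":
    desired = Spec(
        instances={"region-a": 2, "region-b": 2},
        databases=frozenset({"cluster-1"}),
        buckets=frozenset({"logs"}),
    )
    actual = Spec(instances={"region-a": 1})
    for step in plan(desired, actual):
        print(step)
```

The point of the declarative shape is that reviewers only look at the desired end state in the PR; working out the individual create/delete steps is the automation's job, and an already-converged spec produces an empty plan.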

svachalek · on Nov 2, 2021

It totally makes sense for Gmail, but at Google "serve 5TB" means something like "sort your manager's inbox": something that someone somewhere has an interest in doing, or trying, but of no real consequence for failure.