
I’m so confused, isn’t this just like basic highly available infrastructure mixed with a toxic SRE culture?

I want to serve 5TB!

Okay, grab two instances in different patching zones, create a bucket in our replicated RADOS storage that can hold your data or create a table/db in our Postgres cluster, write your app with tests, add an entry to the load balancer, add an entry in our big ole distributed job scheduler if you need cron, and submit a PR against the infra repo to add Prometheus metrics and alerts.

And when you’re done with that, set up CI/CD, because you shouldn’t assume that instances are reliable, and if you don’t give us the code to do a deploy we can’t recreate your app when the VM goes belly up and we’ll have to page you.

Are people not used to what it really takes to “just run some code”?



People with HA production experience can easily vibe with points made by Broccoli Man. Yes, these things make a lot of sense if you actually want to get code running reliably, especially at scale (organizational and userbase).

But we must not forget how this can look from the point of view of someone who hasn't had to deal with a page due to an entire datacenter going offline, who's not aware of all the hundreds of small things that can go wrong by doing the 'obvious' thing. I think the video is more of a way to poke fun at the optics of this (and some of the overly arcane stuff involved), rather than at the idea of high availability being useless. At least that's how I've always felt about it, a way to remind SREs to respect their internal users (simplify! automate! document!) and that what makes sense to them might look ridiculous to others.


I am used to it, but:

1. It is rare for the details of how to actually accomplish each of those steps to be both documented and the documentation made accessible.

2. If you can describe it that succinctly, it really ought to be automated. If it can't be automated... then you left something out of your instructions, which goes back to point (1).


Like, the steps to do all of this are automated, but we can't read your mind. All of this basically boils down to submitting a PR against some repo that says "there shall be two instances in these regions, there shall be a database in this cluster, there shall be a bucket with this name, etc etc" that the SRE team reviews and merges, which triggers an infra deploy.
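
To make the "there shall be" idea concrete, here is a rough sketch of what such a declaration and a pre-merge check could look like. Everything in it is hypothetical: the `ServiceSpec` fields, the `validate` helper, and the zone, cluster, and bucket names are made up for illustration and are not any real infra repo's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceSpec:
    """Hypothetical declaration a team might submit in a PR."""
    name: str
    instance_zones: List[str]               # one entry per requested instance
    database_cluster: Optional[str] = None  # e.g. a shared Postgres cluster
    bucket: Optional[str] = None            # e.g. a bucket in replicated storage
    cron_jobs: List[str] = field(default_factory=list)


def validate(spec: ServiceSpec) -> List[str]:
    """Return a list of problems a reviewer (or CI) would flag; empty means OK."""
    problems = []
    if len(spec.instance_zones) < 2:
        problems.append("need at least two instances")
    if len(set(spec.instance_zones)) < len(spec.instance_zones):
        problems.append("instances must be in different zones")
    if spec.database_cluster is None and spec.bucket is None:
        problems.append("declare a database or a bucket for the data")
    return problems


if __name__ == "__main__":
    spec = ServiceSpec(
        name="serve-5tb",
        instance_zones=["zone-a", "zone-b"],
        bucket="serve-5tb-data",
    )
    for problem in validate(spec):
        print(problem)
```

The point is only that the request is data: the team states what should exist, and review plus the deploy pipeline handle how it comes to exist.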


It totally makes sense for Gmail, but at Google "serve 5TB" means something like sorting your manager's inbox: something that someone somewhere has an interest in doing, or trying, but where failure is of no real consequence.



