This is an adequate philosophy for, say, a CRUD app, some freemium SaaS, or social media: stuff with millions of users and billions of sessions.

However, some industries are applying these lessons to HPC, data analytics, and systems that touch money live, operating at scales of tens to maybe hundreds of users. That's stuff where downtime is far more costly, both in dollars and in reputation.

I'm also intrigued by the constant cloud refrain of "stuff crashes all the time so just expect it to" coming from a background where I have apps that run without crash for 6 months at a time, or essentially until the next release.

I'm all for scaling, recovery, etc.; I just fail to understand why it's supposed to be an OR rather than an AND.

What if stuff were highly recoverable and scalable, and we also just didn't run out of disk needlessly?



> I'm also intrigued by the constant cloud refrain of "stuff crashes all the time so just expect it to" coming from a background where I have apps that run without crash for 6 months at a time, or essentially until the next release.

IMHO, those aren't mutually exclusive. Your app code should be robust enough to run 6+ months at a time, and the "stuff crashes all the time so just expect it to" attitude should be reserved for things outside your control, like hardware failures.


Right, which is why I think it makes no sense to brush aside monitoring basic hardware stats, which are leading indicators of error rates, API issues, and so on.
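As an illustration of treating disk as a leading indicator, here is a minimal sketch using Python's standard `shutil.disk_usage`. The function names, the 85% alert threshold, and the linear time-to-full projection are all hypothetical choices for this example, not anything the commenters specified.

```python
import shutil
from typing import Optional


def disk_usage_fraction(path: str = "/") -> float:
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def check_disk(path: str = "/", threshold: float = 0.85) -> bool:
    """Return True (and print a warning) if usage is at or above the
    hypothetical alert threshold; otherwise return False."""
    frac = disk_usage_fraction(path)
    if frac >= threshold:
        print(f"WARNING: {path} is {frac:.0%} full (threshold {threshold:.0%})")
        return True
    return False


def hours_until_full(used_then: float, used_now: float,
                     total: float, elapsed_hours: float) -> Optional[float]:
    """Linearly project hours until the disk fills, from two usage samples
    taken `elapsed_hours` apart. Returns None if usage is not growing."""
    growth_per_hour = (used_now - used_then) / elapsed_hours
    if growth_per_hour <= 0:
        return None
    return (total - used_now) / growth_per_hour
```

For example, `hours_until_full(50, 60, 100, 10)` gives 40.0: usage grew 1 unit per hour with 40 units left. Alerting on the projected time-to-full rather than only on a fixed percentage is one way to get warning before the disk is actually exhausted.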



