
The article actually discredits its own conclusion early on:

> These outages didn’t happen because developers didn’t test software.

The conclusion being:

> How do you get quality code?...Don’t skimp on static code analysis and functional tests, which should be run as new code is written.

But even working backwards from the conclusion, that "specs+code analysis" will save you from the big scary things of "software erosion" and "complexity" and thus spare us all from outages, I disagree.

Specs+analysis are helpful, but they do not magically solve complexity at scale. CrowdStrike, sure, would've benefited from testing, I agree, but so many other large outages need more than that, which is where the article loses me.

At some point you need black-box, Chaos Monkey-level production tests. Bring down your central database, bring down us-east-1. What happens to the business?
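To make the "what happens to the business?" question concrete, here is a toy sketch of the idea, not a real chaos tool. It models a made-up dependency graph and checks which services survive when one component is killed. The service names, the `Service` class, and `surviving_services` are all hypothetical, and real chaos testing kills actual infrastructure rather than simulating it.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    # Hard dependencies: if any of these is down, this service is down too.
    depends_on: list[str] = field(default_factory=list)


def surviving_services(services, killed):
    """Return the sorted names of services still up after `killed` go down.

    Failures cascade along hard dependencies until nothing else changes.
    """
    down = set(killed)
    changed = True
    while changed:
        changed = False
        for svc in services:
            if svc.name not in down and any(dep in down for dep in svc.depends_on):
                down.add(svc.name)
                changed = True
    return sorted(svc.name for svc in services if svc.name not in down)


# Hypothetical topology, for illustration only.
SERVICES = [
    Service("central-db"),
    Service("cache"),
    Service("checkout", ["central-db"]),
    Service("catalog", ["cache"]),
    Service("web", ["checkout", "catalog"]),
]

if __name__ == "__main__":
    print(surviving_services(SERVICES, {"central-db"}))
```

In this made-up graph, killing `central-db` also takes out `checkout` and `web`. That is the kind of cascade static analysis and functional tests on individual components won't show you.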

I'm not sure if this is valid, but a lot of the savvier tech companies' outages feel like they come from router configuration changes that lead to cascading traffic issues. I have no data to back this up, though.


