
Failing over is correct because there's no way to discern that the hardware is not at fault. They should have designed a better response to the second failure to avoid the knock-on effects.



I don't think anything in this incident pointed to a hardware fault

The software raised an exception because a "// TODO: this should never happen" case happened

A hardware fault would look like machines not talking to each other, or a data file that is corrupted or unreadable


Retroactive inspection revealed that it wasn't a hardware failure, but the computer didn't know that at the time, and hardware failure can look like anything, so it was correct to exercise its only option.
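
To make the two positions above concrete, here is a minimal sketch of the policy being argued for: fail over on the first exception, because software cannot rule out hardware at that moment, but treat a second failure on the same input as evidence that the input is the culprit. This is not the actual system's logic. The names `process_with_failover`, `FailoverResult`, and the `"input_quarantined"` status are hypothetical, and quarantining the input is just one possible "better response to the second failure".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverResult:
    # status is one of: "ok", "ok_after_failover", "input_quarantined"
    status: str
    value: object = None
    errors: List[str] = field(default_factory=list)


def process_with_failover(item, primary: Callable, secondary: Callable) -> FailoverResult:
    """Try the primary; on any exception, fail over to the secondary.

    If the secondary also fails on the same item, assume the item (not the
    hardware) is the likely cause and set it aside instead of halting.
    """
    errors: List[str] = []
    try:
        return FailoverResult("ok", primary(item))
    except Exception as exc:
        # Cannot tell a software bug from a hardware fault here, so fail over.
        errors.append(f"primary: {exc!r}")
    try:
        return FailoverResult("ok_after_failover", secondary(item), errors)
    except Exception as exc:
        # Same input broke both systems: quarantine it for manual review.
        errors.append(f"secondary: {exc!r}")
    return FailoverResult("input_quarantined", None, errors)
```

The design choice is that the first failure is handled exactly as the thread says it should be, while the second failure narrows the blast radius to the one input rather than taking down both systems.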



