Machine failures are few and far between these days. Over the last four years I've had a cluster of perhaps 10 machines. Not a single hardware failure.
Loads of software issues, of course.
I know this is just an anecdote, but I'm pretty certain reliability has increased by one or two orders of magnitude since the 90s.
Also anecdotally, I’ve been running 12th gen Dells (over a decade old at this point) for several years. I’ve had some RAM sticks report ECC failures (fixed by reseating them), an HBA lose its mind and cause the ZFS pool to go offline (fixed by reseating the HBA and its cables), and precisely one actual failure – a PSU. They’re redundant and hot-swappable, so I bought a new one and swapped it in.
It is, in that if something happens less often, you don't need to prepare for it as much, provided the severity stays the same (cue Nassim Taleb entering the conversation).
I'm not sure what types of products you work on, but at most companies I've worked at, having a backup like that is rarely a workable solution.