I'm certainly not trolling, but unfortunately I think you've completely misunderstood the context of this entire discussion.

If you really have multi-petabyte datasets, then you are probably at the scale where distributed storage and distributed systems are superior.

The point of this conversation is that most people are not at this scale but think they are. I.e., they sincerely believe that a dataset does not fit in the RAM of a single box because it's 1TiB, or they think that because it doesn't fit on a single 16TiB drive, a distributed system is the only solution.

The original post is an argument about exactly that: a single node can outcompete a large cluster, so you should avoid clustering until the workload really cannot fit on a single box anymore.

Your addendum was that reliability is a large factor. Mostly, this bears little resemblance to reality. You might be surprised to learn that reliability follows a curve: you get very close to high reliability with a single machine, you diminish it enormously by moving to a distributed system, and you only start approaching higher reliability again once you have put a lot more effort into that distributed system.
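
To put rough numbers on that curve, here is a back-of-envelope sketch. The 99.9% per-node availability, the node counts, and the function names are all made up for illustration, and it assumes node failures are independent, which real clusters often violate. Treat it as arithmetic, not as a measurement of any real system.

```python
from math import comb

def serial_availability(a: float, n: int) -> float:
    """Availability when ALL n nodes must be up (no redundancy)."""
    return a ** n

def quorum_availability(a: float, n: int, k: int) -> float:
    """Availability when at least k of n independent nodes must be up."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))

NODE = 0.999  # assumed availability of one well-built box (hypothetical figure)

single = NODE                                # one machine
naive = serial_availability(NODE, 10)        # 10 nodes, every one required
redundant = quorum_availability(NODE, 5, 3)  # 5 nodes, any 3 are enough

print(f"single machine:        {single:.6f}")
print(f"naive 10-node cluster: {naive:.6f}")
print(f"3-of-5 quorum cluster: {redundant:.8f}")
```

Under these assumptions the naive cluster comes out worse than the single box, and only the quorum design, which is where the extra engineering effort goes, comes out better.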

My comment about RAID was simply because it's very obvious that a single drive failure should not take a machine down; similarly, a machine can be configured so that a CPU or memory fault does not take it down either. That you didn't understand this is either a failing of our industry's knowledge or, if you did understand it, the comment was disingenuous and intentionally misleading, which is worse.

I've also only worked at companies that were "always on", but that's also less true than you think.

I have never worked anywhere that insisted that all machines be on all the time, which is really what you're arguing. There is no reason to keep a processing box turned on when no processing is required.

Storage and aggregation: sure, those are live systems and should be treated as such, but it is never a single system that both ingests and processes. Sometimes they share the same backing store, but usually there is an ETL process, and that ETL process is elastic, bursty, etc. Its outputs are what people actually base their reports on.