One of my favorites is a super-long one by a user named bane, about three ways to speed up a computation: (1) "high" (more RAM, more CPU), (2) "wide" (more machines), and (3) "deep" (refactoring the problem itself), which is what he recommends trying first. More than once he has taken something that was running slowly even on the latest and greatest architectures (perhaps partly because of them) and rewritten it to run on a single machine, even an old personal computer. He reminds us that a modern computer, with its solid-state drives, gigabytes of memory, and multicore gigahertz processors, can take on many large problems, if you just stoop to give the problem a decent amount of attention first: https://news.ycombinator.com/item?id=8902739
Interestingly, it also involves graph processing, as bane's comment did. The point is that the distributed graph-processing frameworks were mostly parallelizing their own overhead rather than solving the problem, which could be done on a single machine!
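To make that concrete, here is a minimal sketch (my own, not code from the thread) of the kind of single-machine job in question: a single-threaded union-find computing connected components over an in-memory edge list. The edge list and node count are made up for illustration; the point is simply that there is no cluster and no framework sitting between the CPU and the problem.

    // Single-threaded connected components via union-find with path compression.
    // A graph whose edge list fits in memory needs nothing more than this.
    fn find(parent: &mut [u32], mut x: u32) -> u32 {
        while parent[x as usize] != x {
            let grand = parent[parent[x as usize] as usize];
            parent[x as usize] = grand; // point x at its grandparent
            x = grand;
        }
        x
    }

    fn union(parent: &mut [u32], a: u32, b: u32) {
        let (ra, rb) = (find(parent, a), find(parent, b));
        if ra != rb {
            parent[ra as usize] = rb; // merge the two components
        }
    }

    fn main() {
        // Toy edge list standing in for a graph loaded from disk.
        let edges: [(u32, u32); 4] = [(0, 1), (1, 2), (3, 4), (5, 5)];
        let nodes: u32 = 6;

        let mut parent: Vec<u32> = (0..nodes).collect();
        for &(a, b) in &edges {
            union(&mut parent, a, b);
        }

        // A node that is its own root represents one component.
        let components = (0..nodes).filter(|&n| find(&mut parent, n) == n).count();
        println!("{} connected components", components); // prints 3
    }

Swap the toy array for a streaming read over an edge file and the same few dozen lines handle graphs far larger than this one, all on a single core.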
Reminds me of this recent comment on Chuck Moore, of Forth fame, building his own technology stack right down to the hardware and squeezing it for performance.
He does not claim to introduce the concepts of horizontal and vertical scaling. He mentions them only to immediately discourage them: "Lots of people make the mistake of thinking there's only two vectors you can go to improve performance, high or wide. [...] There's a third direction you can go, I call it 'going deep'." Like I said, it's a long post, so I tried my best to summarize it and then linked to it.