MapReduce came along at a moment when going horizontal was -essential-. Storage had kept increasing faster than CPU and memory, and CPUs in the aughts hit two significant hitches: the 32-bit to 64-bit transition and the multicore transition. As always, software lagged these hardware transitions; you could put 8 or 16GB of RAM in a server, but good luck getting Java to use it. So there was a period of several years where the ceiling on vertical scalability was both quite low and absurdly expensive. Meanwhile, hard drives and the internet got big.
For MapReduce specifically, one of the big issues was the speed at which you could read data from an HDD and transfer it across the network. The MapReduce paper benchmarks were done on machines with 160 GB HDDs (roughly 3x smaller than a typical NVMe SSD today) which had sequential reads of maybe 40MB/s (100x slower than an NVMe drive today!) and random reads of <1MB/s (also very much slower than an NVMe drive today).
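Just as a rough back-of-envelope sketch of what those numbers imply (the figures below are the ballpark ones quoted above, treated as assumptions, not the paper's exact measurements):

    # Back-of-envelope: scanning a full 2004-era disk at the rates quoted above.
    # All figures are rough assumptions for illustration.
    disk_gb = 160            # assumed disk size per machine
    seq_mb_s = 40            # assumed sequential read throughput
    rand_mb_s = 1            # assumed effective random-read throughput

    disk_mb = disk_gb * 1000
    print(f"sequential full scan: {disk_mb / seq_mb_s / 3600:.1f} hours")   # ~1.1 hours
    print(f"random-access scan:   {disk_mb / rand_mb_s / 3600:.1f} hours")  # ~44 hours

So even a pure sequential scan of a single disk took over an hour, and anything seek-heavy was hopeless on one machine.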
On the other hand, they had 2GHz Xeon CPUs!
Table 1 in the paper suggests that average read throughput per worker for typical jobs was around 1MB/s.
Maybe more like 80MB/s? But yeah, good point: sequential reads were many times faster than random access, yet on a single disk, sequential transfer rates still weren't keeping up with the growth in storage capacity or CPU speed. MapReduce/Hadoop gave you a way to have lots of disks operating sequentially, in parallel.
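To make that last point concrete, here's a rough sketch of the aggregate math (per-disk rate and cluster size are illustrative assumptions, loosely based on the figures above):

    # Rough illustration: aggregate sequential throughput of many cheap disks.
    # Per-disk rate and cluster size are illustrative assumptions.
    seq_mb_s_per_disk = 40        # assumed 2004-era sequential read rate
    workers = 1000                # assumed cluster size, one disk per worker
    dataset_mb = 1_000_000        # a 1TB dataset

    aggregate_mb_s = seq_mb_s_per_disk * workers       # 40,000 MB/s across the cluster
    print(f"cluster scan of 1TB: ~{dataset_mb / aggregate_mb_s:.0f} s")            # ~25 s
    print(f"single disk, 1TB:    ~{dataset_mb / seq_mb_s_per_disk / 3600:.1f} h")  # ~7 h

The individual disks are still slow; you just stop caring once the scan is spread across a thousand of them.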
I’ll add context that NUMA machines with lots of CPUs and RAM used to cost six to seven figures. Some setups were eight figures. They ran proprietary software, too.
People came up with horizontal scaling across COTS hardware, often called Beowulf clusters, to get more compute for less money. They’d run UNIX or Linux with a growing collection of open-source tools, and they could get the most out of their compute by customizing it to their needs.
So vertical scaling was exorbitantly expensive at the time, and less flexible, too.