My Retro-socket3-toy-unit with a P83@100 MHz is fast (40MHz bus), but the DX5x86@150mhz (50MHz bus) beats everything except in quake. Timings aren't that great for the memory but the synchronous PCI bus (and working PCI VGA card at that speed) beats everything you throw at it.
Carmack and Abrash optimized the heck out of Quake making use of a “quirk” in the Pentium where you can overlap integer operations and floating point DIV to eke out every last drop of performance.
Unfortunately the trick doesn’t work on early AMD K5/6 nor Cyrix CPUs - although Cyrix CPUs probably had other more serious problems.
Yeah. Their perspective correct texture mapping did a perspective div only every 16 pixels drawn – in itself a considerable optimization with almost imperceptible loss in quality – but the big deal was that on the Pentium specifically, the integer pipeline could be almost perfectly cycle-optimized to process sixteen pixels in the time the FPU executed the next perspective division in parallel so the result was ready just as it was needed!
K5/Cyrix could overlap Integer and FPU operations, what they couldnt do was interleave (pipeline) FPU operations so that multiple floating point instructions ran in parallel.