Yeah. Their perspective-correct texture mapping did a perspective divide only once every 16 pixels drawn, which is in itself a considerable optimization with almost imperceptible loss in quality. But the big deal was that, on the Pentium specifically, the integer pipeline could be almost perfectly cycle-optimized to process sixteen pixels in the time the FPU took to execute the next perspective division in parallel, so the result was ready just as it was needed!
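To make the idea concrete, here is a rough C sketch of a span loop that does one true perspective divide per 16 pixels and interpolates linearly in between. This is not their actual code: `SpanGradients`, `draw_span`, `to_fixed` and the square power-of-two texture layout are all made up for illustration, and portable C cannot express the FPU/integer overlap itself. The comment only marks where the divide would be issued so that it could run alongside the pixel loop.

```c
#include <assert.h>
#include <stdint.h>

#define SUBSPAN 16  /* pixels drawn per true perspective divide */

/* Hypothetical span setup: u/z, v/z and 1/z are linear in screen space. */
typedef struct {
    float uoz, voz, ooz;     /* u/z, v/z, 1/z at the leftmost pixel */
    float duoz, dvoz, dooz;  /* per-pixel steps of those quantities */
} SpanGradients;

/* Convert to 16.16 fixed point so the inner loop is integer-only. */
static int32_t to_fixed(float x) { return (int32_t)(x * 65536.0f); }

/* Assumes a square texture of side (1 << tex_size_log2) and 1/z > 0. */
void draw_span(uint8_t *dst, int count, const uint8_t *tex,
               int tex_size_log2, SpanGradients g)
{
    assert(count >= 0);
    const int32_t mask = (1 << tex_size_log2) - 1;

    /* Exact texture coordinates at the left end of the span. */
    float z = 1.0f / g.ooz;
    int32_t u = to_fixed(g.uoz * z);
    int32_t v = to_fixed(g.voz * z);

    while (count > 0) {
        int n = count < SUBSPAN ? count : SUBSPAN;

        /* Step the linear quantities to the end of this run and do the
         * one real divide. In hand-scheduled assembly this divide would
         * be started here and complete while the loop below runs. */
        g.uoz += g.duoz * (float)n;
        g.voz += g.dvoz * (float)n;
        g.ooz += g.dooz * (float)n;
        z = 1.0f / g.ooz;
        int32_t u_next = to_fixed(g.uoz * z);
        int32_t v_next = to_fixed(g.voz * z);

        /* Affine (linear) steps between the two exact endpoints. */
        int32_t du = (u_next - u) / n;
        int32_t dv = (v_next - v) / n;

        for (int i = 0; i < n; i++) {
            int32_t tu = (u >> 16) & mask;  /* wrap into the texture */
            int32_t tv = (v >> 16) & mask;
            *dst++ = tex[(tv << tex_size_log2) + tu];
            u += du;
            v += dv;
        }

        /* Snap to the exact values so rounding error does not build up. */
        u = u_next;
        v = v_next;
        count -= n;
    }
}
```

With a steep depth gradient, the pixels in the middle of a 16-pixel run drift from the true perspective value while the run endpoints stay exact, which is the quality trade-off mentioned above.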