My retro Socket 3 toy unit with a P83 @ 100 MHz is fast (40 MHz bus), but the DX5x86...

mobilio · on Nov 13, 2023

Same here: all DOS games were faster on the DX4, but the Pentium was so smooth in Quake.

TerrifiedMouse · on Nov 13, 2023

Carmack and Abrash optimized the heck out of Quake, making use of a “quirk” in the Pentium that lets integer operations overlap with a floating-point divide (FDIV), to eke out every last drop of performance.
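
A back-of-the-envelope model of why the overlap pays off, using made-up cycle counts (the 19-cycle divide and 2 cycles per pixel are illustrative assumptions, not measured Pentium timings, and the function names are hypothetical). Done one after the other, a 16-pixel span costs the divide plus the pixel work; overlapped, it costs roughly whichever of the two is longer.

```c
/* Illustrative numbers only: assumptions, not measured Pentium timings. */
enum { SPAN_PIXELS = 16, FDIV_CYCLES = 19, CYCLES_PER_PIXEL = 2 };

/* Divide first, then draw the span: the costs simply add up. */
int span_cost_serial(int pixels)
{
    return FDIV_CYCLES + pixels * CYCLES_PER_PIXEL;
}

/* The FPU works on the next span's divide while the integer unit draws
   this span, so the span costs whichever of the two takes longer. */
int span_cost_overlapped(int pixels)
{
    int integer_work = pixels * CYCLES_PER_PIXEL;
    return integer_work > FDIV_CYCLES ? integer_work : FDIV_CYCLES;
}
```

With these numbers, span_cost_serial(16) is 51 and span_cost_overlapped(16) is 32: the divide is effectively hidden as long as the integer loop runs at least as long as the FDIV.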

Unfortunately, the trick doesn’t work on early AMD K5/K6 or Cyrix CPUs, although Cyrix CPUs probably had other, more serious problems.

https://youtu.be/DWVhIvZlytc?si=DBE9zpTLj_16-GAt

Sharlin · on Nov 13, 2023

Yeah. Their perspective-correct texture mapping did a perspective divide only every 16 pixels, which in itself was a considerable optimization with an almost imperceptible loss in quality. But the big deal was that on the Pentium specifically, the integer pipeline could be almost perfectly cycle-optimized to draw sixteen pixels in the time the FPU took to execute the next perspective division in parallel, so the result was ready just as it was needed!
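
A minimal C sketch of the subdivision idea, assuming the usual setup where u/z, v/z and 1/z vary linearly across a screen-space span. The struct, function and constant names are hypothetical, and it uses floating point for clarity rather than porting Quake's assembly. Exact u and v come from one divide at each 16-pixel boundary; pixels in between are interpolated linearly.

```c
#include <stddef.h>

/* Hypothetical names throughout; floating point for clarity. */
#define SUBDIV 16 /* pixels per perspective divide */

typedef struct {
    float uoz, voz, ooz;    /* u/z, v/z and 1/z at the first pixel of the span */
    float duoz, dvoz, dooz; /* their per-pixel steps (linear in screen space) */
} span_grads;

/* Fill u_out/v_out with texture coordinates for `count` pixels:
   exact at every SUBDIV-pixel boundary, linearly interpolated in between. */
void span_texcoords(const span_grads *g, size_t count, float *u_out, float *v_out)
{
    float uoz = g->uoz, voz = g->voz, ooz = g->ooz;
    float z = 1.0f / ooz; /* divide for the span start */
    float u0 = uoz * z, v0 = voz * z;

    for (size_t i = 0; i < count;) {
        size_t run = (count - i < SUBDIV) ? count - i : SUBDIV;

        /* Step the linear quantities to the end of this run... */
        uoz += g->duoz * (float)run;
        voz += g->dvoz * (float)run;
        ooz += g->dooz * (float)run;

        /* ...and do the one perspective divide for it. */
        z = 1.0f / ooz;
        float u1 = uoz * z, v1 = voz * z;

        /* Affine steps across the run: a folded constant 1/16 for full runs,
           a real division only for a short final run. */
        float inv_run = (run == SUBDIV) ? 1.0f / SUBDIV : 1.0f / (float)run;
        float du = (u1 - u0) * inv_run, dv = (v1 - v0) * inv_run;
        float u = u0, v = v0;
        for (size_t k = 0; k < run; ++k, ++i) {
            u_out[i] = u;
            v_out[i] = v;
            u += du;
            v += dv;
        }
        u0 = u1;
        v0 = v1;
    }
}
```

Pixels on run boundaries are exact; pixels in between carry a small affine error, which is the "almost imperceptible loss in quality". In this sketch the divide simply happens at the top of each run; overlapping it with the pixel loop is the part the thread credits to hand-tuned Pentium assembly.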

touisteur · on Nov 13, 2023

More on this in Abrash's Graphics Programming Black Book: https://news.ycombinator.com/item?id=35738709. Pentium programming is covered towards the end.

rasz · on Nov 14, 2023

The K5 and Cyrix chips could overlap integer and FPU operations; what they couldn't do was interleave (pipeline) FPU operations so that multiple floating-point instructions ran in parallel.
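
A toy timing model of that difference, assuming a 3-cycle latency per FP operation and, when pipelined, one new independent operation issued per cycle. These are illustrative numbers, not datasheet figures for the Pentium, K5 or any Cyrix part, and the function names are hypothetical.

```c
/* Assumed, illustrative latency only. */
enum { FP_LATENCY = 3 };

/* Not pipelined: each independent op occupies the FPU for its full latency. */
int cycles_unpipelined(int n_ops)
{
    return n_ops * FP_LATENCY;
}

/* Pipelined, one new independent op per cycle: the first result arrives
   after FP_LATENCY cycles, the remaining ones one cycle apart. */
int cycles_pipelined(int n_ops)
{
    return n_ops <= 0 ? 0 : FP_LATENCY + (n_ops - 1);
}
```

For six independent multiplies, cycles_unpipelined(6) gives 18 and cycles_pipelined(6) gives 8. Overlapping integer work with the FPU does nothing for this case; only pipelining the FP operations themselves closes the gap.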

https://www.phatcode.net/res/224/files/html/ch63/63-02.html

Here is a Quake FPU code example not related to perspective correction: https://github.com/id-Software/Quake/blob/bf4ac424ce754894ac...

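    Lcliploop:
     ; esi -> vertex 0, edx -> vertex 1, ebx -> plane (normal at +0/+4/+8, dist at +12)
     fld ds:dword ptr[0+0+esi]    ; push v0.x
     fmul ds:dword ptr[0+0+ebx]   ; v0.x*n.x
     fld ds:dword ptr[0+4+esi]    ; push v0.y
     fmul ds:dword ptr[0+4+ebx]   ; v0.y*n.y
     fld ds:dword ptr[0+8+esi]    ; push v0.z
     fmul ds:dword ptr[0+8+ebx]   ; v0.z*n.z
     fxch st(1)                   ; v0.y*n.y to top
     faddp st(2),st(0)            ; v0.x*n.x + v0.y*n.y, pop
     fld ds:dword ptr[0+0+edx]    ; start vertex 1 while vertex 0 is still in flight
     fmul ds:dword ptr[0+0+ebx]   ; v1.x*n.x
     fld ds:dword ptr[0+4+edx]    ; push v1.y
     fmul ds:dword ptr[0+4+ebx]   ; v1.y*n.y
     fld ds:dword ptr[0+8+edx]    ; push v1.z
     fmul ds:dword ptr[0+8+ebx]   ; v1.z*n.z
     fxch st(1)                   ; v1.y*n.y to top
     faddp st(2),st(0)            ; v1.x*n.x + v1.y*n.y, pop
     fxch st(3)                   ; vertex 0 partial sum to top
     faddp st(2),st(0)            ; dot0 = partial + v0.z*n.z, pop
     faddp st(2),st(0)            ; dot1 = partial + v1.z*n.z, pop
     fsub ds:dword ptr[12+ebx]    ; d0 = dot0 - dist
     fxch st(1)
     fsub ds:dword ptr[12+ebx]    ; d1 = dot1 - dist
     fxch st(1)                   ; d0 back on top
     fstp ds:dword ptr[Ld0]       ; store d0, pop
     fstp ds:dword ptr[Ld1]       ; store d1, pop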
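
For readers who don't read x87: working through the stack operations, the loop appears to compute the signed distance of two vertices from a plane, d = dot(v, normal) - dist, interleaving the two dot products so the FPU always has independent work queued. A plain C equivalent, with hypothetical names and a plane layout inferred from the +0/+4/+8/+12 offsets:

```c
/* Plane layout assumed from the asm offsets: normal at +0/+4/+8, dist at +12. */
typedef struct {
    float normal[3];
    float dist;
} clip_plane;

/* Signed distances of two vertices from a plane, summed in the same
   (x + y) + z order as the assembly. */
void clip_distances(const float v0[3], const float v1[3],
                    const clip_plane *p, float *d0, float *d1)
{
    *d0 = (v0[0] * p->normal[0] + v0[1] * p->normal[1]) + v0[2] * p->normal[2] - p->dist;
    *d1 = (v1[0] * p->normal[0] + v1[1] * p->normal[1]) + v1[2] * p->normal[2] - p->dist;
}
```

The C version evaluates the two distances one after the other; the interleaving in the assembly is purely a scheduling choice and does not change the math (though x87 intermediates carry extra precision until the final stores).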

The FXCH instruction is free (zero cycles) on the Pentium for most instruction combinations; AMD caught up in late 1998 with the CXT revision of the K6-2. See http://www.azillionmonkeys.com/qed/cpuwar.html