
My retro Socket 3 toy unit with a P83 @ 100 MHz (40 MHz bus) is fast, but the DX5x86 @ 150 MHz (50 MHz bus) beats everything except in Quake. The memory timings aren't that great, but the synchronous PCI bus (and a PCI VGA card that works at that speed) beats everything you throw at it.


Same here - all DOS games were faster on the DX4, but the Pentium was so smooth in Quake.


Carmack and Abrash optimized the heck out of Quake making use of a “quirk” in the Pentium where you can overlap integer operations and floating point DIV to eke out every last drop of performance.

Unfortunately the trick doesn’t work on early AMD K5/K6 or Cyrix CPUs - although Cyrix CPUs probably had other, more serious problems.

https://youtu.be/DWVhIvZlytc?si=DBE9zpTLj_16-GAt


Yeah. Their perspective correct texture mapping did a perspective div only every 16 pixels drawn – in itself a considerable optimization with almost imperceptible loss in quality – but the big deal was that on the Pentium specifically, the integer pipeline could be almost perfectly cycle-optimized to process sixteen pixels in the time the FPU executed the next perspective division in parallel so the result was ready just as it was needed!
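
For anyone who wants to see the "divide every 16 pixels" idea in isolation, here is a minimal C sketch. It is not Quake's code: the struct, the function names and the use of floats throughout are my own assumptions, and it handles only one texture coordinate. It compares an exact per-pixel division against dividing only at the ends of each 16-pixel run and interpolating linearly in between.

```c
#include <assert.h>

#define SUBDIV 16  /* pixels between exact perspective divisions */

/* Hypothetical span description: u/z and 1/z vary linearly in screen space. */
typedef struct {
    float uoverz, zi;            /* u/z and 1/z at the first pixel */
    float uoverz_step, zi_step;  /* per-pixel increments */
} span_t;

/* Reference version: one division per pixel. */
void span_exact(const span_t *s, int count, float *u_out)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        float zi = s->zi + i * s->zi_step;
        u_out[i] = (s->uoverz + i * s->uoverz_step) / zi;
    }
}

/* Subdivided version: exact u only at run boundaries, linear in between. */
void span_subdivided(const span_t *s, int count, float *u_out)
{
    assert(count >= 0);
    float u_left = s->uoverz / s->zi;
    for (int start = 0; start < count; start += SUBDIV) {
        int run = (count - start < SUBDIV) ? count - start : SUBDIV;
        int end = start + run;

        /* The one perspective division for this run. This is the slow
           operation that could be overlapped with integer work. */
        float z = 1.0f / (s->zi + end * s->zi_step);
        float u_right = (s->uoverz + end * s->uoverz_step) * z;

        /* With a power-of-two run length, fixed-point code can turn
           this division into a shift. */
        float du = (u_right - u_left) / (float)run;

        for (int i = 0; i < run; i++)
            u_out[start + i] = u_left + i * du;

        u_left = u_right;
    }
}
```

The two versions agree at every 16th pixel and drift apart in between; how far depends on how quickly 1/z changes along the span.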


More on this in the Graphics Programming Black Book: https://news.ycombinator.com/item?id=35738709 . Programming on Pentium is towards the end.


K5/Cyrix could overlap integer and FPU operations; what they couldn't do was interleave (pipeline) FPU operations so that multiple floating-point instructions ran in parallel.

https://www.phatcode.net/res/224/files/html/ch63/63-02.html

Here is a Quake FPU code example that is unrelated to perspective correction: https://github.com/id-Software/Quake/blob/bf4ac424ce754894ac...

    Lcliploop:
     fld ds:dword ptr[0+0+esi]
     fmul ds:dword ptr[0+0+ebx]
     fld ds:dword ptr[0+4+esi]
     fmul ds:dword ptr[0+4+ebx]
     fld ds:dword ptr[0+8+esi]
     fmul ds:dword ptr[0+8+ebx]
     fxch st(1)
     faddp st(2),st(0)
     fld ds:dword ptr[0+0+edx]
     fmul ds:dword ptr[0+0+ebx]
     fld ds:dword ptr[0+4+edx]
     fmul ds:dword ptr[0+4+ebx]
     fld ds:dword ptr[0+8+edx]
     fmul ds:dword ptr[0+8+ebx]
     fxch st(1)
     faddp st(2),st(0)
     fxch st(3)
     faddp st(2),st(0)
     faddp st(2),st(0)
     fsub ds:dword ptr[12+ebx]
     fxch st(1)
     fsub ds:dword ptr[12+ebx]
     fxch st(1)
     fstp ds:dword ptr[Ld0]
     fstp ds:dword ptr[Ld1]
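
Tracing the FPU stack through that listing, it appears to compute two interleaved three-component dot products against the same vector and subtract the same scalar from each. A rough C equivalent is below. The struct layout is inferred only from the offsets in the listing (three floats at 0, 4, 8 and one at 12), and the names `plane_t`, `dot3` and `plane_distances` are mine, not from the Quake source.

```c
#include <assert.h>

/* Hypothetical layout matching the offsets used via ebx in the listing:
   three floats at bytes 0..11, one float at byte 12. */
typedef struct {
    float normal[3];
    float dist;
} plane_t;

/* Same summation order as the assembly: (x*x' + y*y') + z*z'. */
static float dot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* v0 plays the role of [esi], v1 of [edx], p of [ebx];
   d0 and d1 correspond to the stores to Ld0 and Ld1. */
void plane_distances(const float v0[3], const float v1[3],
                     const plane_t *p, float *d0, float *d1)
{
    assert(p && d0 && d1);
    *d0 = dot3(v0, p->normal) - p->dist;
    *d1 = dot3(v1, p->normal) - p->dist;
}
```

The assembly interleaves the two dot products and uses FXCH to reorder the stack so that each add or subtract does not have to wait on the multiply issued just before it.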

The FXCH instruction is free (zero cycles) on the Pentium for most instruction combinations; AMD caught up in late 1998 with the CXT revision of the K6-2. http://www.azillionmonkeys.com/qed/cpuwar.html



