
K5/Cyrix could overlap integer and FPU operations; what they couldn't do was interleave (pipeline) FPU operations so that multiple floating-point instructions ran in parallel.

https://www.phatcode.net/res/224/files/html/ch63/63-02.html

Here is a Quake FPU code example that is not related to perspective correction: https://github.com/id-Software/Quake/blob/bf4ac424ce754894ac...

    Lcliploop:
     fld ds:dword ptr[0+0+esi]
     fmul ds:dword ptr[0+0+ebx]
     fld ds:dword ptr[0+4+esi]
     fmul ds:dword ptr[0+4+ebx]
     fld ds:dword ptr[0+8+esi]
     fmul ds:dword ptr[0+8+ebx]
     fxch st(1)
     faddp st(2),st(0)
     fld ds:dword ptr[0+0+edx]
     fmul ds:dword ptr[0+0+ebx]
     fld ds:dword ptr[0+4+edx]
     fmul ds:dword ptr[0+4+ebx]
     fld ds:dword ptr[0+8+edx]
     fmul ds:dword ptr[0+8+ebx]
     fxch st(1)
     faddp st(2),st(0)
     fxch st(3)
     faddp st(2),st(0)
     faddp st(2),st(0)
     fsub ds:dword ptr[12+ebx]
     fxch st(1)
     fsub ds:dword ptr[12+ebx]
     fxch st(1)
     fstp ds:dword ptr[Ld0]
     fstp ds:dword ptr[Ld1]
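
For readers who don't speak x87, here is a rough C sketch of what I read that loop body as computing: two dot products against the same plane normal, each minus the plane distance. The names `plane_t` and `clip_distances` are made up for illustration and are not taken from the Quake source; the struct layout is only assumed from the offsets in the listing (normal at 0/4/8, distance at 12, both off ebx).

```c
/* Hypothetical plane layout, assumed from the offsets used in the assembly:
   three floats of normal at bytes 0, 4, 8 and the distance at byte 12. */
typedef struct {
    float normal[3];
    float dist;
} plane_t;

/* Signed distance of two points from the plane.
   p0 plays the role of [esi], p1 of [edx], plane of [ebx];
   *d0 and *d1 correspond to the Ld0 and Ld1 stores. */
void clip_distances(const float p0[3], const float p1[3],
                    const plane_t *plane, float *d0, float *d1)
{
    /* Same summation order as the listing: (x + y) + z, then subtract dist. */
    *d0 = p0[0] * plane->normal[0]
        + p0[1] * plane->normal[1]
        + p0[2] * plane->normal[2]
        - plane->dist;

    *d1 = p1[0] * plane->normal[0]
        + p1[1] * plane->normal[1]
        + p1[2] * plane->normal[2]
        - plane->dist;
}
```

The C version says nothing about scheduling. In the assembly the two dot products are interleaved, and the FXCH instructions shuffle the register stack so that the next add or subtract works on a value whose multiply has already finished. Results can also differ in the last bits, since x87 code can keep intermediates at higher precision than a compiler's float arithmetic.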

The FXCH instruction is free (zero cycles) on the Pentium for most instruction combinations; AMD caught up in late 1998 with the CXT revision of the K6-2. http://www.azillionmonkeys.com/qed/cpuwar.html

