https://www.phatcode.net/res/224/files/html/ch63/63-02.html
Here is a Quake FPU code example that is not related to perspective correction: https://github.com/id-Software/Quake/blob/bf4ac424ce754894ac...
  Lcliploop:
    fld ds:dword ptr[0+0+esi]
    fmul ds:dword ptr[0+0+ebx]
    fld ds:dword ptr[0+4+esi]
    fmul ds:dword ptr[0+4+ebx]
    fld ds:dword ptr[0+8+esi]
    fmul ds:dword ptr[0+8+ebx]
    fxch st(1)
    faddp st(2),st(0)
    fld ds:dword ptr[0+0+edx]
    fmul ds:dword ptr[0+0+ebx]
    fld ds:dword ptr[0+4+edx]
    fmul ds:dword ptr[0+4+ebx]
    fld ds:dword ptr[0+8+edx]
    fmul ds:dword ptr[0+8+ebx]
    fxch st(1)
    faddp st(2),st(0)
    fxch st(3)
    faddp st(2),st(0)
    faddp st(2),st(0)
    fsub ds:dword ptr[12+ebx]
    fxch st(1)
    fsub ds:dword ptr[12+ebx]
    fxch st(1)
    fstp ds:dword ptr[Ld0]
    fstp ds:dword ptr[Ld1]
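For readers who do not want to trace the FPU stack by hand, here is a rough C sketch of what the listing appears to compute: two dot products against the same plane normal, each minus the plane distance, stored to Ld0 and Ld1. The struct layouts and the names vertex_t, clipplane_t, plane_distance and edge_plane_distances are assumptions made up for illustration, inferred only from the byte offsets (0, 4, 8, 12) in the assembly. They are not taken from the Quake headers.

```c
/* Hypothetical layouts inferred from the offsets used in the listing:
 * [0+0], [0+4], [0+8] are three consecutive floats, and [12+ebx] is a
 * fourth float that follows the plane normal. */
typedef struct {
    float position[3];   /* read via esi (first vertex) and edx (second vertex) */
} vertex_t;

typedef struct {
    float normal[3];     /* read via ebx at offsets 0, 4, 8 */
    float dist;          /* read via ebx at offset 12 */
} clipplane_t;

/* Signed distance of one vertex from the plane: dot(position, normal) - dist. */
static float plane_distance(const vertex_t *v, const clipplane_t *p)
{
    return v->position[0] * p->normal[0]
         + v->position[1] * p->normal[1]
         + v->position[2] * p->normal[2]
         - p->dist;
}

/* Scalar equivalent of the interleaved FPU sequence: the assembly overlaps
 * the two dot products and uses FXCH to keep the stack top busy, while this
 * version simply computes them one after the other. */
static void edge_plane_distances(const vertex_t *pv0, const vertex_t *pv1,
                                 const clipplane_t *clip,
                                 float *d0, float *d1)
{
    *d0 = plane_distance(pv0, clip);   /* value stored to Ld0 */
    *d1 = plane_distance(pv1, clip);   /* value stored to Ld1 */
}
```

The C version says nothing about scheduling. The point of the hand-written assembly is that the multiplies and adds of the two dot products are interleaved so that one result is in flight while the next operation starts, with FXCH used to reorder the stack between them.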
The FXCH instruction is effectively free (zero cycles) on the Pentium for most instruction combinations. AMD caught up in late 1998 with the CXT revision of the K6-2. http://www.azillionmonkeys.com/qed/cpuwar.html