addc makes all your adds serialize on the flags register, which is really painfu...

haberman · 2024-07-17T00:15:40 1721175340

Is this really true? I would intuitively expect that register renaming would apply to eflags too, so that reads from flags don't truly need to be serialized despite nominally writing a bunch of things to the same register.

EDIT: this paper (linked in another comment) seems to indicate that this is possible:

> An out-of-order machine can look ahead and process the accumulation pass in parallel with the partial sum pass using a renamed eFlags register.

https://web.archive.org/web/20150131061304/http://www.intel....
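To make the distinction concrete, here is a rough model in Python (not real machine code; `adc` and `add_limbs` are names I made up for illustration). Within one multi-word addition the carry is a true data dependency, so those steps really are sequential. Two unrelated additions each carry their own chain, and that is the case where a renamed flags register lets an out-of-order core overlap them.

```python
MASK = (1 << 64) - 1  # model a 64-bit register


def adc(x, y, carry_in):
    """Model one add-with-carry: returns (low 64 bits, carry out)."""
    total = x + y + carry_in
    return total & MASK, total >> 64


def add_limbs(a, b):
    """Add two equal-length little-endian lists of 64-bit limbs."""
    out, carry = [], 0
    for x, y in zip(a, b):
        # True dependency: this step needs the carry from the previous step.
        s, carry = adc(x, y, carry)
        out.append(s)
    return out, carry


# Two unrelated additions. Each call has its own carry value, so nothing
# in r2 depends on anything in r1. In this model that is the software
# analogue of each chain getting its own renamed copy of the flags.
r1 = add_limbs([MASK, 0], [1, 0])
r2 = add_limbs([5, 7], [6, 8])
```

The model says nothing about timing; it only shows which values depend on which.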

phire · 2024-07-17T17:36:48 1721237808

EFLAGS is actually put in the same renaming register as the result, so you get renaming for free.

The renaming registers in the Intel Pentium Pro and Pentium II are actually over 80 bits wide. They need to hold a full 80-bit float, or a 64-bit MMX result. The Pentium III extends this to 128-bit-wide renaming registers to support SSE.

This is despite the fact that the P6 architecture only had 32-bit integer registers until the Core 2 in 2006. So there is plenty of room to store EFLAGS in the same renaming register as the result. This also means that the branch uops point to the result of the most recent flag-modifying instruction.

It was only with Sandy Bridge (and the introduction of AVX) that the P6 lineage switched to a PRF (physical register file) design, with separate registers for floats and integers. Of course, Netburst also had a PRF design.

variadix · 2024-07-17T17:59:13 1721239153

Yes, it’s why https://en.m.wikipedia.org/wiki/Intel_ADX exists
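A toy Python model of the idea, as I understand it (the `flags` dict and `interleaved_add` are hypothetical names for illustration; only `adcx` and `adox` correspond to real instruction mnemonics). ADCX reads and writes only CF, and ADOX reads and writes only OF, so two carry chains can be interleaved in one instruction stream without clobbering each other.

```python
MASK = (1 << 64) - 1  # model a 64-bit register


def adcx(flags, x, y):
    """Add with carry using only CF; OF is left untouched."""
    total = x + y + flags["CF"]
    flags["CF"] = total >> 64
    return total & MASK


def adox(flags, x, y):
    """Add with carry using only OF; CF is left untouched."""
    total = x + y + flags["OF"]
    flags["OF"] = total >> 64
    return total & MASK


def interleaved_add(a, b, c, d):
    """Compute a+b on the CF chain and c+d on the OF chain, interleaved.

    All inputs are equal-length little-endian lists of 64-bit limbs.
    """
    flags = {"CF": 0, "OF": 0}
    ab, cd = [], []
    for i in range(len(a)):
        ab.append(adcx(flags, a[i], b[i]))  # chain 1 carries through CF
        cd.append(adox(flags, c[i], d[i]))  # chain 2 carries through OF
    return ab, cd, flags


ab, cd, flags = interleaved_add([MASK, 0], [1, 0], [MASK, MASK], [1, 0])
```

With a plain adc both chains would fight over the single carry flag, so interleaving them like this would need the carry saved and restored between steps.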

JonChesterfield · 2024-07-16T22:26:36 1721168796

I remembered, and ultimately found, a source for a workaround for the serialising-on-flags problem. The Intel paper at https://web.archive.org/web/20150131061304/http://www.intel.... describes new instructions with better behaviour for ILP.