
addc makes all your adds serialize on the flags register, which is really painful.
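To make the dependency concrete, here is a rough Python sketch of multi-word addition that models what an add followed by a run of add-with-carry instructions does. The function name, the 64-bit limb size, and the little-endian list layout are assumptions for illustration, not anything from a specific library. Note that each iteration cannot start until the previous one has produced its carry.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit machine word


def add_with_carry_chain(a, b):
    """Add two equal-length little-endian lists of 64-bit limbs.

    Returns (limbs, carry_out). Hypothetical helper, for illustration only.
    """
    assert len(a) == len(b)
    out = []
    carry = 0  # plays the role of the carry flag
    for x, y in zip(a, b):
        s = x + y + carry        # like add-with-carry: consumes the previous carry
        out.append(s & MASK64)   # low 64 bits become the result limb
        carry = s >> 64          # new carry, needed by the next iteration
    return out, carry
```

The loop-carried `carry` variable is the software analogue of every add reading and writing the same flag.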

A more modern approach is to reserve some bits out of every word for carries, so the adds no longer depend on each other, but that drastically complicates things.
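A minimal sketch of the reserved-bits idea, assuming 51 payload bits per 64-bit word (so 13 spare bits); the limb width and all function names here are illustrative choices, not taken from the thread. Limb-wise adds never touch a carry, so they are independent of each other, and the carries that pile up in the spare bits get propagated in a single pass later.

```python
LIMB_BITS = 51                      # assumed payload bits per 64-bit word
LIMB_MASK = (1 << LIMB_BITS) - 1
MASK64 = (1 << 64) - 1              # model a 64-bit machine word


def to_limbs(n, count):
    """Split a non-negative int into `count` little-endian 51-bit limbs."""
    assert n >> (LIMB_BITS * count) == 0, "value does not fit"
    return [(n >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def add_lazy(a, b):
    """Limb-wise add with no carry propagation; each limb add is independent."""
    out = [x + y for x, y in zip(a, b)]
    # In real 64-bit registers this overflow would silently wrap, so check it here.
    assert all(v <= MASK64 for v in out), "spare bits overflowed; normalize sooner"
    return out


def normalize(limbs):
    """Propagate the deferred carries in one pass; returns (limbs, carry_out)."""
    out = []
    carry = 0
    for v in limbs:
        v += carry
        out.append(v & LIMB_MASK)   # keep the 51 payload bits
        carry = v >> LIMB_BITS      # everything above goes to the next limb
    return out, carry


def from_limbs(limbs):
    """Recombine limbs (normalized or not) into a Python int."""
    return sum(v << (LIMB_BITS * i) for i, v in enumerate(limbs))
```

Under these assumptions, on the order of 2^13 normalized values can be summed before the spare bits run out. The complication mentioned above shows up as the extra bookkeeping: tracking how much headroom is left, and converting to and from the packed representation.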




Is this really true? I would intuitively expect that register renaming would apply to eflags too, so that reads from flags don't truly need to be serialized despite nominally writing a bunch of things to the same register.

EDIT: this paper (linked in another comment) seems to indicate that this is possible:

> An out-of-order machine can look ahead and process the accumulation pass in parallel with the partial sum pass using a renamed eFlags register.

https://web.archive.org/web/20150131061304/http://www.intel....


EFLAGS is actually put in the same renaming register as the result, so you get renaming for free.

The renaming registers in the Intel Pentium Pro and Pentium II are actually over 80 bits wide. They need to hold a full 80bit float, or 64bit MMX result. The Pentium III extends this to 128bit wide renaming registers to support SSE.

This is despite the fact that the P6 architecture only had 32bit integer registers until the Core 2 in 2006. So there is plenty of room to store EFLAGS in the same renaming register as the result. This also means that the branch uops point to the result of the most recent flag-modifying instruction.

It was only with Sandy Bridge (and the introduction of AVX) that the P6 lineage switched to a PRF (physical register file) design, with separate registers for floats and integers. Of course, Netburst also had a PRF design.



I remembered, and ultimately found, a source for a workaround for the serialising-on-flags problem: an Intel paper at https://web.archive.org/web/20150131061304/http://www.intel.... It amounts to new instructions with better behaviour for ILP.



