Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Well, your explanation seems completely at odds with what is currently the top-voted comment in this discussion. The linked discussion of the patch he built to fix the problem indicates that the dispatcher does just do CPU feature detection. Here is the URL for reference:

http://www.swallowtail.org/naughty-intel.shtml

According to that, the code simply does a feature check for SSE, SSE2, and SSE3. Except it also does a check for "GenuineIntel" and treats its absence as "no SSE of any kind" even if the CPU otherwise indicates that it does SSE. That check is completely unnecessary and does nothing but slow (or crash!) the code on non-Intel CPUs.

If you still think that's wrong, could you post the relevant code to show it?



That link doesn't actually show what it does to determine whether to use SSE or SSE2 (much less SSE3 and beyond). That it derives a boolean value is not the same as feature detection.

Further the bulk of that entry was from 2004, which is pertinent given that at the time the new Pentium 4 was the first Intel processor with SSE2, and the SSE implementation on the Pentium III was somewhat of a disaster -- both single-precision width (it simulated 128-bits through two 64-bit operations, and for the P3 compilers could optimize for its specific handicap), and sharing resources with the floating point unit. So the feature flag, coupled with "GenuineIntel", was all they needed to know for the two possible Intel variants with support.

Since then the dispatcher and options have grown dramatically more complex as the number of architectures and permutations have exploded.


Well, here's a complete analysis of the function:

http://publicclu2.blogspot.com/2013/05/analysis-of-intel-com...

Unfortunately, it doesn't show the raw assembly. But in the absence of any information to the contrary, I'm perfectly happy to trust this pseudocode. It shows a bunch of feature checks, preceded by a single "GenuineIntel" check. The code that's gated on "GenuineIntel" would work just fine on non-Intel CPUs. It might sometimes produce sub-optimal results, but overall it'll be fine. There are some CPU family checks, but my understanding is that non-Intel CPUs return the same values that Intel CPUs do for similar architectures/capabilities.

We have multiple people saying that the code runs faster if the "GenuineIntel" checks are removed, we have pseudocode for the function in question that shows a bunch of feature detection with a bit of CPU family detection, neither of which are at all Intel specific. And then we have you, who can't seem to substantiate your claims at all.

If you have actual code or other reasonable evidence to support what you're saying, I'd love to see it. But right now, I'm not buying it.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: