Intel's "cripple AMD" function (2009) (agner.org)
356 points by luu on Jan 20, 2014 | 124 comments


I've run into it in practice a few years back, while getting WRF (math-intensive atmospheric modeling software; I maintain a non-commercial soaring prediction site for the bay area) to work on my AMD cluster. Had to patch the executable compiled with the Intel compiler in order to make it run unhindered on AMDs. The patch just zapped the 'GenuineIntel' detection code in the compiled executable... That 'post-linker' patch is available here: http://www.swallowtail.org/naughty-intel.shtml


Why not just recompile wrf from source? Dependency hell?


It probably wouldn't work as nicely with another compiler. This level of optimization often requires compiler-specific syntax.

Not to mention that Intel's optimization is really good.


I was in a parallel programming class where the fastest correct assignments got significant extra credit. Compiling with the Intel compiler with optimizations turned on (vs. GCC) was often enough to win that extra credit by a comfortable margin.


You were graded based on compiled binaries that you provided, and not based on the source? That sounds crazy to me.


We provided build scripts and source. The extra credit was for fastest execution time.

I assume you're paid based on final results, not based on source. Not so crazy a concept - whoever delivered the best results got rewarded for it.


Sure, but if merely switching compilers produced a faster binary, then I would expect all programs to be compiled with the better-optimizing compiler. After all, it doesn't take any particular expertise to adjust the value of CC.


There's more to optimization than setting -O3. Learning how various compilers behave and how their optimization features interact with your code is a valuable skill and may well have been within the scope of the course. Certainly worthy of extra credit.


Sure, but why not mandate that everyone tunes the same compiler?...


I'll flip the question around: why mandate it?

The class clearly has a performance component, and so students were expected to learn about optimization. Are they going to learn optimization better or worse if you mandate a single compiler? If merely switching compilers is the best path to performance, is that not a valuable lesson? If switching compilers and doing a bunch of extra work to make the code fast with the new compiler is the best path to performance, have they not learned a great deal?


Some compilers aren't generally available. Hypothetically, what if ICC wasn't available freely to educational users, but some of the students had side-jobs where they used it?

You can always mandate a large set of compilers, make them all available, and leave it up to the students to determine which is fastest. I think that achieves both the competitive/educational goal and the level playing field goal.


I would definitely ban using any compiler that wasn't generally available to the class, or at least disqualify their output from winning the contest. I'd take a generic approach where it's worded just like that, rather than trying to come up with an official set of acceptable compilers, though.


Sounds fair enough. I suppose we're really on the same page after all. :)


Sounds good! Just remember, if the Internet Police show up, this never happened.


Sure, but in most programming classes they want the source so they can compare to others to see who cheated. It's unusual to be graded on the binary.


Luckily for you, your professor did not have an AMD CPU...


Indeed - we were developing on & evaluated on a homogeneous cluster :)


I was trying to build WRF from source a few years ago for a project in grad school, and the bulk of the program was one giant file that crashed gfortran when you tried to compile it. So compiling with a non-Intel compiler can present some problems.


Er, I was building it from source... Don't remember exactly what was blocking me from using GCC; vaguely remember something about OpenMP support... The config in question was MPI+OpenMP, where OpenMP was used to parallelize within a node of the cluster and MPI across the cluster itself.


(2009)

Since then Intel settled the lawsuit by paying $10M and agreeing to add the following disclaimer to their compilers: "Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors..." http://software.intel.com/en-us/articles/optimization-notice... http://www.anandtech.com/show/3839/intel-settles-with-the-ft...


Yep, and that's been interpreted such that - just in case - we add a link on every page of software.intel.com to http://software.intel.com/en-us/articles/optimization-notice

Whether or not the current page has anything to do with compilers.

Also, judging by the URL they made it an 'article' instead of a 'page' again... I'll have to see if I can get someone to fix that.


Happy to see you guys are using Drupal. I usually remove the article type as one of the first steps in an install and replace it with something that confuses people less, and/or restrict permissions appropriately, but it's hard to get people to always follow the right path.


I presume the notices are images so they can't be indexed by search engines?...


Not sure, more likely somebody got a zip file full of images from the lawyer(s) and decided to put those up exactly as provided.


The notice is only visible in Italian, Chinese, and Korean


Intel paid $10M into a pool to reimburse customers who purchased the Intel compiler. Intel later paid over $1B to AMD settling a number of cases with AMD and the US DOJ. Intel was also fined by the EU ca 2009.


> Since then Intel settled the lawsuit by paying $10M and agreeing to add the following disclaimer to their compilers

Is it just me or does that not actually fix the problem?


It fixes the legal problem. Intel isn't required to provide an optimized compiler for competitors' chips, but it is required to note that its compiler that is compatible with those chips doesn't optimize code for them.

I don't agree either, but it's a perfectly valid solution (and probably the best for Intel's bottom line).


I think that the fact that it fixes the legal problem is itself a legal problem, or rather a legislation problem.


A more effective place to put the notice would be to standard output when you're compiling for AMD.


You're compiling for x86, not AMD. The crippled AMD version is based on a runtime check.


There are several benchmarks out there showing that the Intel compiler can give a boost for AMD chips over MS VC++ and MinGW G++ on Windows.

They should have used feature flags to select the different code paths from the start instead of blacklisting the vendor string, but it's still difficult to know which chips support which features and whether those features are worth using, due to different instruction latencies, etc.

ICC can generate dynamic dispatch to different code paths in the exe, so that if the chip supports AVX2, it'll use that code path.
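
To make that concrete, here is a minimal sketch of feature-based runtime dispatch using the GCC/Clang __builtin_cpu_supports builtin (the kernel names are made up for illustration; this is not ICC's actual dispatcher):

    #include <stddef.h>

    /* Two hypothetical implementations of the same kernel; in a real build
       the fast one would be compiled with -mavx2 or use AVX2 intrinsics. */
    static void sum_baseline(const float *a, const float *b, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }

    static void sum_avx2(const float *a, const float *b, float *out, size_t n)
    {
        sum_baseline(a, b, out, n);   /* placeholder for an AVX2 version */
    }

    /* Dispatch on what the CPU reports it can do, not on the vendor string. */
    void sum_dispatch(const float *a, const float *b, float *out, size_t n)
    {
        __builtin_cpu_init();                 /* populate CPU feature info */
        if (__builtin_cpu_supports("avx2"))   /* feature check, not "GenuineIntel" */
            sum_avx2(a, b, out, n);
        else
            sum_baseline(a, b, out, n);
    }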


Intel paid more than $10M to lawyers so they could debate the legalities of the law instead of the intent of the law, hence not having to fix the problem.

They are not worried about fines. Fines are cheap for those companies. They are worried about market control.

They are what they are thanks to Phoenix reverse-engineering the IBM BIOS and the rise of the generic PC market. Now they fear anyone who can enter the market as easily as they entered, and hope not to make the same mistake IBM did.


The article is from 2009, but Agner's CPU optimization manual is still very useful.

http://www.agner.org/optimize/optimizing_cpp.pdf

Instructions on how to patch Intel's CPU detection routine to do your bidding are in section 13.7, pp. 132-133.

The 2009 article also has this interesting tidbit: "It is possible to change the CPUID of AMD processors by using the AMD virtualization instructions. I hope that somebody will volunteer to make a program for this purpose. This will make it easy for anybody to check if their benchmark is fair and to improve the performance of software compiled with the Intel compiler on AMD processors."


This is an old article. As far as I know, the settlement that was reached was entirely laughable and most certainly doesn't remove the "cripple AMD" function. Now Intel just has to notify customers that they may not get optimal performance on other CPUs, and reimburse them the cost of the compiler if they can demonstrate that they mistakenly bought it thinking that wouldn't happen, or something like that.

There is no new info in the linked article regarding the "new" FTC investigation.


Intel is one of the least ethical tech companies around. Have they even paid their 1 billion euro fine to the EU Commission yet for trying to force OEMs to not use AMD chips in their products?

http://www.engadget.com/2009/05/13/intel-fined-1-45-billion-...


I wouldn't be surprised if they had. Off the top of my head they made over $7 billion doing this and can consider the fine as a cost of doing business.


When companies do this, they should be fully audited and fined 300% profit, split evenly between the harmed company and the government. If that puts them out of business, so be it.


That would certainly discourage _getting caught_ violating the law.

It would also tend to kill off the older companies (weak law of large numbers: if a company violates any of the laws that will kill it, and it exists long enough, it eventually gets caught and killed).

It might even lead to some efforts at counter-legislation. For example, companies might lobby to _broaden_ the "get killed" legislation, which would result in lots of sympathy cases where companies were killed for "minor" offenses. Eventually the whole "kill the company" idea would fall out of favor.

http://en.wikipedia.org/wiki/Three-strikes_law

(Companies will tend to view a government audit as a death sentence, since it would damage them so much even without a 300% fine.)


I'd support fines that are proportional to general revenue or profit. A fine must hurt.

Also, an audit and 300% fines would probably not kill companies.


There is plenty of space below them, though... Intel at least publishes datasheets, technical documentation, etc. without requiring an NDA for most of their chips.

So in my book they still do quite a bit better than Broadcom, Realtek, etc.


I agree, although Intel is still not as open as they were back in, e.g., the 8086 days - stuff related to the BIOS/memory-controller init sequence still requires an NDA, AFAIK.

Better than AMD, at least - just try finding the pinout of socket AM2, which was released over 7 years ago.


This is one of the reasons I only purchase AMD-based systems. Well, that and the fact that AMD's CPU/GPU combo has better graphics performance.

It's either that or support a company whose market advantage is based on anti-competitive practices and which will spend a significant portion of its profits on reducing consumers' choice of CPU (up to the point where they no longer have to, of course).


How does the Intel compiler compare to others today? We tried using it for game development years ago and it had too many problems to make it worthwhile (e.g. pathological behavior with some C++ code).


I'm disgusted by their intentionally poor performance on AMD, but Intel's compiler is excellent. I often optimize numeric and SIMD functions for x64 on Linux, and regularly compare the generated assembly code of current versions of GCC, CLang and Intel compilers. I have no experience with MSVC. I'm often using the C++ compiler, but the code is usually straight C and inline assembly.

In my anecdotal opinion, Intel produces faster and better code than GCC and Clang about 2/3 of the time, with GCC usually second and Clang slowest. I love the idea of Clang, but so far I find its main advantage to be clearer error messages rather than fast code.

Bugwise, I think they all are about equal. Intel's weakness right now (for my work) is that it crumbles under very high vector register pressure. And as a free academic licensee, support seems limited to posting on a forum and hoping a relevant Intel employee wanders by.

If I had just one shot for a compile and was hoping for the best outcome without being able to test and verify, I'd compile with Intel. But considering that GCC comes with source, has a much larger community, and accessible bug trackers, it's probably a better day-to-day compiler. But if you are trying to maximize performance, you should definitely try out Intel.


The Intel compiler is extremely good at finding and exploiting vectorization (SSE/AVX) opportunities; using these instructions in hot loops is becoming key to getting anywhere near peak performance out of modern CPUs.

Most people don't care enough about performance to notice, but recompiling with Intel's compiler often shows a 5-15% difference on number crunching codes and that's before spending time investigating the vectorization output and fine-tuning.

On the other hand, if you really care about speed then someone with some experience in performance tuning will typically be able to make your code run 4-8x faster, vastly outweighing any benefits from the compiler.
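
For context, the kind of hot loop that auto-vectorizers target looks something like the sketch below (illustrative only; the exact speedup and the flags, e.g. -O3 with ICC or -O3 -march=native with GCC, vary by compiler and data):

    #include <stddef.h>

    /* saxpy-style kernel: y[i] += a * x[i].  With restrict-qualified pointers
       and a simple loop body, ICC/GCC/Clang can usually auto-vectorize this
       with SSE/AVX when optimization is enabled. */
    void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }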


Just in case you skimmed moconnor's comment, it bears repeating:

Intel's compiler: 15% speedup

Hand-optimized code: 800% speedup

This gap in compiler tech is still a big deal today. Think about the early mainframes and how the code was all written in machine code or assembler. http://www.pbm.com/~lindahl/mel.html

Compilers can still improve, a lot.

• Parallel code? _still_ hand-written, even though choosing the right language/library can help. Note that choosing the language that makes parallelism easy may cost you when you actually go for the maximum parallel speedup

• GPU? hand-written. See: litecoin miners, and bitcoin miners before that. They used OpenCL but were hand-tuned for a specific architecture

• Cross-platform? Java and C should be portable, but ask any Android developer how it really works

• And the one we're talking about here: number-crunching code? hand optimized!

I'm actually quite optimistic about the future of compilers. One of the reasons HN is so fun to read is that it comes up often.


"Hand-optimized code: 800% speedup"

It really depends.

Especially in how naively the "non-optimized" code was written.

I can see vectorization accelerate from 2x to 4x (per core), but not much more than that (which the Intel compiler does best)

But even GCC can vectorize better today than in the early days of 4.0


Sure, it depends. I've seen embarrassingly parallel (yeah, that's a real term) code with speedups in the 20's.

My personal best was a 9x speedup, partly by using SSSE3 and partly by some really good prefetching and non-temporal writes.

If you look at what I said in the very narrowest light, I agree that SSE2 all by itself typically delivers a 2x speedup per core over non-SSE code.


Technically, 8x faster is a 700% speedup.

Either way, using percentages there seems really misleading. 15% versus 700% (or 800%) looks like a much bigger difference than 1.15 versus 8 if you're not careful when thinking about it.


It bothered me when I realized that mainstream reviews (the ones able to influence the average mass-market buyer) may have been using binaries heavily biased in favor of Intel.


What bothers me is that if a mainstream reviewer benchmarks ICC compiled programs, well, that isn't unfair. Real world programs are compiled with that compiler. AMD processors actually WILL under-perform on certain programs because of this.

That leaves a bad taste in my mouth.


Maybe any program compiled with ICC should have the same disclaimer as ICC itself so people know that the program is biased.


Well, personally I don't think that's enough. I'm amazed they got away with the plea bargain they did.


How many binaries are compiled with ICC? I guess on MS Windows it's the vast majority, so you're right[1], but on other platforms it's probably GCC/Clang, and there neither Intel nor AMD is favored.

[1] Or maybe not? Are MS C++ compilers derived from Intel's?


No, Microsoft make their own compilers.


If I were AMD, I'd just start calling my processor GenuineIntel. (Or maybe make it user programmable, and then absolve myself of any knowledge of what users are setting it to.) When the judge asks why, I'd say because those are the magic words to make certain binaries run faster, and I wanted to run a viable processor business.

This is not an acceptable use of trademarks.


> This is not an acceptable use of trademarks.

But if you were Intel, would you have your engineers work on competitors' products to make sure they are well supported by your line of tools? Before answering, consider that the core implementation of AMD CPUs differs significantly from Intel's: instruction timings are slightly different, whether you look at them individually or in groups. It's not just a matter of flipping a switch to get optimal performance, and that's just the tip of the iceberg.

Now, from a business standpoint, I think it could make sense for them to make their compiler produce fast code for any chip, but the legal implications of having a competitor's product fail because of code produced with your compiler might make you think twice before going down that road. Intel probably chose the safe road for a reason. Also, note that the produced code isn't crippled (as in, it doesn't make AMD CPUs execute endless loops, or produce wrong results any more than Intel's do); it just follows the safest path.


Add another flag to the compiler to produce optimized code for any CPU, with the warning that it has only been verified to work with Intel CPUs. With today's CPUs I do not buy the "safest path" argument - perhaps I could accept "we only enable it by default for implementations we have verified in-house", which makes a lot of sense.

This sounds a lot more like Intel knowing they make the best compiler, and knowingly putting non-Intel CPUs at a disadvantage so that Intel appears to have the faster CPU.


I guess they could do that, and trust that customers will always be reasonable enough not to sue them in the situations they wanted to avoid. Besides, it's not 2009 anymore: if they want to maintain their architecture in the market against the ARM vendors, they should probably help AMD out as much as possible (though I remember reading somewhere that AMD considered including ARM cores in their APUs - or maybe it was just a journalist's speculation).

> This sounds a lot more like Intel [...]

Just a thought here: should Intel do things to avoid sounding like a bad competitor, or to give their customers the best product they can offer? We're engineers; we should know not to fall for appearances, shouldn't we? I know, I supported my own reasoning with the legal aspect of things, which sometimes is not very reasonable in what it must handle. There goes my original point.


Can you spoof AMD CPUs to return "GenuineIntel" instead of "AuthenticAMD"?


Search the linked page for "CPUID manipulation program for AMD" -- there is some info about this.

Apparently when run under virtualization, the CPUID instruction is intercepted and can thus be manipulated. There's also a github project for patching binaries generated from ICC to run the optimal code-paths even for AMD. But there doesn't seem to be a way to manipulate the hardware itself to change its vendor string.


Couldn't that also potentially break some code path where the app expects to be running on a GenuineIntel processor? I'm not well versed in this.


Not likely. AMD processors are very carefully designed to correctly execute code, even if it just assumes GenuineIntel and never even checks.

If code is dumb enough to try to use something low-level (let's use Bull Mountain RDRAND as an example) without checking for that specific feature bit, then it obviously is the code that is broken, leading to an illegal operation and it gets killed. That's not the CPU's fault.

Intel and AMD CPU manuals both pound in the point, too. In the sections on these advanced features they always insist that you check the feature bit first.
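
For example, the documented way to test for RDRAND is CPUID leaf 1, ECX bit 30. A minimal sketch using GCC/Clang's <cpuid.h> helper and the RDRAND intrinsic (the intrinsic itself needs -mrdrnd at compile time):

    #include <cpuid.h>      /* __get_cpuid (GCC/Clang helper) */
    #include <immintrin.h>  /* _rdrand32_step */
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1 returns the feature flags; RDRAND support is ECX bit 30. */
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 30))) {
            unsigned int r;
            if (_rdrand32_step(&r))
                printf("hardware random value: %u\n", r);
        } else {
            puts("RDRAND not supported; fall back to another entropy source");
        }
        return 0;
    }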


Your mention of RDRAND is a great point and made me think about just how many differences there are between different models of CPUs from the same vendor. I assume the differences between different Intel CPUs vastly outweigh the differences between similar Intel and AMD CPUs.


Possibly, as Intel's compiler uses dynamic code-path selection based on the processor.

But more likely, it would mean slower code execution, as the code path wouldn't be optimised for AMD:

i.e. if the processor check thinks it's running a Sandy Bridge i7, with an SSE float divide latency of 11, but the AMD chip has a latency of 23, then the unrolled loop that worked well for the i7 doesn't work well at all for the AMD chip.


Except it turns out not to work that way - most of the optimisations turn out to be general ones, and some of the non-optimised code paths AMD gets for stuff like string copying are slower than a naive implementation. (Actually, Intel wound up having to fudge the CPUID result on their newer processors for this reason. Otherwise binaries compiled on their older compilers would detect an unrecognised chip and run the slow path - including some of the benchmarkers reviewers would use to compare the two!)


I'm still not entirely sure why Intel is forced to do this. Is it only because they advertise that it optimizes equally well for any CPU? If not, then I don't really see how they can be forced to provide another, AMD-friendly version.


Intel is the market leader by a good margin, and in the past has been known to use unfair tactics to keep other players out of the market. AMD has been in a lot of lawsuits with Intel due to this.

In this case, it's not just that Intel isn't playing nice with AMD, it's that they're specifically using poor optimizations during compile if you're not using an Intel processor. That's not by accident, that's done on purpose to make non-Intel processors seem worse. What you're allowed to do while competing in the market changes when you're the dominant player in the market.


I am usually not a free-market extremist, but if Intel makes an excellent software product after years and millions of dollars in R&D, and makes it work well only on certain platforms, more power to them.


You cannot give these companies an inch with stuff like this. In 2014, we're spoiled and for the most part don't realize that our network providers aren't the only potential toll keepers nickel-and-diming us.

I don't know if you are familiar with mainframe or similar computing technologies. When you buy an IBM mainframe (or Power unix box to a lesser extent), you're essentially metered by a CPU budget. You're not permitted to use the full capacity of your system unless you pay.

This case is a little different, but the point stands. Companies should not be allowed to sabotage the competition or hobble the ability of a device capable of doing a task from doing it. The free market works when the stakeholders don't cheat.


The problem is that they penalized AMD without telling anyone. The market only works if you know what you're buying.


To make an obligatory car analogy, imagine if Ford opened up gas stations that sold really good gas, but this gas was somehow made to run much less efficiently in non-Ford cars. And further that they didn't tell anyone this, and just left you to assume that if you filled up your Prius with Ford gas and subsequently got 20MPG, the car was to blame.


It's even worse than that - since companies are distributing binaries compiled with icc, it's more like a Ford gas refinery distributing gas to normal gas stations that secretly runs terribly in other cars. There simply is no way for the consumer to know what they're getting.


This knock-on effect is where the real harm is done. The fact that Intel is checking not for feature flags but for the existence of an Intel processor is actionable. They aren't following their own best practices for accessing optional features of the chip.


That analogy worked in 2005, but eight years on the fact that the Intel compiler optimizes primarily for Intel processors is common knowledge, and is explicitly stated by Intel repeatedly on all product materials. Invariably someone will bring up the "but what about benchmarks corrupted by this compiler" conspiracies, without a single example of a benchmark so contrived.

It's worth noting that performance and processors aren't as simple as "it has the feature, so use it". Each instruction has timings that vary across models, and something like SSE(2|3|4) or AVX(2|512) can vary dramatically in its net benefit or detriment depending on the number of words, alignment, and so on. Many people with ICC and code they think will be super fast using it are often surprised to find it hasn't chosen to use those features at all, simple setup and teardown eating all possible benefit.

In an ideal world we would have an open compiler that made best-in-class code for all major processors. Sadly that doesn't exist, and even now we have a case where a lot of the complaints about ICC are "it crippled the code for my AMD... but still made better code than every other compiler".


People have pointed out that the "Intel" code is faster than the "AMD" code even on AMD chips, so the stuff about performance being difficult to achieve across different CPUs, while true, does not seem to be relevant.

All in all, you seem to be downplaying this far more than it deserves. It is not a case that the "compiler optimizes primarily for Intel processors". If it simply produced code built to be good on Intel and with no attention paid to AMD CPUs at all, that would be fine. Nobody would complain. Nobody would expect anything else, really. But that's not what they do: instead, they generate code that checks for an AMD CPU and then deliberately chooses a suboptimal path in that case. So it's not a case of "optimizes primarily for Intel processors", but rather "intentionally cripples performance on non-Intel processors". That is to say, there is a vast gulf between indifference and purposefully making things worse.


> People have pointed out that the "Intel" code is faster than the "AMD" code even on AMD chips, so the stuff about performance being difficult to achieve across different CPUs, while true, does not seem to be relevant.

It is relevant, if you consider that optimization isn't just counting cycles at individual instructions. Some optimization pass may be CPU agnostic (and the difference of performances between compilers on AMD probably show that aspect), and some very well depend on CPU peculiarities.

Also, if Intel must provide good support for all AMD chips, then they will have to do the same for any other competitor (and there are some iirc).

> they generate code that checks for an AMD CPU

Is it really what they do? Are they checking for "GenuineIntel" CPU, or for "AuthenticAMD" ones? There's a slight difference, even if AMD is their only real competitor atm.


"Also, if Intel must provide good support for all AMD chips, then they will have to do the same for any other competitor (and there are some iirc)."

Again, nobody in this discussion is saying that Intel must provide good support for all AMD chips. All anyone is saying is that Intel should stop explicitly checking for non-Intel chips and running deliberately slow code on them.

Again, it would be just fine if Intel optimized their compiler exclusively for Intel CPUs and let non-Intel CPUs deal with whatever code they generated. That's what everyone would expect Intel to do. Nobody sane expects Intel to optimize for AMD CPUs in their compiler. We just expect Intel not to put extra effort into crippling them.


I understand you. But is it possible that between two features, say SSE4 and AVX, the less powerful one happens to be the most efficient for a given algorithm on platform A, and the least efficient on platform B? If yes, how would the compiler know which path to choose without knowing which platform it is targeting?


It wouldn't, but the sane way to handle that would be to special-case platform A, and let platform B fall back to feature detection, rather than falling back to the worst possible code.


All right, then I suppose what you are saying is that Intel should leave the GenuineIntel detection out, and let customers implement specialized paths for AMD products if the automatically selected path happens to underperform (for a certain performance expectation level), which might be less likely to occur if all the generated paths are available. Did I get this right?

And for people looking to extract every last bit of power from their chips (AMD or not), they might have to implement the path by hand anyway.

> rather than falling back to the worst possible code.

Note that (I think it might have been said elsewhere), it's not the worst possible code, but the least efficient one generated by the compiler (which happens to be quite good already). /pedantic mode

Edit: clarifications.


There are two reasonable choices for Intel to follow:

1. They decide to implement the best possible x86 compiler for all CPU vendors. In this case, they optimize for AMD chips (and anyone else selling x86 chips) just like they currently do for Intel, possibly including AMD-specific code paths.

2. They decide to implement the best possible compiler for Intel x86 CPUs. In this case, they should just ignore the existence of other vendors and do the best they can for their own stuff. If it runs fast on AMD, great. If it doesn't, not their problem.

I'm not sure which of those two what you said falls under, but I think it's one of those. Unfortunately, they have chosen a third path, where not only do they specialize for Intel, but they check for non-Intel and deliberately pessimize performance there.


The second choice (sorry for my confused English). But then there might be legal consequences to factor in. Engineers probably shouldn't care about that.

As for the first choice, ideally Intel should do that to provide the best possible compiler out there, but that would really be shooting themselves in the foot unless they are guaranteed to always have the upper hand on the hardware side. It would also require them to study AMD CPUs deeply (how instructions get translated to microcode, how that microcode is optimized, etc.) - they probably have people doing that (if that's legal).

Thanks for taking the time to answer me.

Edit: modified my upper comment.


Your first statement does not follow at all.

Do people realize that we're talking very specifically about auto-vectorization? This is a very narrow, niche area of software (and it is absolutely an optimization, various hysterics notwithstanding), and the notion that random software you're running is being "crippled" is utterly nonsensical.


I don't even understand what you're saying here.

We seem to be talking past each other, so let me briefly summarize what's going on, since you keep talking about optimizations and how difficult they are, and that just doesn't matter.

Intel's compiler produces multiple code paths, each one optimized for different CPU features.

The generated code always runs the slowest "fallback" code path on AMD CPUs.

Other code paths are still faster than the "fallback" path, even on AMD CPUs, despite not being optimized specifically for them.

Thus, Intel is artificially reducing the performance of their generated code on AMD CPUs. They put in more work to make this happen. If they had simply left out the CPU vendor check, AMD CPUs would perform better.


I stated that optimization is hard (each architecture has timings that affect when something is beneficial and when it isn't): Intel baked this into ICC, but only for Intel targets (for obvious commercial reasons). You say that no, in some anecdote about some anecdotal piece of code on some anecdotal set of data on some anecdotal AMD processor, this was disproven. But that doesn't disprove it in general whatsoever. If Intel simply generated and ran Intel-targeted code always, we would have the same debate: why didn't they align their instructions and order them in a way that would be better on AMD?

It is entirely probable that simply running the "Intel" code chain will yield code slower than without on some sets of code with some sets of data, on some AMD architectures. Again, in most cases with vectorization you are not suddenly seeing a magnitude improvement, but rather something that can be marginal in some cases with some cases of data. It is a very hard problem, which is exactly why for all of the bluster the Intel Compiler is still considered the best compiler, 8 years into this controversy. AMD has been contributing to the Open64 compiler for years, but you don't hear much about it. And I'll bet their contributions don't put too much care into Intel processors.


"If Intel simply generated and ran Intel-targeted code always, we would have the same debate because why didn't they align their instructions and do it in such an order because that would be better on AMD."

That same debate might happen. But I doubt we'd see it, because most people would dismiss such a debate as stupid. Why would Intel put any effort into making their compiler optimize code for AMD CPUs?

Again: there is a world of difference between simply not optimizing for AMD, and deliberately running slow code when an AMD CPU is detected.

If Intel just checked CPU features and decided based on that, would it still produce bad code for AMD sometimes? Probably. Would it be as slow? Doesn't sound like it, from people with real-world experience with the compilers. Would people still complain? Yes. Would those complaints have any merit? No.


Just to join the chorus: it's not a hard problem. Intel intentionally adds code to the executables produced by their compiler so as to limit their performance on non-Intel CPUs. They settled a federal deceptive-practices case brought by the FTC because they did not inform their customers, who thought they were buying a working compiler, that they were doing this.

Instead of simply removing this code, which serves no purpose other than degrade performance on non-Intel CPUs, they reached an agreement to put legal disclaimers on every web page mentioning their compiler that they reserve the right to do this. But because of the slick legal language of the disclosure, most people get the impression that they are simply failing to make AMD specific optimizations, rather than intentionally preventing non-Intel CPUs from utilizing the optimizations already present in the code.

I think we all agree that at this point Intel's practice is fully legal. Their engineers should feel proud of having built a really solid compiler. But Intel-the-company deserves to be shamed for its slimy underhanded practices, and the engineers should feel a certain amount of revulsion for allowing themselves to be used in this way. Perhaps if they stood up for the obvious right approach, they could help change the company for the better.


1. Old versions of icc ran well on AMD.

2. New icc version adds check for "GenuineIntel".

3. New icc version now runs slowly on AMD.


This, like almost every counter point I've faced thus far, is simply wrong. It is manufactured reality.

ICC 8 added auto-vectorization. It, the very first auto-vectorization version, added the "GenuineIntel" branch for such vectorized code, because despite all of the fiction stated otherwise, vectorizing is actually a very hard task (hence why Intel maintains such a lead, and people are still griping about this 9 years after it came about).

I am hardly standing up for Intel, but this is Reddit-level conversation, where people simply say what they hope is true.


I don't understand your "because" statement. Intel added a check for Intel CPUs because vectorization is difficult. That's a complete non sequitur as far as I can tell. It makes as much sense as saying that I baked a chocolate cake because it rained yesterday.

Yes, various optimizations, including auto-vectorization, are difficult. Why does that mean Intel had to add a check for Intel CPUs in their compiler?


I'm a glutton for punishment, I suppose.

The Intel compiler makes tight, fast x86[^1]. It ALSO can optionally generate auto-vectorized code paths for specific Intel architectures (it is not simply "has feature versus doesn't have feature", but instead chooses the usage profile of features based on the runtime architecture. Each architecture has significant nuances, setup and teardown costs, etc, and anyone who says "they should just feature sniff" does not understand the factors, though that certainly doesn't stop them from having an opinion), for that small amount of niche code that can be vectorized. Saying that because they don't do the latter for AMD processors means they "crippled" them is nonsensical.

Just to be clear, I have heavily used the Intel compiler for back-office financial applications. I'm not just repeating some opinion I happened across. Nor do I have any particularly love for Intel.

Further, if you understand that Intel specifically targets specific Intel architectures with every branch path, saying "well just run it on all things", again, you simply don't understand the discussion, or the architecture based dispatcher. Yeah, "just run it" might run perfectly fine, and for a contrived example might yield better runtimes, but it also can yield runtime errors or actual performance losses.

As I have repeatedly stated, we should expect great cross architecture and platform (including ARM, which with NEON also has vectorization) compilation with auto-vectorization from the dominant compilers, including GCC, LLVM, and VC. But somehow it always returns to the Intel compiler, nine years after they publicly stated "Yeah, this is for Intel targets".

^1 - So much so that in almost all of these conversations, the people who complain about Intel compilers still use them because it still generates the fastest code for AMD processors, vectorization or not. Which is pretty bizarre, really.


Well, your explanation seems completely at odds with what is currently the top-voted comment in this discussion. The linked discussion of the patch he built to fix the problem indicates that the dispatcher does just do CPU feature detection. Here is the URL for reference:

http://www.swallowtail.org/naughty-intel.shtml

According to that, the code simply does a feature check for SSE, SSE2, and SSE3. Except it also does a check for "GenuineIntel" and treats its absence as "no SSE of any kind" even if the CPU otherwise indicates that it does SSE. That check is completely unnecessary and does nothing but slow (or crash!) the code on non-Intel CPUs.
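
To paraphrase what that write-up describes, the logic is roughly the following (an illustrative reconstruction in C, not Intel's actual code; the function names are made up):

    #include <cpuid.h>
    #include <string.h>

    /* Illustrative reconstruction of the behaviour described above. */
    static int is_genuine_intel(void)
    {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13] = {0};

        __get_cpuid(0, &eax, &ebx, &ecx, &edx);   /* leaf 0: vendor string */
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        return strcmp(vendor, "GenuineIntel") == 0;
    }

    int select_sse_level(void)
    {
        unsigned int eax, ebx, ecx, edx;
        __get_cpuid(1, &eax, &ebx, &ecx, &edx);   /* leaf 1: feature flags */

        int sse2 = (edx >> 26) & 1;
        int sse3 = ecx & 1;

        /* The complaint: this vendor check gates everything, so a non-Intel
           CPU that reports SSE2/SSE3 support still gets the slowest path. */
        if (!is_genuine_intel())
            return 0;                             /* "no SSE" fallback path */
        if (sse3) return 3;
        if (sse2) return 2;
        return 0;
    }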

If you still think that's wrong, could you post the relevant code to show it?


That link doesn't actually show what it does to determine whether to use SSE or SSE2 (much less SSE3 and beyond). That it derives a boolean value is not the same as feature detection.

Further, the bulk of that entry is from 2004, which is pertinent given that at the time the new Pentium 4 was the first Intel processor with SSE2, and the SSE implementation on the Pentium III was somewhat of a disaster -- both in its single-precision width (it simulated 128 bits through two 64-bit operations, and compilers could optimize for the P3's specific handicap) and in sharing resources with the floating point unit. So the feature flag, coupled with "GenuineIntel", was all they needed to know for the two possible Intel variants with support.

Since then the dispatcher and options have grown dramatically more complex as the number of architectures and permutations have exploded.


Well, here's a complete analysis of the function:

http://publicclu2.blogspot.com/2013/05/analysis-of-intel-com...

Unfortunately, it doesn't show the raw assembly. But in the absence of any information to the contrary, I'm perfectly happy to trust this pseudocode. It shows a bunch of feature checks, preceded by a single "GenuineIntel" check. The code that's gated on "GenuineIntel" would work just fine on non-Intel CPUs. It might sometimes produce sub-optimal results, but overall it'll be fine. There are some CPU family checks, but my understanding is that non-Intel CPUs return the same values that Intel CPUs do for similar architectures/capabilities.

We have multiple people saying that the code runs faster if the "GenuineIntel" checks are removed, we have pseudocode for the function in question that shows a bunch of feature detection with a bit of CPU family detection, neither of which are at all Intel specific. And then we have you, who can't seem to substantiate your claims at all.

If you have actual code or other reasonable evidence to support what you're saying, I'd love to see it. But right now, I'm not buying it.


Intel is not the market leader in compilers, however: It is somewhat rare to come across teams using the Intel compiler, and they tend to be in niche modeling/simulation or financial realms, more often than not with the sort of code that you compile for a specific piece of deployment hardware, compiling again when you move it to something different.

Intel makes the ICC seemingly to try to encourage the adoption of newer features in newer processors -- multi-threading for many-core processors, SSE, SSE2, SSE3, AVX, AVX2, soon AVX-512, etc. It is worth noting that simply using a feature (e.g. SSE, AVX) because it is there does not guarantee performance improvements in all scenarios -- the ICC uses specific model timings to make some of its choices.

The obvious solution to this problem is to never have a reason to use the ICC compiler: for the auto-vectorization and use of things like AVX to be just as advanced in GCC, LLVM, and even the Microsoft compiler.


Have you ever seen benchmarks suggesting that "Intel is not the market leader in compilers" is actually true?


That comment was in relation to market share, not performance. Few would argue that Intel has more than a low single-digit percentage market share of the overall compiler market. People should be far, far more concerned about how GCC, llvm and Visual Studio do in vectorizing code to SSE/AVX.


So if we were to define "market" as "compilers that people pay money for", would Intel be market leader then? (Certainly not a common definition, I agree.)


The gaming industry does use it quite often. It is the Watcom compiler of modern times.


Yes, this part of the settlement was based on buyers of Intel's compiler being misled into thinking that it produced good code for all processors. Intel is not forced to change its compiler to produce reasonable code on AMD, only to disclose this practice.

http://www.anandtech.com/show/3839/intel-settles-with-the-ft...

Intel's behavior continues to be scummy and reprehensible, but by adding the disclaimer to all pages regarding the compiler they are now legally compliant. Still, it's a good enough compiler that at times it may be worth using and patching the binary to avoid their dirty tricks.


It's because they specifically check for an Intel chip, but the 'Intel-optimised' version is also faster on AMD chips than the default (and they are well aware of this). Regardless of their advertising, it's a scummy and anticompetitive move.


At this point, does Intel even need that function to make AMD's CPU cores look bad?


The thing that's sad is that if Intel hadn't taken so many unethical approaches to keeping AMD down when it actually was competitive with Intel on performance, the market could look totally different today. Back in the Athlon XP and early Athlon 64 days AMD was routinely ahead in benchmarks as well as power consumption, while P4 blustered along with higher and higher clock speeds. If they had gotten the market share their performance "deserved" (in some abstract sense, of course), they could have put more money into R&D and developed better follow on products. Instead, their market share grew only slowly, and eventually when Intel got back to improving performance AMD just fell behind and never recovered.

Intel's engineering prowess has to be respected, but how they got there by abusing their market position is deplorable. At this point I don't think there is any rational reason to buy AMD at any price point, they've been dominated so hard. (Buying ATI may not have helped, but these days that's the only part of AMD that's even still competitive.)


You're talking about a company whose motto is "only the paranoid survive". Why win when you can utterly dominate?


I don't think that's been their motto since Otellini took over. Look where they are now in the mobile market. Otellini put the profitability of their Core chips above improving Atom in the first few years, even when it came to netbook performance, which was already terrible. Combined with the fact that they forced OEMs to not buy AMD alternatives during the same time, Otellini just didn't think it's necessary to improve the performance of Atom too much.

They only started caring about power consumption when it was already obvious to everyone that ARM was going to pose a threat to them eventually. I think if everyone sees something, that's by definition not "paranoia". To be paranoid, you have to see and believe something before others see it.


> They only started caring about power consumption when it was already obvious to everyone that ARM is going to pose a threat to them eventually.

I'm being a bit pedantic, but it seems to me they refocused on power consumption beginning with the launch of the Pentium M (forerunner of the Core and Core 2 lines), which was released in 2003 and was surely in development several years before that.

Or do you think they were thinking ahead to ARM already in ~2001 or so? Maybe they were... although I think they were thinking about targeting laptop sales in general at that point, not ARM specifically.


That's true, the NetBurst syncope made them redesign toward efficiency, but still, the rise of ubiquitous mobility forced another inflection in their TDP curve. And they're still sweating over it, since the PC market is shrinking and they need to get their foot in the smartphone/tablet market (see the Bay Trail subsidy effort: http://liliputing.com/2014/01/bay-trail-tablets-cheap-intels...)


I think he came from marketing. Remember the Intel 487SX?


AMD has 10-16 core server CPUs that are very price-competitive with Intel's Xeons on performance. They generate more heat, however, so they are not economical in datacenters for energy-usage reasons, but they are ideal for getting the most powerful workstation possible.


This all seems to indicate that the Intel compiler emits multiple, CPU-dependent code paths in a given binary, which seems insane to me due to the amount of extra memory that this would require. Am I missing something here?


Extra code on disk doesn't cost anything (well, disk space, but that is "free" for practical purposes). A compiler can arrange for all of the code for a given architecture to appear consecutively in the binary, so the pages and cache lines containing implementations unused by the processor on which you are running are never loaded into memory and never take up space in the cache.

Also, in practice code is a tiny portion of the size of a typical application. Far more space is consumed by resources like images and sounds.


You mean "data".


Usually code isn't that big and maybe these optimizations are only used sparingly so that most of the code is shared.


Probably no more insane than loop unrolling, which I understand is fairly standard and well-accepted.
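
For anyone unfamiliar with the term, a classic manual 4x unroll looks like the sketch below (illustrative only; modern compilers typically do this themselves at -O2/-O3):

    #include <stddef.h>

    /* Sum an array four elements per iteration to reduce loop overhead and
       expose independent accumulators; a tail loop handles the remainder. */
    float sum_unrolled(const float *x, size_t n)
    {
        float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        size_t i = 0;

        for (; i + 4 <= n; i += 4) {
            s0 += x[i + 0];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        for (; i < n; i++)        /* remainder */
            s0 += x[i];

        return s0 + s1 + s2 + s3;
    }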


Loop unrolling is actually becoming somewhat of an outdated technique for modern processors:

http://www.agner.org/optimize/blog/read.php?i=142 "It is so important to economize the use of the micro-op cache that I would give the advice never to unroll loops."

I don't know how AMD compares.


And the question is: how about we stop paying Intel for an unfair product?

I know their compiler produces the fastest code, but maybe you can get good (enough) results by using libraries and maybe some manual optimization.


Isn't this crippling a compile-time thing?

Is there something in binary that executes best-performing instructions (as opposed to execute just the instructions compiled in) when it's being executed on a specific CPU? If so, how exactly does it work?


It's actually a runtime switch. A compiled x86 binary that uses extra-wide number-crunching instructions (SSE etc) must also work on older processors that don't have those instructions, so it will have two or more code paths. The code paths all perform equivalent computations, but using different instructions.

For example, if you are adding 4 pairs of 64-bit numbers, and there's a special add-4-pairs-of-64-bit-numbers instruction, but it's specified as part of SSE4 (I made that up, but it's the kind of thing that you would find), then you can ask the CPU if it supports SSE4. If it does, then you say great, use this code path that requires SSE4, and we'll do the whole operation in three instructions: load, add, store. Or something.

However, if the CPU says that it doesn't support SSE4, then you'd better have a backup plan. It doesn't have to run as fast, but it should compute the same answer. If it's compiled C code (as opposed to hand-written assembler), the compiler will have you covered. Instead of a single SSE4 instruction, maybe it will take 4 regular 64-bit x86 add instructions instead.

(And if you've written it in assembler, then you probably provided the compiler with a backup C implementation to use if SSE4 isn't supported.)

Intel's compiler is being unfair to AMD CPUs because -- even if they support the instructions that you want -- it won't use them. It will unnecessarily fall back to the plain old non-SSE x86 instructions.
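
As an aside, GCC (6+) and newer Clang can generate exactly this kind of per-CPU code path, plus the runtime dispatcher, from a single source function via function multi-versioning. A small sketch (the attribute is a GCC/Clang feature on Linux/x86, independent of ICC, and the function name is made up):

    #include <stddef.h>

    /* The compiler emits an AVX2 clone, an SSE4.2 clone, and a baseline clone
       of this function, plus a resolver that picks one at load time based on
       CPUID feature bits (not the vendor string). */
    __attribute__((target_clones("avx2", "sse4.2", "default")))
    void add_arrays(const double *a, const double *b, double *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }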


Thanks for your answer, it did not occur to me initially, but it makes a lot of sense!


When I worked for a game company we used the Intel compiler for a couple of versions but it caused so many issues for people with AMD we switched back to the MS compiler. In the end the performance difference wasn't enough to matter.


That's interesting: could you elaborate on those problems? I was pretty much supporting Intel on the grounds that their competitors' products were just running a safe path, but your input might change my view entirely.


It wouldn't be useful anymore; that was 3-4 years ago. At the time, besides having AMD issues, we had floating-point optimization issues which messed up our physics. I doubt it's an issue today.



