
That argument is often made for JITs, but I have never seen a real-world example where the extra runtime knowledge a JIT has is used in a way that couldn't be done better by a static compiler, except in cases where runtime code loading is used.


Alias analysis is a good example. A JIT compiler may speculatively add dynamic disambiguation guards (p1 != p2 ==> p1[i] cannot alias p2[i]). If the assumption turns out to be wrong, the JIT compiler dynamically attaches a side branch to the guard using the new assumption (p1 == p2 ==> p1[i] == p2[i], which is an even more powerful result).
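To make that concrete, here is a rough C-style sketch of what the speculated code might effectively look like (the function, the names and the loop body are my own illustration, not taken from any particular JIT):

    /* Source loop being compiled (hypothetical example): */
    void scale_add(double *p1, double *p2, int n, double k) {
        for (int i = 0; i < n; i++)
            p1[i] = p1[i] + p2[i] * k;
    }

    /* What the JIT might effectively emit after speculation: */
    void scale_add_speculated(double *p1, double *p2, int n, double k) {
        if (p1 != p2) {
            /* Guard held: p1[i] and p2[i] are assumed not to alias, so loads
               can be hoisted, reordered and vectorized freely. */
            for (int i = 0; i < n; i++)
                p1[i] = p1[i] + p2[i] * k;
        } else {
            /* Side branch attached when the guard fails: p1 == p2, so both
               loads are the same value and the body simplifies. */
            for (int i = 0; i < n; i++)
                p1[i] = p1[i] * (1.0 + k);
        }
    }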

Doing this in a static compiler is hard, because it would have to compile both paths for every such disambiguation possibility. This quickly leads to a code explosion. You'd need very, very smart static PGO to cover this case: there are no branch probabilities to measure, since the compiler doesn't know that inserting such a branch might be beneficial. It could only derive this by running PGO on code that already contains these branches, which leads back to the code explosion.

Auto-vectorization is another example: a static compiler may have to cover all possible alignments for N input vectors and M output vectors. This can get very expensive, so most static compilers simply don't do it and generate slower, generic code. A JIT compiler can specialize to the runtime alignment and even compile secondary branches in case the alignment changes later on (e.g. a filter fed with different kernel lengths at runtime).
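To make the alignment point concrete, here is a rough sketch (my own illustration, using SSE intrinsics) of the kind of dispatch that has to exist when the alignment isn't known statically; a JIT only needs to compile the variant it actually observes:

    #include <immintrin.h>
    #include <stdint.h>

    /* out[i] = a[i] + b[i]; n is assumed to be a multiple of 4 for brevity. */
    void add_f32(float *out, const float *a, const float *b, int n) {
        int aligned = (((uintptr_t)out | (uintptr_t)a | (uintptr_t)b) & 15) == 0;
        if (aligned) {
            /* All three pointers are 16-byte aligned: use aligned loads/stores. */
            for (int i = 0; i < n; i += 4) {
                __m128 va = _mm_load_ps(a + i);
                __m128 vb = _mm_load_ps(b + i);
                _mm_store_ps(out + i, _mm_add_ps(va, vb));
            }
        } else {
            /* Generic fallback: unaligned accesses (or peeling, or scalar code).
               A static compiler must carry a path like this for every
               combination it can't rule out; a JIT specializes to the
               alignment it actually sees. */
            for (int i = 0; i < n; i += 4) {
                __m128 va = _mm_loadu_ps(a + i);
                __m128 vb = _mm_loadu_ps(b + i);
                _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
            }
        }
    }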


I agree in general, although I will point out that virtually no C developers use PGO while it's on by default in HotSpot and now V8. (Of course, it looks like Java needs PGO just to try to catch up with gcc -O3.)
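For reference, gcc's PGO workflow is only a couple of extra steps (the flags are real; the file names are just an example):

    # build an instrumented binary, run it on a representative workload,
    # then rebuild using the collected profile
    gcc -O2 -fprofile-generate -o app app.c
    ./app < representative-input
    gcc -O2 -fprofile-use -o app app.c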


Not strictly the same, but take a look at the CPU world:

VLIW designs (where the compiler schedules execution based on static knowledge, e.g. Intel's Itanium) vs. current Intel CPUs (descended from the P3/P4 architectures), which allocate execution resources dynamically based on runtime knowledge.

Runtime information can help compilers. Just look at profile guided optimizations in current static compilers.

The real trouble in JIT compilers is usually that the target language's semantics are very high-level. For example, an integer in C is machine-sized and is never expanded to fit its value, unlike integers in some dynamic languages.
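A trivial illustration of the difference (assuming a 32-bit unsigned int; the comparison to a dynamic language in the comment is my own addition):

    #include <stdio.h>

    int main(void) {
        unsigned int x = 4294967295u;  /* UINT_MAX for a 32-bit unsigned int */
        x = x + 1;                     /* wraps around to 0; the representation never grows */
        printf("%u\n", x);             /* prints 0 */
        /* In a dynamic language such as Python, 4294967295 + 1 is simply
           4294967296: the runtime silently switches to an arbitrary-precision
           integer, which is the kind of high-level semantics a JIT must handle. */
        return 0;
    }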


http://weblogs.java.net/blog/2008/03/30/deep-dive-assembly-c...

This link doesn't quite give what you are after (it's mostly about static compilation in the Java HotSpot compiler), but I believe the lock elision optimizations (http://www.ibm.com/developerworks/java/library/j-jtp10185/in...) have to be done at runtime in the JVM (because of late binding).

Obviously this doesn't totally invalidate your argument ("except in the cases where runtime code loading is used"), but it is worth noting that in many languages late binding is normal, and so this is the general case.

Also, HP's research Dynamo project "inadvertently" became practical. Programs "interpreted" by Dynamo are often faster than if they were run natively. Sometimes by 20% or more. http://arstechnica.com/reviews/1q00/dynamo/dynamo-1.html


This is a common misinterpretation of the Dynamo paper: they compiled their C code at the _lowest_ optimization level and then ran the (suboptimal) machine code through Dynamo. So there was actually something left to optimize.

Think about it this way: a 20% difference isn't unrealistic if you compare -O1 vs. -O3.

But it's completely unrealistic to expect a 20% improvement if you tried the same thing with machine code generated by a modern C compiler at the highest optimization level.



