The Intel compiler is extremely good at finding and exploiting vectorization (SSE/AVX) opportunities; using these instructions in hot loops is becoming key to getting anywhere near peak performance out of modern CPUs.
Most people don't care enough about performance to notice, but recompiling with Intel's compiler often shows a 5-15% difference on number crunching codes and that's before spending time investigating the vectorization output and fine-tuning.
On the other hand, if you really care about speed then someone with some experience in performance tuning will typically be able to make your code run 4-8x faster, vastly outweighing any benefits from the compiler.
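To make the vectorization point concrete, here is a minimal C sketch contrasting a loop that optimizing compilers readily vectorize with one they generally cannot. The function names `saxpy` and `prefix_sum` are illustrative and do not come from the thread. With GCC, for example, `gcc -O3 -march=native -fopt-info-vec` asks the compiler to report which loops it vectorized. Intel's compiler has its own optimization-report options, which vary by version.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]: every iteration is independent, so an optimizing
 * compiler can process several floats per SSE/AVX instruction.
 * `restrict` promises that x and y do not overlap, which removes the
 * aliasing question that could otherwise block or complicate vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Running sum: iteration i needs the result of iteration i - 1
 * (a loop-carried dependency), so the compiler cannot simply spread
 * these iterations across SIMD lanes the way it can for saxpy. */
void prefix_sum(size_t n, float *y)
{
    for (size_t i = 1; i < n; i++)
        y[i] += y[i - 1];
}
```

Loops shaped like `saxpy` are where the compiler's gains tend to come from. Loops like `prefix_sum` are where a person restructuring the algorithm can still beat it.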
Just in case you skimmed moconnor's comment, it bears repeating:
Intel's compiler: 15% speedup
Hand-optimized code: 800% speedup
This gap in compiler tech is still a big deal today. Think about the early mainframes and how the code was all written in machine code or assembler. http://www.pbm.com/~lindahl/mel.html
Compilers can still improve, a lot.
• Parallel code? _Still_ hand-written, even though choosing the right language or library can help. Note that choosing a language that makes parallelism easy may cost you when you actually go for the maximum parallel speedup
• GPU? Hand-written. See the litecoin miners, and the bitcoin miners before them: they used OpenCL but were hand-tuned for a specific architecture
• Cross-platform? Java and C should be portable, but ask any Android developer how that works out in practice
• And the one we're talking about here: number-crunching code? Hand-optimized!
I'm actually quite optimistic about the future of compilers. One of the reasons HN is so fun to read is that the topic comes up often.
Either way, using percentages there seems really misleading: 15% versus 700% (or 800%) looks like a much bigger difference than 1.15x versus 8x if you're not careful when thinking about it.
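The conversion is easy to get wrong, so here is a minimal sketch of the arithmetic. It assumes "speedup factor" means old runtime divided by new runtime. The helper names are hypothetical. Under that convention a factor s is an improvement of (s - 1) * 100 percent, so 1.15x is 15%, 4x is 300%, and 8x is 700% (not 800%).

```c
/* Percentage improvement implied by a speedup factor
 * (factor = old runtime / new runtime). */
double speedup_percent(double factor)
{
    return (factor - 1.0) * 100.0;
}

/* How many times faster one result is than another, comparing
 * factors directly instead of comparing their percentages. */
double relative_speedup(double factor_a, double factor_b)
{
    return factor_a / factor_b;
}
```

For example, `relative_speedup(8.0, 1.15)` is roughly 7. Hand-tuned code at the top of the 4-8x range would be about seven times faster than the compiler-only result. Comparing 700% with 15% suggests a gap of about 47x.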