Hacker News

The Intel compiler is extremely good at finding and exploiting vectorization (SSE/AVX) opportunities; using these instructions in hot loops is becoming key to getting anywhere near peak performance out of modern CPUs.

Most people don't care enough about performance to notice, but recompiling with Intel's compiler often shows a 5-15% improvement on number-crunching code, and that's before spending time investigating the vectorization output and fine-tuning.

On the other hand, if you really care about speed then someone with some experience in performance tuning will typically be able to make your code run 4-8x faster, vastly outweighing any benefits from the compiler.



Just in case you skimmed moconnor's comment, it bears repeating:

Intel's compiler: 15% speedup

Hand-optimized code: 800% speedup

This gap in compiler tech is still a big deal today. Think about the early mainframes and how the code was all written in machine code or assembler. http://www.pbm.com/~lindahl/mel.html

Compilers can still improve, a lot.

• Parallel code? _still_ hand-written, even though choosing the right language/library can help. Note that a language that makes parallelism easy may cost you when you actually go for the maximum parallel speedup

• GPU? hand-written. See: litecoin miners, and bitcoin miners before that. They were written in OpenCL, but hand-tuned for a specific architecture

• Cross-platform? Java and C should be portable, but ask any Android developer how it really works

• And the one we're talking about here: number-crunching code? hand optimized!

I'm actually quite optimistic about the future of compilers. One of the reasons HN is so fun to read is that it comes up often.


"Hand-optimized code: 800% speedup"

It really depends.

Especially in how naively the "non-optimized" code was written.

I can see vectorization accelerating code by 2x to 4x (per core), but not much more than that (and that's the part the Intel compiler does best)

But even GCC can vectorize better today than in the early days of 4.0


Sure, it depends. I've seen embarrassingly parallel (yeah, that's a real term) code with speedups in the 20's.

My personal best was a 9x speedup, partly by using SSSE3 and partly by some really good prefetching and non-temporal writes.

If you look at what I said in the very narrowest light, I agree that SSE2 all by itself typically delivers a 2x speedup per core over non-SSE code.


Technically, 8x faster is a 700% speedup.

Either way, using percentages there seems really misleading. 15% versus 700% (or 800%) looks like a much bigger difference than 1.15 versus 8 if you're not careful when thinking about it.





