> instead of keeping their optimizations proprietary
There were no magic secret optimizations to release. It just straight up did not work. They had to add back dynamic branch prediction, and even then the load/store latency was such trash that they had to put ginormous L3 caches on it to get even close to reasonable performance.
That's fair. I have never worked with them, so I don't know the precise details, but one of the big complaints I've heard was that compilers weren't ready for them yet. These days there's a larger field of open source compilers, and there has been a lot more research into parallelism.
Yeah, compilers today are no better. We found the limits of statically scheduled parallelism pretty fast. On code that uses static scheduling, a modern OoO processor can easily duplicate what IA64 was capable of (and a pipelined loop using AVX will utterly smoke it), while being far better at all the stuff IA64 failed at.
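To make the distinction concrete, here's a rough sketch in C of the two kinds of code I mean. The names `saxpy` and `list_sum` are made up for illustration and aren't taken from any IA64 codebase. The first loop has independent iterations, so a compiler can schedule it ahead of time (unroll it, software-pipeline it, or vectorize it). The second is a pointer chase, where every load address depends on the previous load and the compiler has no way of knowing at build time whether it will hit cache.

```c
#include <stddef.h>

/* Independent iterations: no iteration reads what another one wrote
   (restrict promises x and y don't alias), so the schedule can be
   worked out statically at compile time. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

struct node {
    int value;
    struct node *next;
};

/* Serial dependency: the address of each load comes from the previous
   load, and its latency (cache hit or miss) is unknown until run time. */
int list_sum(const struct node *p)
{
    int sum = 0;
    while (p != NULL) {
        sum += p->value;
        p = p->next;
    }
    return sum;
}
```

The first shape is where static scheduling does fine and where a vectorized loop on a modern core does even better. The second shape is where a fixed compile-time schedule has nothing to work with.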