Not to detract from your point, but Itanium's design was meant to address code compatibility between generations. Because of the stop bits, code optimized for a wider chip could still run on a narrower one.
The compiler still needs to know how to schedule for a specific microarchitecture to get the best performance, but the code would still run, albeit not as efficiently.
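To make the stop-bit idea concrete, here is a rough toy model in Python. It is only a sketch under heavy assumptions: the names `split_groups`, `cycles`, and `STREAM` are made up for illustration, and the model pretends the only limit is issue width, ignoring bundle templates, functional unit types, and latencies. Instructions between two stops are treated as independent, so a chip can issue as many of them per cycle as its width allows.

```python
from math import ceil

def split_groups(stream):
    """Split (instruction, stop_bit) pairs into instruction groups at each stop."""
    groups, current = [], []
    for name, stop in stream:
        current.append(name)
        if stop:
            groups.append(current)
            current = []
    if current:  # trailing instructions without a final stop
        groups.append(current)
    return groups

def cycles(stream, width):
    """Cycles to issue the stream on a chip that issues `width` instructions per cycle.

    Assumption: a group never shares a cycle with the next group, and a group
    larger than the issue width simply takes several cycles.
    """
    return sum(ceil(len(group) / width) for group in split_groups(stream))

# Hypothetical stream scheduled with a 6-wide chip in mind: groups of 6, 2, and 1.
STREAM = [
    ("ld r1", False), ("ld r2", False), ("ld r3", False),
    ("ld r4", False), ("ld r5", False), ("ld r6", True),
    ("add r7", False), ("add r8", True),
    ("st r7", True),
]

for width in (6, 3, 2):
    print(width, cycles(STREAM, width))
```

In this toy model the same stream takes 3 cycles on the 6-wide chip, 4 on a 3-wide one, and 5 on a 2-wide one. Nothing has to be recompiled for the narrower chips; they just spend more cycles per group, which is the "still runs, albeit not as efficiently" part.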
As an aside, I never looked into the perf numbers, but having adjustable register windows, while cool, probably made for terrible context-switching and/or spilling performance.