> I think the SIMD part has more to do with loop analysis than ILP.
If you know how to rewrite the algorithm so that your SIMD code makes close-to-ideal use of the CPU's execution ports, it is practically impossible to beat. I haven't seen a compiler (GCC, clang) do such a thing, at least not in the cases I've written myself. I've measured substantial improvements from exploiting this and similar microarchitectural details. So I don't think it's only about loop analysis, but I do think it's a practically impossible task for the compiler. Perhaps with the AI ...
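To make "utilization of CPU ports" a bit more concrete, here is a minimal sketch of one such rewrite, not the actual code I measured. With strict floating-point semantics (no -ffast-math) the compiler may not reorder the additions, so the single-accumulator loop is one long dependency chain. Splitting it into independent accumulators lets several adds be in flight at once. The function names and the choice of 4 accumulators are assumptions for illustration; the right number depends on the add latency and the number of FP ports on the target CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one accumulator, so every add has to wait for the previous one. */
float sum_single(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Four independent accumulators (4 is an assumed, illustrative factor).
 * The four dependency chains do not depend on each other, so the CPU can
 * overlap them instead of serializing on a single add chain. */
float sum_unrolled4(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i + 0];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    /* Combine the partial sums, then handle the 0..3 leftover elements. */
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; i++)
        acc += x[i];
    return acc;
}
```

Note that the two functions add the elements in a different order, so for general floating-point input the results can differ in the last bits. That reordering is exactly what the compiler is not allowed to do on its own by default.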
It's quite telling that there is a #pragma omp simd whose whole purpose is to tell the compiler that it may rewrite the loop into a vectorized one.
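For anyone who hasn't used it, a minimal sketch of what that hint looks like on a dot product. The function name dot is hypothetical; the pragma and the reduction clause are standard OpenMP. As far as I know, both GCC and clang honor it with -fopenmp-simd (no OpenMP runtime needed); without that flag the pragma is simply ignored and the loop still computes the same thing.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with an explicit vectorization hint.
 * reduction(+:acc) tells the compiler it may keep per-lane partial sums
 * and combine them at the end, i.e. it is allowed to reorder the adds. */
float dot(const float *a, const float *b, size_t n)
{
    assert((a != NULL && b != NULL) || n == 0);
    float acc = 0.0f;
    #pragma omp simd reduction(+:acc)
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

The pragma only grants permission to vectorize and reassociate. How well the resulting code fills the execution ports is still up to the compiler.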
Now I wonder what the state of polyhedral compilers is. It's been many years. And given the AI/LLM hype, they could really shine.