>just a simple case of bad code generation rendering the compiler into a toy one
If you find some time to go through the gcc bugzilla, you'll find shockingly simple snippets of code that get miscompiled (often by optimization passes), with fixes never backported to the older versions that production environments like RHEL are using.
I realized that on all the RHEL systems I'm using, we never use the default toolchains. We just use those old systems to run stuff, including newer toolchains.
I think a production-grade compiler not only can, but must, leave performance on the table when the cost is correctness (unless the performance gain is incredibly high and the correctness loss is minimal). Correctness is not the only thing that matters, but it is the most important thing. Unfortunately, compiler writers do not agree, and they do silly things like "let's assume UB can never happen and optimize based on that".
This is about optimizations affecting the timing of cryptographic code, not the correctness of the computation; the argument for calling this a correctness bug in the compiler is quite weak, I think.
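To make the distinction concrete, here is a sketch of the kind of pattern at issue (my own illustrative example, not from the discussion above): a branchless "constant-time" select. An optimizer may legitimately rewrite this into a conditional branch or other data-dependent sequence, leaking timing about `bit`, yet the computed value is still exactly right, which is why it reads as a side-channel problem rather than a miscompilation.

```c
#include <stdint.h>

/* Branchless select: returns a if bit == 1, b if bit == 0.
 * Written without branches so execution time does not depend on
 * the (possibly secret) value of bit. A compiler is free to turn
 * this back into a branch, preserving the result but not the
 * timing properties. */
uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - bit;  /* 0x00000000 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}
```

C gives the programmer no standard way to express "keep this constant-time", so crypto libraries resort to volatile tricks, inline asm, or inspecting the emitted code.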
I do not agree in the general case. There are very useful DSL compilers which do not consider performance at all, but just compile to a target that does the optimization for them (the JVM, LLVM IR, or even just C).