Except it turns out not to work that way: most of the optimisations are general ones, and some of the non-optimised code paths AMD chips get for things like string copying are slower than a naive implementation. (In fact, Intel wound up having to fudge the CPUID result on its newer processors for this reason. Otherwise, binaries built with its older compilers would detect an unrecognised chip and run the slow path, including some of the benchmarks reviewers used to compare the two!)
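To make the failure mode concrete, here is a rough Python sketch contrasting a dispatcher that gates on vendor string and CPU family with one that checks only feature flags. Everything in it (the function names, the set of "known" families, the `"sse2"`/`"baseline"` path labels) is a hypothetical illustration, not how any particular compiler's dispatcher is actually written.

```python
# Hypothetical illustration only: function names, the "known" family set and
# the path labels are invented for this sketch, not taken from a real compiler.

KNOWN_INTEL_FAMILIES = {6, 15}  # families this imaginary compiler release recognises


def vendor_gated_dispatch(vendor: str, family: int, features: set) -> str:
    """Pick a code path by checking vendor and family before features."""
    if vendor == "GenuineIntel" and family in KNOWN_INTEL_FAMILIES:
        if "sse2" in features:
            return "sse2"
    # Any other vendor, or an Intel family newer than this release knows about,
    # falls through to the baseline path even if it supports SSE2.
    return "baseline"


def feature_gated_dispatch(features: set) -> str:
    """Pick a code path purely from the reported feature flags."""
    if "sse2" in features:
        return "sse2"
    return "baseline"
```

An AMD chip that reports SSE2 gets the baseline path from the first function but the SSE2 path from the second, and so does an Intel chip whose family number the (imaginary) compiler release predates.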