Ask the same "AI" to create a machine-readable proof of correctness. Or even better: start from an inefficient but known-to-be-working system, and only let the "AI" apply correctness-preserving transformations.
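A toy sketch of that idea in Python: treat a slow, obviously-correct implementation as the spec, and check any optimized rewrite against it. (The function names and the popcount example are illustrative, not from the comment; exhaustive testing over a small domain stands in here for an actual machine-checked proof.)

```python
def popcount_reference(x: int) -> int:
    """The 'known working system': slow, but obviously correct."""
    count = 0
    while x:
        count += x & 1  # examine one bit at a time
        x >>= 1
    return count

def popcount_optimized(x: int) -> int:
    """The transformed version: Kernighan's trick, one step per set bit."""
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count

# Check the optimized version against the reference on every 16-bit input.
# A real verification flow would use a model checker or SMT solver to
# cover the full input space instead of enumerating it.
for value in range(1 << 16):
    assert popcount_reference(value) == popcount_optimized(value)
print("optimized version agrees with reference on all 16-bit inputs")
```

The appeal of this shape is that the reference never needs to be fast, only convincing; all the cleverness lives in the optimized version, which only has to match it.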
I don’t think it’s that easy. I’m sure Intel, AMD, and Apple have very sophisticated suites of “known working systems” that they use to test their new chips, and they still ship bugs that security researchers find five years later. It’s impossible to fully test and verify such complex designs.