I do the opposite - I write the code and then ask Copilot to create tests, which it gets 50-75% right, and I take the rest to 100%. If I kept asking it to fix things it'd never get there and would spiral off into ever greater and more verbose nonsense, but it does have its uses as a boilerplate generator.
The continuous deployment folks release into production if all the tests pass. That's really what I was thinking about with that comment - it's not so much about the origin of the code as about tests-as-specification, and how much do we really trust tests?
I agree, in many cases the chatbot would just spiral round and round the problem, even when given the output of the tests. But... has anyone tried it? Even on a toy problem?
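For what it's worth, the loop itself is trivial to set up on a toy problem. Here's a minimal sketch - `fix_with_model` is a stand-in for a real chatbot call (here just a canned sequence of attempts), so this only demonstrates the harness shape, not whether a real model converges:

```python
# Toy harness for "feed the test output back to the model and retry".

def run_tests(func):
    """Run a tiny test suite against a candidate; return failure messages."""
    cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
    failures = []
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as e:
            failures.append(f"{args}: raised {e!r}")
            continue
        if got != expected:
            failures.append(f"{args}: expected {expected}, got {got}")
    return failures

# Stand-in "model": canned attempts, as a chatbot might produce
# when shown the failing output each round. Purely hypothetical.
attempts = iter([
    lambda a, b: a * b,   # first retry: still wrong
    lambda a, b: a + b,   # second retry: correct
])

def fix_with_model(failures):
    return next(attempts)

candidate = lambda a, b: a - b   # initial (broken) code
for _ in range(5):
    failures = run_tests(candidate)
    if not failures:
        break
    candidate = fix_with_model(failures)

print(failures)  # prints [] once the loop converges
```

With a real model plugged in, the open question is exactly the one above: does it converge, or does it orbit the problem generating new failures each round?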