I do the opposite - I write the code and then ask Copilot to create tests, which it gets 50-75% right, and I get them to 100%. If I kept asking it to fix the code it'd never get there and would spiral off into ever greater and more verbose nonsense, but it does have its uses as a boilerplate generator.
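To make that concrete, here's a made-up toy example of the split (the function and test names are invented for illustration): I write the function by hand, the generated tests cover the obvious cases, and I fix the ones it gets wrong.

```python
def slugify(title: str) -> str:
    """The part I write by hand."""
    return "-".join(title.lower().split())

# The part Copilot generates -- the 50-75% it tends to get right:
def test_basic():
    assert slugify("Hello World") == "hello-world"

def test_already_lowercase():
    assert slugify("already lower") == "already-lower"

# The part I fix by hand: the generated assertion had the wrong
# expected value for collapsed whitespace.
def test_extra_whitespace():
    assert slugify("  too   many  spaces ") == "too-many-spaces"
```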
The continuous deployment folks release into production if all the tests pass. That's what I was thinking about with that comment, really - it's not so much about the origin of the code as about tests-as-specification: how much do we really trust tests?
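To pin down what I mean by tests-as-specification: a property-based test (sketch below, using the hypothesis library on the toy slugify from above) states an invariant over all inputs rather than enumerating a handful of cases. Ship-on-green only makes sense to the degree that properties like this actually are the spec.

```python
from hypothesis import given, strategies as st

def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# The test as a specification: for *any* ASCII input, the slug is
# lowercase and contains no whitespace. Green means shippable only
# if invariants like these really cover what "correct" means.
@given(st.text(alphabet=st.characters(max_codepoint=127)))
def test_slug_is_normalized(title: str) -> None:
    slug = slugify(title)
    assert slug == slug.lower()
    assert not any(c.isspace() for c in slug)
```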
I agree - in many cases the chatbot would just spiral round and round the problem, even when given the output of the tests. But... has anyone tried it? Even on a toy problem?
Not yet, but it's definitely worth experimenting with.
FWIW, there are precedents for this, e.g. Coq developers typically write the types and (when they're lucky) have tactics that write the bodies of functions. This predates LLMs, but some people are now experimenting with using LLMs for the same job: https://arxiv.org/abs/2410.19605
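For flavour, a Lean 4 analogue of that workflow (my sketch, not taken from the paper): the human writes the type, which is the whole specification, and a tactic synthesizes the body.

```lean
-- Sketch of the type-first workflow (Lean 4 analogue, not from the
-- linked paper). The human writes the statement; the `omega`
-- decision procedure writes the proof term, i.e. the "body".
theorem sum_comm (a b : Nat) : a + b = b + a := by
  omega
```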
...anyone been brave enough to release that code into production?