> Then if the test fails, ask the LLM to fix the bug in the code.
If an LLM is intended to generate code, perhaps its standing orders should include "Generate a comprehensive test suite, run it, fix the bugs, then iterate. Send me an email when you've finished."
I feel pretty convinced that that's the next iteration here, and would be quite disappointed if there aren't at least three companies working on it right now. In fact, maybe the correct solution is to generate 3 versions of each function using 3 different LLMs and only accept a result when at least 2 of them agree, like back in the very early days of computers.
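To make that 2-out-of-3 idea concrete, here's a minimal sketch in Python. The `vote` helper and the three candidate implementations are hypothetical stand-ins for functions produced by three different LLMs; it only illustrates the majority-vote step, not the code generation itself.

```python
from collections import Counter
import math


def vote(candidates, *args):
    """Call each candidate implementation and return the majority answer.

    Raises if no two candidates agree, which is the signal to go back to
    the LLMs (or a human) rather than trust any single output.
    """
    results = [f(*args) for f in candidates]
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError(f"no two candidates agree: {results}")
    return value


# Hypothetical candidates for the same spec: "round half away from zero".
def impl_a(x):
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)

def impl_b(x):
    return math.floor(x + 0.5)  # subtly wrong for negative halves

def impl_c(x):
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


if __name__ == "__main__":
    print(vote([impl_a, impl_b, impl_c], 2.5))   # 3  -- all three agree
    print(vote([impl_a, impl_b, impl_c], -2.5))  # -3 -- impl_b is outvoted 2 to 1
```

Whether this actually buys reliability depends on the three generators failing independently, which is the same assumption the old triple-redundancy hardware relied on.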
> and would be quite disappointed if there aren't at least three companies working on it right now
Why does it need three companies working on it? Isn't this just a matter of prompt engineering?
I've never played with an LLM, so I really don't know. Perhaps it requires some ordinary linear code to produce a "comprehensive test suite".
Ultimately, the problem I can't see 3 or 300 companies solving is the correct interpretation of technical instructions in English. Native English speakers have trouble with that, and I doubt that a machine can outperform a native speaker. Maybe I'm wrong and it can; but then it also needs to convince me that my doubt is misplaced.
Writing technical specs is like writing laws; you're using vague words to describe something precise. We don't use machines to interpret laws, and I don't trust machines to interpret specs.