> No. It makes sense to use LLMs to generate tests. Even if their output matches the worst output the average human can write by hand, having any coverage whatsoever already raises the bar from where the average human output is.
Although this is true, it disregards the fact that prompting for tests takes time that could also be spent writing tests, and it's not clear that poor-quality tests are free: further development may cause them to fail for the wrong reasons, costing time spent debugging. This is why I used the word "augment": these tests are clearly not the same quality as manual tests, and should be considered separately from them. In other words, they may serve to elevate below-average code or augment manual tests, but not more than that. Again, I'm not saying it makes no sense to do this.
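To make "fail for the wrong reasons" concrete, here's a minimal hypothetical sketch in Python (the function and test are invented for illustration): a generated test that pins an incidental detail, the exact wording of a string, so any harmless rewording later turns it red even though no invariant was violated.

```python
# Hypothetical function under test.
def format_greeting(name: str) -> str:
    return f"Hello, {name}!"

# A brittle test in the style a generator might produce: it asserts the
# exact output string, an incidental detail, rather than the behavior
# that matters. Rewording the greeting later turns it red for the wrong
# reason, and someone has to spend time figuring out why.
def test_greeting():
    assert format_greeting("Ada") == "Hello, Ada!"
```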
> That's not the LLM's responsibility. Humans specify what they want and LLMs fill in the blanks. If today's LLMs output bad results, that's a reflection of the prompts. Garbage in, garbage out.
This is unlikely to be true, for a couple of reasons:
1. Ambiguity makes it impossible to define "garbage"; see prompt engineering. In fact, all human natural-language output is garbage in the context of programming.
2. As the LLM fills in the blanks, it must do so while respecting the intention of the code; otherwise the intention of the code erodes and its design is lost.
3. This would imply that LLMs have reached their peak and can only improve by requiring less prompting from the user. That is simply not true: it is currently trivial to find problems an LLM cannot solve, regardless of the amount of prompting.
> Although this is true, it disregards the fact that prompting for tests takes time that could also be spent writing tests (...)
No, not today at least. Some services like Copilot provide plugins that implement actions to automatically generate unit tests. This means that the unit test coverage you're describing is a right-click away.
> (...) and it's not clear that poor-quality tests are free: further development may cause them to fail for the wrong reasons, costing time spent debugging.
That's not how automated tests work. If you have a green test that turns red when you touch some part of the code, that is the test working as expected, because your change just violated an invariant.
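For contrast with the brittle example above, here is a minimal sketch of the kind of test this describes (hypothetical names, using pytest): the assertion encodes an actual invariant, so when it turns red after a change, that is the test doing its job.

```python
import pytest

# Hypothetical function under test: order totals must never be negative.
def order_total(unit_price: float, quantity: int) -> float:
    if unit_price < 0 or quantity < 0:
        raise ValueError("price and quantity must be non-negative")
    return unit_price * quantity

# This test encodes an invariant. If a later refactor silently drops the
# validation, the test turns red for the *right* reason.
def test_rejects_negative_quantity():
    with pytest.raises(ValueError):
        order_total(9.99, -1)
```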
Also, today's LLMs are able to recreate all your unit tests from scratch.
> This is unlikely to be true, for a couple of reasons: 1. Ambiguity makes it impossible to define "garbage"; see prompt engineering.
"Ambiguity" is garbage in this context.
> 2. As the LLM fills in the blanks, it must do so while respecting the intention of the code; otherwise the intention of the code erodes and its design is lost.
That's the responsibility of the developer, not the LLM. Garbage in, garbage out.
> 3. This would imply that LLMs have reached their peak and can only improve by requiring less prompting from the user. That is simply not true: it is currently trivial to find problems an LLM cannot solve, regardless of the amount of prompting.
I don't think that point is relevant. The goal of a developer is still to meet the definition of done, not to tie their hands behind their back and expect working code to just fall into their lap. Currently the main approach to vibe coding is to set the architecture and lean on the LLM to progressively go from high-level to low-level details. Speaking from personal experience with vibe coding, LLMs are quite capable of delivering fully working apps from a single, detailed prompt. However, you get far more satisfactory results (i.e., the app reflects the same errors in judgement you'd make) if you just draft a skeleton and progressively fill in the blanks, as sketched below.
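A hypothetical illustration of that skeleton-first workflow (names and structure invented for the example): the developer fixes the architecture and contracts as stubs, then delegates each body to the LLM, one prompt at a time.

```python
# Developer-written skeleton: the architecture and contracts are fixed
# up front; each body is then delegated to the LLM, one prompt at a time.
from dataclasses import dataclass

@dataclass
class Invoice:
    customer_id: str
    amount_cents: int

def validate(invoice: Invoice) -> None:
    """Raise ValueError on a malformed invoice."""
    raise NotImplementedError  # prompt 1: "implement validate()"

def persist(invoice: Invoice) -> None:
    """Write the invoice to storage."""
    raise NotImplementedError  # prompt 2: "implement persist() using sqlite3"

def process(invoice: Invoice) -> None:
    # The high-level flow stays human-owned; the LLM fills in the blanks.
    validate(invoice)
    persist(invoice)
```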
> That's not how automated tests work
> today's LLMs are able to recreate all your unit tests from scratch.
> That's the responsibility of the developer
> LLMs are quite capable of delivering fully working apps with a single, detailed prompt
You seem to be very resolute in positing generalizations, and I think those are rarely true. I don't see a lot of benefit coming out of a discussion like this. Try reading my replies as if you agree with them; it will help you better understand my point of view, which will make your criticism more targeted, so you can avoid generalizations.