From reading the PDF it seems that this ‘merely’ generates tests that will repeatedly pass, i.e. that are not flaky. The main purpose is to create a regression test suite by having tests that pin the behaviour of existing code. This isn’t a replacement for developer-written tests, which one would hope come with knowledge of what the functional requirement actually is.
Almost 20 years ago the company I worked for trialled AgitarOne. Its promise was automagically generating test cases for Java code to help explore its behaviour, but Agitar could also create passing tests more or less automatically, which you could then use as a regression suite. Personally I never liked it: it just produced too much stuff, and it was something management didn’t really understand - to them, if test coverage had gone up then quality must have too. I wonder how much better the LLM approach FB talk about here really is compared to that though…
A lot of unit tests generated that way will simply be change detectors (fail when the code changes) rather than regression tests (fail when a bug is reintroduced). That's a pretty big distinction, and I don't see LLMs getting there until they can ascertain test correctness without just assuming that passing tests are good tests, or without depending on an oracle (the prompt will have to include behavior expectations somehow).
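To make the distinction concrete, here's a minimal JUnit sketch against a made-up DiscountCalculator (the class, names and business rule are mine, not from the paper): the first test pins an unspecified default and breaks on any refactor of it, the second encodes the stated requirement and only breaks when that requirement does.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    // Hypothetical code under test, only to illustrate the two kinds of test.
    class DiscountCalculator {
        // Requirement: tier 1 customers get 5% off. The default branch was never specified.
        int discountedCents(int priceCents, int tier) {
            if (tier == 1) {
                return priceCents - priceCents / 20;  // 5% off, in integer cents
            }
            return priceCents;                        // accidental default nobody asked for
        }
    }

    class DiscountCalculatorTest {

        // Change detector: pins whatever the code happens to do today for an
        // unspecified input, so it fails on any refactor, not only on real bugs.
        @Test
        void unknownTierCurrentlyPaysFullPrice() {
            assertEquals(10_000, new DiscountCalculator().discountedCents(10_000, 42));
        }

        // Regression test: encodes the stated requirement, so it only fails
        // when that requirement is actually broken.
        @Test
        void tierOneGetsFivePercentOff() {
            assertEquals(9_500, new DiscountCalculator().discountedCents(10_000, 1));
        }
    }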
This articulates the problem I’m having right now in an interesting way. I’m fine writing unit tests that validate business logic requirements or bug fixes, but writing tests that validate implementations to the point that they reimplement the same logic is a bit much.
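The tautological version of this looks roughly like the sketch below (hypothetical TaxCalculator, made-up 19% rule): the first test derives its expected value with the same formula as production, so a wrong rate would be faithfully mirrored; the second takes its expected value from the spec, or from a worked example someone signed off on.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class TaxCalculator {
        double vat(double net) {
            return net * 0.19;  // hypothetical business rule: 19% VAT
        }
    }

    class TaxCalculatorTest {

        // Reimplementation: the expected value is computed with the same formula,
        // so the test can never disagree with the code.
        @Test
        void vatMatchesItself() {
            double net = 200.0;
            assertEquals(net * 0.19, new TaxCalculator().vat(net), 1e-9);
        }

        // Requirement-driven: the expected value is stated independently of the
        // implementation.
        @Test
        void vatOnTwoHundredIsThirtyEight() {
            assertEquals(38.0, new TaxCalculator().vat(200.0), 1e-9);
        }
    }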
I want to figure out how to count the number of times a test has had to change because of updated requirements vs how many defects it has prevented (vs how much wall-clock time / compute it has consumed in running).
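A rough sketch of the bookkeeping half of that (the "defects prevented" half would need CI failure history, which this ignores): count how often a given test file has changed and how many of those commits look like bug fixes, via a naive keyword match on the commit subject. The path and keywords are placeholders to adapt.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.List;

    public class TestChurnCounter {
        public static void main(String[] args) throws Exception {
            // Hypothetical default path; pass your own test file as the first argument.
            String testFile = args.length > 0 ? args[0] : "src/test/java/OrderServiceTest.java";

            // Ask git for the subject line of every commit that touched the file.
            Process git = new ProcessBuilder(
                    List.of("git", "log", "--follow", "--format=%s", "--", testFile))
                    .redirectErrorStream(true)
                    .start();

            int total = 0;
            int bugFixLooking = 0;
            try (BufferedReader out = new BufferedReader(new InputStreamReader(git.getInputStream()))) {
                String subject;
                while ((subject = out.readLine()) != null) {
                    total++;
                    String s = subject.toLowerCase();
                    // Crude heuristic: commit messages mentioning fixes count as "defect-related".
                    if (s.contains("fix") || s.contains("bug") || s.contains("regression")) {
                        bugFixLooking++;
                    }
                }
            }
            git.waitFor();

            System.out.printf("%s changed in %d commits, %d bug-fix-looking, %d churn-only%n",
                    testFile, total, bugFixLooking, total - bugFixLooking);
        }
    }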
Brilliant distillation of this insight - I've never heard it put in those words before, but it's perfect. It cuts both ways too: if you have lots of tests but most of them aren't really exercising the external API, then you're worse off.
> I want to figure out how to count the number of times a test has had to change with updated requirements vs how many defects they’ve prevented
I did the same some years back on a project that had both a unit test suite with pretty high code coverage and an end-to-end suite.
The results for the unit test suite were abysmal. The number of times it caught an actual regression over a couple of months was close to zero. However, the number of times tests failed simply because code was changed to meet new business requirements was huge. In other words: they provided close to zero value while carrying high maintenance costs.
The end-to-end suite did catch a regression now and then; its drawback was the usual one: it was very slow to run and maintaining it could be quite painful.
The moral of the story could have been to drastically cut down on writing unit tests. Or maybe to write them while implementing a new ticket or fixing a bug, but throw them away after the change went live. Of course this didn't happen. It goes against human nature to throw away something you just put a lot of effort into.
That’s what I believe Facebook have created here, so you’re right that ‘regression’ is a big word - the tests are more likely change detectors, e.g. asserting the existing behaviour of conditionals that were previously never executed.
And it will lock the system into behaviour that might just be accidental. The value of tests is to make sure that you don't break anything that anyone cares about, not that every little never-used edge-case behaviour, which might just be an artefact of a specific implementation, is locked in forever.
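For example (made-up class, not from the paper), a generated suite will happily pin the exact exception an unspecified input happens to throw today, so even a harmless internal change such as adding argument validation now breaks the build, alongside the pins people actually want:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;

    import org.junit.jupiter.api.Test;

    // Hypothetical code under test.
    class UserNameFormatter {
        String initials(String fullName) {
            String[] parts = fullName.trim().split("\\s+");
            return parts[0].substring(0, 1) + parts[parts.length - 1].substring(0, 1);
        }
    }

    class UserNameFormatterGeneratedTest {

        // Pins an artefact of the implementation: empty input was never specified,
        // and the StringIndexOutOfBoundsException is just what substring() does today.
        @Test
        void emptyNameThrowsStringIndexOutOfBounds() {
            assertThrows(StringIndexOutOfBoundsException.class,
                    () -> new UserNameFormatter().initials(""));
        }

        // Pins behaviour someone actually cares about: the documented contract.
        @Test
        void firstAndLastInitialForTwoPartName() {
            assertEquals("AL", new UserNameFormatter().initials("Ada Lovelace"));
        }
    }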
This is my experience as well. The problem is that pinning down "but what _shall_ it do?" at a low level is seen as redundant as long as everything works, and the typically forgotten edge cases get detected elsewhere anyway. The metric _that_ you ran past those lines of code says nothing about whether you got there for the right reason.
http://www.agitar.com/solutions/products/agitarone.html