> Failing tests indicate the presence of bugs, but passing tests do not promise their absence.
As somebody who primarily lives on the testing side of the house, I've definitely run into cases where the developer promises that their unit tests will make a new feature less buggy, then about 5 minutes later I either find a mistake in the test or I find a bug in something that the developer didn't think to test at all.
I've also seen instances where tests are written too early, using a data structure that later gets changed in development, which then causes churn since the unit tests have to be fixed too.
I've generally come to think that unit tests should be used to baseline something after it ships, but aren't that useful before that point (and could even be a waste of time if they take a long time to write). I don't think I'll ever be able to convince anybody at my company about this though lol
Tests are akin to scientific experiments. They test hypotheses and try to falsify claims. They shouldn't be seen as ground truth, but as ways to gain information about what the system claims to be doing. In this sense it makes sense that tests will become obsolete or evolve with the system, because the model and domain upon which the system is based also evolve and change with time.
This is why I'm not sure why more languages don't instill the idea of "public" and "private" tests the way Go does.
Your "public" tests should document the API for future programmers. This is the concrete contract that should never change, no matter what happens to the implementation. If these tests break, you've done something wrong.
Your "private" tests are experiments that future programmers know can be removed if they no longer fit the direction of the application.
Unpopular opinion, but I always say that unit tests are contracts for the API. If you don't want or don't need to make a contract, don't write unit tests.
A unit test's main purpose is not to improve code or reduce bugs; its main purpose is to verify that the code works against the contract defined in the unit tests. Code improvement or bug reduction are added benefits, if any.
Unit tests test the API - if there is no API there, then the level of the unit is probably too low. The purpose of all tests is to say "no matter what, this will never change". "Never" is a bit too strong, since you are allowed to make changes, but any API that your unit tests cover will be painful to change: the tests will also have to change, so you have nothing to guide you, and odds are you don't have good coverage from other tests (integration tests would catch issues, but you rarely have all cases covered).
Or to put it a different way, your unit tests should cover units that have a good boundary to the rest of the system. This should sound like a module, but there is reason to have a module be a larger thing than your unit (most of the time there shouldn't be, but once in a while it is useful), and so while there is overlap, it is often useful to consider them different.
Integration tests cover the API, but they do not test the API (well, they often use some API as well, but they won't cover all your internal APIs).
API doesn't imply integration. Consider any module or package to have an API exposed to the user of the package. Unit tests should assert that the package behaves as expected.
By users I assume you mean other developers who use the API (including you next week). It would be better to use a different term, since "user" often means "end user" or "customer" rather than internal users.
Yeah, I'm not sure what the better word is. As a programmer, I use APIs. "interact with", "code to/against". I think "user" is OK. We're all users at different levels of abstraction.
An API is an accessible public interface. I feel like I've seen it used to mean accessible "to the public/other teams" from a service standpoint, but not to describe any sundry public methods of a file.
> I've also seen instances where tests are written too early, using a data structure that gets changed in development
That's as clear a signal that they are testing the wrong interface as you can get.
Unfortunately, developers think of tests as testing code, not interfaces. As a natural consequence, they migrate towards testing the most fine-grained breakdown of their code that they can; it increases the ratio of code coverage to number of tests... at the cost of functionality coverage.
There are two kinds of tests I think developers are trying to write, and I think both of them have merit. Writing a test for the code is actually totally fine, so long as the reason you're doing it is that you want to be able to depend on that behavior and need something to scream if it ever changes.
I think starting with a test that the code does what the code does is actually a pretty good starting point, because it's mechanical. If you never end up revising that code, it can just live there forever; if you do end up revising the code, those tests will slowly morph over time into testing the interface. When you actually do your revision, you get a very clear signal: this test that is now failing as a result of the change clearly can't be covering the important part.
Waiting for the change to see what stays the same I think is often more accurate than trying to guess what the invariants are ahead of time.
I get the impression you are talking about taking over legacy code.
Well, the invariants you want to test are what people want the software to do. If you create those tests from the beginning, there's nothing to guess. But of course, if people write a bunch of code and throw that knowledge away, you have to recover it somehow, and it will necessarily involve a lot of guessing.
I inherited an ancient project that had literally tens of thousands of tests.
I reviewed hundreds of them, tried rewriting dozens of them.
Eventually, I realized that essentially all of the tests were just testing that mock data, manually manipulated by the test, gave the result the test expected.
Absolutely nothing useful was actually being tested.
Some team spent a couple of years writing an unholy number of tests as a complete waste of time. Basically just checking off a box that code had tests.
The first box on the testing checklist states that the test should first fail. I wonder how they managed to not test anything while seeing the tests transition from failure to success.
From the GP: mock data, manually manipulated by the test, gave the result the test expected.
These tests are easy to write: your mock returns something, at first the API under test does nothing (so the test fails), and then the API returns whatever the mock returned and the test passes. So they do fail until the code is written. However, they are of negative value - you cannot refactor anything, because the code only calls a mock and returns some data from the mock.
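A minimal sketch of the kind of test being described, in Go; `Store`, `Service`, and the names are hypothetical. The assertion only checks that the mock's canned value makes it back out, so it fails before the forwarding code exists and passes afterwards, yet pins down nothing about real behaviour.

```go
package users

import "testing"

// Hypothetical types for illustration only.
type Store interface {
	GetUser(id int) string
}

type mockStore struct{ name string }

func (m mockStore) GetUser(id int) string { return m.name }

// The "unit" under test does nothing but forward to the store.
type Service struct{ store Store }

func (s Service) UserName(id int) string { return s.store.GetUser(id) }

// This test fails until UserName forwards the call and passes afterwards,
// so it ticks the "fail first" box. But it only verifies that the mock's
// canned value comes back out; refactor the real data access and it can
// neither catch a regression nor guide the change.
func TestUserName(t *testing.T) {
	svc := Service{store: mockStore{name: "alice"}}
	if got := svc.UserName(42); got != "alice" {
		t.Errorf("UserName(42) = %q, want %q", got, "alice")
	}
}
```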
I can't imagine that writing a function which is nothing more than an identity function would feel easy (unless it was explicitly intended to be an identity function, I suppose). There must be some terrible gut-wrenching feeling that goes along with it, if nothing else? Frankly, I don't understand how this situation is possible in practice, but perhaps I misunderstand what is written?
I've never seen that but I've heard the claim before.
So . . . is the team of devs who spent years writing and maintaining those tests incompetent or is it the new dev with the complaint? If it was the whole team, how did that happen?
> something that the developer didn't think to test at all.
Raising hand, guilty as charged. I test for things based on my concept of how the system works, but those darn users may have other ideas!
Actually, when I found a user who seemed to have a knack for finding bugs, they were gold and I let them know I appreciated their efforts.
> unit tests should be used to baseline something after it ships
That has not been my experience. I found that unit tests let me get the pieces working properly so that when assembled, the chances that everything worked as expected were much improved.
My gift to any team I work with is an integration test harness that usually has some kind of DSL for setting up state. This looks wildly different depending on the project. But my theory is that if tests are easy to write then it is easy to make more of them. So it is worth it to write some ugly code one time under the hood to make this happen.
If every test requires copy-pasting a bunch of SQL statements and creating a new user and data, my experience is the team will have 3-4 of these kinds of tests. But if the test set-up code looks like `newUser().withFriends(3).withTextPost("foo").withMediaPost().sharingDisabled().with…` then the team is enabled to make a new integration test any time they think of an edge case.
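A rough sketch of what such a builder can look like in Go; the types and `WithXXX` methods here are hypothetical, not taken from any real project, and a real harness would write rows to the test database rather than just filling in a struct.

```go
// Package harness is a hypothetical test-data builder; names are illustrative.
package harness

type Post struct {
	Kind string // "text" or "media"
	Body string
}

type User struct {
	Friends []User
	Posts   []Post
	Sharing bool
}

type UserBuilder struct{ u User }

// NewUser starts from sensible defaults so each test only states what it cares about.
func NewUser() *UserBuilder { return &UserBuilder{u: User{Sharing: true}} }

func (b *UserBuilder) WithFriends(n int) *UserBuilder {
	for i := 0; i < n; i++ {
		b.u.Friends = append(b.u.Friends, User{})
	}
	return b
}

func (b *UserBuilder) WithTextPost(body string) *UserBuilder {
	b.u.Posts = append(b.u.Posts, Post{Kind: "text", Body: body})
	return b
}

func (b *UserBuilder) WithMediaPost() *UserBuilder {
	b.u.Posts = append(b.u.Posts, Post{Kind: "media"})
	return b
}

func (b *UserBuilder) SharingDisabled() *UserBuilder {
	b.u.Sharing = false
	return b
}

// Build returns the assembled fixture; a real harness would insert the
// corresponding rows into the test database here.
func (b *UserBuilder) Build() User { return b.u }

// Usage in a test:
//   user := NewUser().WithFriends(3).WithTextPost("foo").WithMediaPost().SharingDisabled().Build()
```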
we have _exactly_ that (down to the `withXXX` syntax), and indeed I found it great.
Downside is that test setup can be a bit slow: each `withXXX` can create more data than really necessary (e.g. a "withPost()" might update some "Timeline" objects, even though you really don't care about timelines for your test). Upside is that it's a lot closer to what happens in reality, regularly finding bugs as a side effect. And it also aligns incentives: you make your tests faster by making your application faster.
When a codebase gets too big, and devs get too clever with their tests, the whole test suite becomes complicated.
If your test suite is approaching the complexity of the actual codebase (what with layers of mocks and fixtures that are subtly interdependent), how can you be expected to trust a test you wrote more than the code you wrote?
I (sysadmin/devops) am writing some nodejs and the complexity of the tests is confusing to me. I'm a nodejs beginner to be sure, and I'm not experienced enough to verify what copilot gives me.
All those mocks, and other Jest code, all seem overly complicated but I don't know of anything "better".
Don't be ashamed of this. I am a veteran of automated testing (writing automated unit tests for 10+ years). I am constantly disappointed in myself how bloody complicated it is to test some "simple code". And, then I go back and look at old unit tests from a few months ago: "Who wrote this shit!? <git blame> Oh, me."
> I've generally come to think that unit tests should be used to baseline something after it ships, but aren't that useful before that point
I disagree, but not entirely. I think there's a balance. Tests can be a great way to execute code in isolation with expected inputs/outputs, and they can help dramatically in the absence of other ways to execute the code. But in general, I mostly agree. Tests are mostly valuable as a way of ensuring you don't break something that was previously working, but they are still valuable for validating assumptions as you go.
> I've generally come to think that unit tests should be used to baseline something after it ships
In theory, but I've never seen anyone successfully write tests after something ships. By that time much of the context that should be documented in tests is forgotten.
I've done it a handful of times over the years. Once in a while the original didn't have any tests and I had no confidence I could change it without writing some. Once in a while the original was buggy and, after getting tired of going back to fix bugs, I wrote a few tests. Once I had a case where the fix for bug A introduced bug B, and the obvious fix for B was to revert code that made no sense, thus bringing back bug A - when someone realized this was happening every year, we wrote a few tests just to stop that pattern.
The above, though, is a very rare exception. The general rule is that once code is shipped, management doesn't allow you time to make it better.
I want to make sure I understand your message here. When I write unit tests, I frequently find bugs in my own code. Are you talking about something different?
It is just the cost of good quality. This is like suggesting you shouldn't write error handling code, because the code might change and have different errors that need to be handled.
Also, if the interface doesn't change but your unit tests fail on a data structure change, then perhaps your tests are too coupled to the implementation.
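As a sketch of the difference (hypothetical `Cache` type, not from the thread): the first test below reaches into the internal representation and breaks on any data structure change, while the second only exercises the interface and survives as long as the contract holds.

```go
package cache

import "testing"

// Hypothetical cache for illustration.
type Cache struct{ entries map[string]int }

func NewCache() *Cache { return &Cache{entries: map[string]int{}} }

func (c *Cache) Put(k string, v int) { c.entries[k] = v }

func (c *Cache) Get(k string) (int, bool) {
	v, ok := c.entries[k]
	return v, ok
}

// Over-coupled: asserts on the internal map, so swapping the representation
// (say, to a slice or an LRU list) breaks the test even if behaviour is identical.
func TestCacheInternals(t *testing.T) {
	c := NewCache()
	c.Put("k", 1)
	if _, ok := c.entries["k"]; !ok {
		t.Fatal("expected key in internal map")
	}
}

// Interface-focused: only uses Put/Get, so it keeps guiding refactors
// instead of churning with them.
func TestCachePutGet(t *testing.T) {
	c := NewCache()
	c.Put("k", 1)
	if got, ok := c.Get("k"); !ok || got != 1 {
		t.Fatalf("Get(%q) = %v, %v; want 1, true", "k", got, ok)
	}
}
```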
Why would anybody doing TDD have tests highly coupled with the implementation? They shouldn't have this problem at all.
The GP's diagnosis isn't good here. Those are bad tests, not tests written too early. That doesn't mean you shouldn't wait for your functionality to be accepted before testing it; sometimes you should; but that happens for completely different reasons.
By using TDD, which promotes isolating the changeable surface area to a small area during discovery. That way you don't have to introduce the complexities of API changes across the rest of the application surface area, avoiding the churn spoken of earlier.
Most of the time I already know what I'm going to write. Thus most of the time I can start with a simple test and it isn't too early. It is rare to be presented with a problem where you don't know whether it is solvable or how to solve it and to jump right into code anyway (as opposed to research or whiteboard discussions), and that up-front work gets enough in place that a first test isn't too early.