Unit tests can give you false positives (the test fails but the code is correct) and false negatives (the test passes but the code is broken).
And TDD seems to create so many tests that you get huge false-positive rates. I recently jumped onto a project and made a couple of fairly small code changes (a couple of hours of work), which caused 100 tests to fail. I then spent the next two days going through and correcting all 100 tests, none of which had found an issue in my code.
If you're saying that it's possible to do testing badly, I agree, just like it's possible to write production code badly. Sometimes teams new to unit testing do it ritualistically, without really understanding the purpose. That can lead to all sorts of bad outcomes. E.g., lots of tests that look impressive and even generate good coverage numbers, but don't really test what matters. Or tests that are highly duplicative, such that changing one thing in the code requires changing a lot of things in tests.
I have definitely dealt with code bases like that, and that sucks. But I have also dealt with code bases where the tests were great, and that's an amazing experience.
To do TDD well, I think it's important to release early and often and to reflect on one's experience (e.g., with weekly team retrospectives). That way, if people are doing something unhelpful, like writing very duplicative tests, it will quickly become an obvious impediment to progress. The team will learn to write the useful tests, while skipping the ones that merely fit some hypothetical pattern. It also helps people learn to design for testability; often, painful tests are a sign of bad design in the production code.
- Focus on "automated testing", don't get obsessed with philosophising about "the true nature of a 'unit'", or other such dogma.
- Be empirical: base your rules on what works; don't base your work on "the rules".
- The goal of testing is to expose problems in our program: "test failure" is a success, because we've found a problem (even if that problem is with the test!). Anything else is secondary (e.g. isolating the location of failures, documenting our API, etc.). Avoiding this goal defeats the point (e.g. choosing to ignore edge cases).
- Focus on functionality rather than implementation details, e.g. 'changing a user's email address' rather than 'the setEmail method of the User class'. This improves reliability and makes failures more useful/meaningful (i.e. "this feature broke" vs "this calling convention has changed"). See the first sketch after this list.
- Mocking is a crutch: it works around problems that can usually be avoided entirely at design time; it can still be very useful when a design can't be changed (e.g. adding tests to a legacy system).
- Testing a real thing is objectively better than testing a fake thing; we should only mock if testing the real thing is unacceptable.
- If two components always exist together, pretending that they're independent is a waste of time and complexity.
- Having some poor tests is better than having no tests. Tests can be added, removed and improved over time, just like anything else.
- "Property checking" is a quick way to find edge-cases and scenarios we wouldn't have thought of.
- Fast feedback loops are important. Reducing I/O and favouring pure calculation usually speeds up testing more than reducing the number or size of tests (e.g. "unit" vs "end-to-end"). Incidentally, this is also how we avoid having to mock (see the 'pure core' sketch after this list).
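To make the "functionality, not implementation" point concrete, here's a minimal pytest sketch; the User class and its fields are made up purely for illustration. The tests talk about what changing an email address should do, not about which method or attribute happens to implement it:

```python
import re
import pytest

# Hypothetical User class, just enough to make the tests runnable.
class User:
    def __init__(self, email: str):
        self._email = email

    def change_email(self, new_email: str) -> None:
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", new_email):
            raise ValueError(f"invalid email: {new_email!r}")
        self._email = new_email

    def contact_address(self) -> str:
        return self._email

# The tests assert on observable behaviour (the contact address actually
# changes, invalid input is rejected), not on private state or internals.
def test_changed_email_is_used_for_contact():
    user = User("old@example.com")
    user.change_email("new@example.com")
    assert user.contact_address() == "new@example.com"

def test_invalid_email_is_rejected_and_old_address_kept():
    user = User("old@example.com")
    with pytest.raises(ValueError):
        user.change_email("not-an-email")
    assert user.contact_address() == "old@example.com"
```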
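And here's a rough sketch of the "pure core, thin I/O shell" split that lets you skip mocking; all the names (apply_discount, checkout, the db and payment_gateway parameters) are hypothetical, not a prescribed architecture:

```python
def apply_discount(total_cents: int, loyalty_years: int) -> int:
    """Pure calculation: 1% off per year of loyalty, capped at 10%.
    No collaborators to fake, so it can be tested directly."""
    rate = min(loyalty_years, 10) * 0.01
    return round(total_cents * (1 - rate))

def checkout(order_id, db, payment_gateway):
    """Thin, hypothetical I/O shell: fetch inputs, call the pure core,
    perform effects. Only this layer touches the database or network,
    and it holds so little logic that it barely needs its own tests."""
    order = db.load_order(order_id)
    amount = apply_discount(order.total_cents, order.customer.loyalty_years)
    payment_gateway.charge(order.customer.card, amount)

# Fast feedback: this runs in microseconds, with nothing to mock.
def test_discount_is_capped_at_ten_percent():
    assert apply_discount(10_000, 25) == 9_000
```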
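Finally, a quick property-checking sketch using the Hypothesis library (pip install hypothesis); the encode/decode round-trip here is just a stand-in for whatever invertible function you actually have:

```python
from hypothesis import given, strategies as st

def encode(s: str) -> bytes:
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    return b.decode("utf-8")

# Instead of hand-picking examples, state the property and let the tool
# search for counterexamples, including edge cases like the empty string,
# odd unicode, and long inputs we'd never have written by hand.
@given(st.text())
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
```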
The type of engineer who would screw up 100 unit tests independently is exactly the kind of engineer who should be forced to write tests for their code. Can you imagine what integration would have looked like had they not been doing any testing at all?
I don't think so. They could probably have been written better; they weren't written poorly, but it's really hard to write 200 unit tests for a feature that don't break when the feature is updated.