It's fairly straightforward: TDD is red/green/refactor, which means you're working in very small steps (about a minute or two each) and thinking about design during two out of three of those steps.
During "red", you're thinking about the design of your public interface.
During "green," you're focusing on implementation.
During "refactor," you're thinking about how to improve the quality of your implementation and how to improve the overall design, and making those changes.
If you believe that spending a lot of time thinking about and improving your design will produce a high-quality design, then TDD will produce a high-quality design. QED.
If you don't accept the axiom, then it's a longer discussion, but that's the proof, and my experience is that it does in fact work.
(If you're looking for a rigorous study and proof, you won't find it, because there are no rigorous studies that formally prove what creates high-quality design. Partially because there is no formal definition of "high-quality design" in the first place.)
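To make the cycle described above concrete, here is a minimal sketch of one red/green/refactor pass. The `slugify` function and its test are hypothetical, made up purely for illustration; they are not from Kent Beck or any particular codebase.

```python
import re

# Red: write the test first. It pins down the public interface
# (function name, argument, return value) before any implementation exists.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  TDD: red/green  ") == "tdd-red-green"

# Green: the quickest code that makes the test pass, with no attention to polish.
def slugify_green(title):
    out = ""
    for ch in title.lower():
        out += ch if ch.isalnum() else "-"
    while "--" in out:
        out = out.replace("--", "-")
    return out.strip("-")

# Refactor: tidy the implementation while the test keeps passing.
def slugify(title):
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

test_slugify()
```

In a real session the green version would be edited in place rather than kept under a second name; both are kept here only so the two steps can be compared side by side.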
TDD is obviously a hill-climbing strategy as far as the end result is concerned, and the outcome carries the associated baggage: you can end up stuck at a local optimum.
"Refactor" is named this way to emphasize that the changes you are making are closely related to the tests you already have and the tests you are about to introduce.
That does not leave enough room to justify a QED. If you choose to design beyond that, the process stops being TDD, at least as described by Kent Beck.
TDD has several big issues that lead to bad design:
1. It assumes your spec is good and rigid. In reality, most specs are shitty and fluid, and your first understanding of the spec is wrong.
2. It assumes your first implementation of the spec is good enough to justify automated testing.
3. It leads to high test coverage which inhibits refactoring (despite zealots telling you otherwise).
4. It almost always leads to an obsession with testing, which leads to a ton of unnecessary complexity (e.g. dependency injection for the sake of testing, weird practices like "don't mock what you don't own", etc.).
I always thought it was great that it forces you to think about the public interface, but I came to believe that thinking cannot be forced with a ritual.
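For readers unfamiliar with the "dependency injection for the sake of testing" complaint in point 4, a rough sketch of what it tends to look like follows. The `greeting` function and the `now` parameter are hypothetical names invented for this example; whether the extra parameter is worth it is exactly the judgment call being argued about.

```python
from datetime import datetime, timezone

# Direct version: short, but a test cannot control what "now" is.
def greeting_direct(name):
    hour = datetime.now(timezone.utc).hour
    return f"Good {'morning' if hour < 12 else 'evening'}, {name}"

# Injected version: the clock becomes a parameter that exists
# only so a test can substitute a fixed time.
def greeting(name, now=lambda: datetime.now(timezone.utc)):
    hour = now().hour
    return f"Good {'morning' if hour < 12 else 'evening'}, {name}"

# A test double for the clock (assumed fixed time: 09:00 UTC).
def fixed_clock():
    return datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

print(greeting("Ada", now=fixed_clock))
```

Production callers never pass `now`; the parameter is there for the test alone, which is the extra surface area the comment objects to.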
> 3. It leads to high test coverage which inhibits refactoring (despite zealots telling you otherwise).
The major flaw with TDD is that if you get your test wrong, you get the wrong code. The intent is to inhibit refactoring: the assumption is that the tests are correct, so any refactoring must be done within the constraints imposed by those tests.
OFC the tests and supporting design are usually just as flawed as the code. This is why I say the first step of TDD is wrong: write your feature first, not your test. If you don't have a feature, and can't logically reconcile it with your other features, then it's not worth even writing the test in the first place.
Tests are just a supporting tool once you (believe you) have the feature written. They function on one hand to protect the other features you have written (at least as well as those are tested), and on the other to validate that you aren't wildly breaking the system's expectations. A large number of tests is a measurement that something is wrong, but it doesn't tell you whether the feature itself is wrong or your design is wrong, just that one of the two is true.
That's how it helps you refactor: with a "good" design, you can add new features and few tests will break. As more tests are added and total test failures approach zero over time, you gain some confidence that the system is good. You never gain certainty, just the knowledge that your constrained refactor probably didn't break anything.