1. 90% of the time, what happens during step 4 is the realization that your tests are crap. So you have to throw them away and start from scratch. Tests aren't helping with refactoring; they're inhibiting it.
2. Step 4 is indeed the most important part, and yet TDD priests and scriptures don't cover it at all. TDD actually distracts you from what's important, because it focuses on steps 1, 2 and 3. Eventually you'll become disciplined enough not to get distracted, and you'll think that TDD works. But the reality is that you never needed TDD in the first place.
I’ve used TDD as a kind of proxy for predicate transformer semantics. Test-first is a way of sneaking specification-first into a team’s workflow. But let’s be clear: overspecification is bad specification! And that’s why, as you say, a lot of the time the tests get in the way: they are overly specific and thus inhibit desirable refactoring. In my experience it’s better to err on the side of underspecifying. Almost everyone agrees with me on that, since the total lack of specification you generally see is a species of extreme underspecification. Don’t think I’m being snarky, either: even that level of underspecification can be appropriate in the exploratory phase, although I think it’s generally worth the effort to start from a specification that’s at least slightly more restrictive than the always-true predicate.
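To make the overspecification point concrete, here is a small hypothetical sketch (the functions `dedupe_v1`, `dedupe_v2`, `spec` and friends are made up for illustration, not anyone's actual code). It treats a specification as a predicate on (input, output), loosely in the spirit of postconditions, and compares it with a test that pins one exact output. A refactor that changes element order breaks the pinned test but still satisfies the looser spec, while the always-true predicate accepts even a clearly broken implementation.

```python
# Hypothetical function under test: remove duplicates, keeping first-seen order.
def dedupe_v1(xs):
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Hypothetical refactor: simpler, but returns the elements in sorted order.
def dedupe_v2(xs):
    return sorted(set(xs))

# Overspecified test: pins the exact output, including an order nobody asked for.
def overspecified(f):
    return f([3, 1, 3, 2]) == [3, 1, 2]

# Specification as a predicate on (input, output): no duplicates, same elements.
def spec(xs, out):
    return len(out) == len(set(out)) and set(out) == set(xs)

# The weakest possible specification: the always-true predicate.
def always_true(xs, out):
    return True

# Check that implementation f satisfies predicate pred on every sample input.
def satisfies(pred, f, inputs):
    return all(pred(xs, f(xs)) for xs in inputs)

inputs = [[], [1], [3, 1, 3, 2], [5, 5, 5]]

if __name__ == "__main__":
    print(overspecified(dedupe_v1), overspecified(dedupe_v2))
    print(satisfies(spec, dedupe_v1, inputs), satisfies(spec, dedupe_v2, inputs))
```

Run as a script, the first line prints `True False` and the second prints `True True`: the refactor is rejected only by the test that said more than the spec required.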
Yeah, that's how I think about testing in general. Tests should be a reflection of your specs. But if your specs are bad (and most of them are), you should wait for them to improve, not carve them in stone.