My ex-wife managed the security team at MySpace from about 2006 to 2008. The really wild part was when she went online to the MySpace hacker forums to see how the day's work had gone. The insistence on allowing users to put HTML onto the site was a huge problem. These days, I think the solution would be to do a proper parse of the HTML input and remove forbidden attributes and tags, but back then it was handled with regex insanity.
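For illustration, here's a minimal sketch of that parse-based approach in Python (my own toy example, not anything MySpace actually ran), using the standard library's HTMLParser with a small hypothetical allowlist of tags and attributes:

    from html import escape
    from html.parser import HTMLParser

    # Hypothetical allowlist; a real one would be larger and more careful.
    ALLOWED_TAGS = {'b', 'i', 'em', 'strong', 'p', 'a'}
    ALLOWED_ATTRS = {'a': {'href'}}
    DROP_WITH_CONTENT = {'script', 'style'}  # drop these tags AND their bodies

    class Sanitizer(HTMLParser):
        def __init__(self):
            super().__init__()
            self.out = []
            self.skip = 0  # >0 while inside a script/style element

        def handle_starttag(self, tag, attrs):
            if tag in DROP_WITH_CONTENT:
                self.skip += 1
            elif tag in ALLOWED_TAGS and not self.skip:
                kept = ''.join(
                    f' {name}="{escape(value or "")}"'
                    for name, value in attrs
                    if name in ALLOWED_ATTRS.get(tag, set())
                    and not (value or '').strip().lower().startswith('javascript:')
                )
                self.out.append(f'<{tag}{kept}>')

        def handle_endtag(self, tag):
            if tag in DROP_WITH_CONTENT:
                self.skip = max(0, self.skip - 1)
            elif tag in ALLOWED_TAGS and not self.skip:
                self.out.append(f'</{tag}>')

        def handle_data(self, data):
            if not self.skip:
                self.out.append(escape(data))  # everything else is escaped

    def sanitize(dirty: str) -> str:
        parser = Sanitizer()
        parser.feed(dirty)
        parser.close()
        return ''.join(parser.out)

    print(sanitize('<p onclick="evil()">hi<script>alert(1)</script> <b>bold</b></p>'))
    # -> <p>hi <b>bold</b></p>

This sketch doesn't balance tags or handle every parser quirk; in practice today you'd reach for a maintained sanitizer library rather than rolling your own.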
It used to be that the only programs capable of somewhat correctly parsing HTML were web browsers; each of them produced different results, most weren't open source, and none were reusable as libraries. If you wanted to parse HTML in... looks up what MySpace was written in... ColdFusion, you were out of luck. Since then, people have spent years developing specifications and writing libraries, so now it's not a big deal.
You could identify that as an invalid tag in a single pass and know that you should escape its < and >.
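Something like this sketch (mine, with a toy allowlist of just <b> and <i>): any < that doesn't start a recognized tag gets escaped immediately, so a fragment like <scr can never recombine into <script> on a later pass:

    import re

    # Only these exact tags survive; everything else is escaped in one pass.
    ALLOWED = re.compile(r'</?(?:b|i)>')

    def escape_invalid_angles(s: str) -> str:
        out, pos = [], 0
        while pos < len(s):
            ch = s[pos]
            if ch == '<':
                m = ALLOWED.match(s, pos)
                if m:  # a recognized tag: keep it whole
                    out.append(m.group())
                    pos = m.end()
                    continue
                out.append('&lt;')  # stray '<': neutralize on the spot
            elif ch == '>':
                out.append('&gt;')
            elif ch == '&':
                out.append('&amp;')
            else:
                out.append(ch)
            pos += 1
        return ''.join(out)

    print(escape_invalid_angles('<scr<script>ipt> <b>ok</b>'))
    # -> &lt;scr&lt;script&gt;ipt&gt; <b>ok</b>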
For the implementation, all real HTML tags should be generated by the formatter itself and never originate from the input. When formatting, the recognized valid tags are removed from the input and replaced with formatter-generated ones, and everything else is properly HTML-escaped.
As a primitive example, imagine that the only HTML tags the formatter is able to output are <b> and </b>, alongside HTML-escaped text. That makes it impossible for a script tag ever to be output by the formatter.
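A rough sketch of that idea in Python (my own, assuming the input marks bold spans with literal <b>...</b>): the output tags are emitted by the formatter itself, and every other character goes through html.escape, so no input text can ever become markup:

    import html
    import re

    # The formatter recognizes only <b> and </b> in the input.
    TOKEN = re.compile(r'</?b>', re.IGNORECASE)

    def format_bold_only(source: str) -> str:
        out, pos = [], 0
        for m in TOKEN.finditer(source):
            out.append(html.escape(source[pos:m.start()]))  # escape plain text
            # The tag is regenerated by the formatter, not copied from input.
            out.append('</b>' if m.group().startswith('</') else '<b>')
            pos = m.end()
        out.append(html.escape(source[pos:]))
        return ''.join(out)

    print(format_bold_only('hi <b>bold</b> & <script>alert(1)</script>'))
    # -> hi <b>bold</b> &amp; &lt;script&gt;alert(1)&lt;/script&gt;

A real formatter would also balance the tags it emits; the point is just that nothing from the input reaches the output unescaped.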
I wonder how many passes it would need. I mean, if you nest <scr<scr<scr<script>ipt>ipt>ipt> as many times as possible, you'll end up with an XSS. Removing < and > entirely would be the safest solution.
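For instance, with a naive one-pass delete filter (my sketch), each pass splices the surrounding fragments back together into the tag it just removed:

    import re

    def naive_strip(s: str) -> str:
        # Delete literal <script> tags in a single pass.
        return re.sub(r'<script>', '', s, flags=re.IGNORECASE)

    payload = '<scr<scr<script>ipt>ipt>'
    for i in range(1, 4):
        payload = naive_strip(payload)
        print(f'after pass {i}: {payload!r}')
    # after pass 1: '<scr<script>ipt>'  <- deletion reassembled a nested copy
    # after pass 2: '<script>'          <- a one- or two-pass filter ships this
    # after pass 3: ''

Escaping doesn't have this problem: once a < becomes &lt; it can never reassemble into a tag, so a single pass is enough.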