> To sum up: RegEx's are misnamed. I think it's a shame, but it won't change. Compatible 'RegEx' engines are not allowed to reject non-regular languages. They therefore cannot be implemented correctly with only Finite State Machines. The powerful concepts around computational classes do not apply. Use of RegEx's does not ensure O(n) execution time. The advantages of RegEx's are terse syntax and the implied domain of character recognition. To me, this is a slow-moving train wreck: impossible to look away from, but with horrible consequences unfolding.
Nope. <h1 class="foo>bar">My First Heading</h1> will misparse. (This is valid HTML5.) You really need recursive regex or something equivalent in power; otherwise you will always fail.
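To make the misparse concrete, here's a minimal Python sketch. The pattern in `naive` is my own guess at the kind of regex being criticized (it isn't quoted from anywhere in this thread), and the `H1Collector` class is a hypothetical helper built on the standard library's `html.parser`, included only for comparison.

```python
import re
from html.parser import HTMLParser

html = '<h1 class="foo>bar">My First Heading</h1>'

# Assumed "naive" pattern: treats the first '>' as the end of the opening tag.
naive = re.compile(r'<h1[^>]*>(.*?)</h1>')
naive_text = naive.search(html).group(1)
print(naive_text)  # bar">My First Heading  (the attribute value leaks into the text)


class H1Collector(HTMLParser):
    """Hypothetical helper: collects the attributes and text of <h1> elements."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.attrs = {}
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h1':
            self.in_h1 = True
            self.attrs = dict(attrs)

    def handle_endtag(self, tag):
        if tag == 'h1':
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.chunks.append(data)


collector = H1Collector()
collector.feed(html)
collector.close()
parsed_text = ''.join(collector.chunks)
parsed_attrs = collector.attrs
print(parsed_text)
print(parsed_attrs)
```

The naive pattern stops the opening tag at the `>` inside the quoted attribute value, so the captured "heading" starts with `bar">`. A tokenizer that tracks quoting state doesn't have that problem.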
grabs popcorn