Would be nice to have a regex for parsing HTML... *grabs popcorn*

bmn__ · on Jan 31, 2020

Easy with a sufficiently powerful engine: https://stackoverflow.com/a/4234491

Relies on ?(DEFINE): http://p3rl.org/perlre#(DEFINE)

quickthrower2 · on Jan 31, 2020

There is a good comment on that answer:

> To sum up: RegEx's are misnamed. I think it's a shame, but it won't change. Compatible 'RegEx' engines are not allowed to reject non-regular languages. They therefore cannot be implemented correctly with only Finite State Machines. The powerful concepts around computational classes do not apply. Use of RegEx's does not ensure O(n) execution time. The advantages of RegEx's are terse syntax and the implied domain of character recognition. To me, this is a slow moving train wreck, impossible to look away, but with horrible consequences unfolding.
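To make the "non-regular languages" point concrete, here is a small sketch using Python's standard `re` module. The pattern and the `matches` helper are my own illustration, not something from the linked answer. A backreference lets the engine recognize strings of the form a^n b a^n, which no finite state machine can do:

```python
import re

# Illustrative pattern (an assumption, not taken from the linked answer):
# one or more 'a', a single 'b', then exactly the same run of 'a' again.
# The language { a^n b a^n } is not regular, yet the engine accepts it
# because of the backreference \1.
PATTERN = re.compile(r'(a+)b\1')

def matches(s: str) -> bool:
    """Return True if the whole string has the form a^n b a^n (n >= 1)."""
    return PATTERN.fullmatch(s) is not None

balanced = matches('aaabaaa')      # same count on both sides
short_right = matches('aaabaa')    # right side is one 'a' short
long_right = matches('aabaaa')     # right side has one 'a' too many
```

Only the first string is accepted. The engine gets there by backtracking over candidate lengths for the group, which is also why such patterns carry no O(n) guarantee.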

arkh · on Jan 31, 2020

With subroutines and recursive patterns I think you could do something parsing valid HTML.

Your sanity won't be left intact tho.

asicsp · on Jan 31, 2020

How about this for sanity: "match 'A B C' where A+B=C"[1]?

[1] http://www.drregex.com/2018/11/how-to-match-b-c-where-abc-be...

chirss · on Jan 31, 2020

boom. https://regex101.com/r/PxSY4U/1 technically it does parse it. :P

bmn__ · on Feb 1, 2020

Nope. <h1 class="foo>bar">My First Heading</h1> will misparse. (This is valid HTML 5.) You really need recursive regex or something equivalent in power, otherwise you will always fail.
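A quick way to see the failure, sketched in Python. The `NAIVE_TAG` pattern is a hypothetical stand-in for the usual "everything up to the next `>`" approach, not the regex from the regex101 link, and `StartTagCollector` is just a helper name I made up. The standard library's `html.parser` is used for comparison:

```python
import re
from html.parser import HTMLParser

SAMPLE = '<h1 class="foo>bar">My First Heading</h1>'

# Hypothetical naive pattern: '<', a tag name, then anything up to the first '>'.
NAIVE_TAG = re.compile(r'<(\w+)([^>]*)>')

naive = NAIVE_TAG.match(SAMPLE)
naive_attrs = naive.group(2)  # stops at the '>' inside the quoted value


class StartTagCollector(HTMLParser):
    """Collects (tag, attrs) pairs for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


collector = StartTagCollector()
collector.feed(SAMPLE)
collector.close()
parsed = collector.start_tags
```

The naive pattern cuts the tag off at ` class="foo`, while the parser, which tracks quoting state, reports the attribute value as `foo>bar`.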

chirss · on Feb 12, 2020

Well yea, it's a joke...

geongeorgek · on Jan 31, 2020

Haha... careful. Someone might take this seriously.