HTML parsing is not *that* hard compared to CSS/layout/fonts (or even figuring o...

jfk13 · on June 13, 2019

Does that HTML parser follow all the HTML5 parsing/error-handling rules, so that it conforms to the spec's behavior for random tag soup full of broken markup? Or are you assuming "clean" HTML?

tannhaeuser · on June 14, 2019

No, it follows the normative description of HTML as specified in chapter 4 of the HTML spec. The redundant procedural spec for parsing HTML is aimed strictly at browser implementers, in particular to achieve the same behaviour across browsers in the presence of errors. Note, though, that the covered fragment still includes the rich tag omission/inference rules for HTML and other minute details, based on formal SGML techniques.
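
To make "tag omission/inference" concrete, here is a toy sketch in Python. It is not the SGML-based approach described above and not the HTML5 parsing algorithm. The rule table `IMPLIED_END`, the class `OmissionInferrer`, and the helper `normalize` are all made up for illustration, and the rules cover only a tiny, simplified subset (attributes are dropped, void elements such as `br` are not handled).

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified rule table: the open element (key) is
# implicitly ended when one of the listed start tags appears.
IMPLIED_END = {
    "p": {"p", "ul", "ol", "div", "h1", "h2", "table"},
    "li": {"li"},
}


class OmissionInferrer(HTMLParser):
    """Re-emits markup with omitted end tags written out explicitly."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, innermost last
        self.out = []    # output fragments

    def handle_starttag(self, tag, attrs):
        # If the new start tag implies the end of the innermost open
        # element, emit that end tag first. Attributes are ignored.
        if self.stack and tag in IMPLIED_END.get(self.stack[-1], ()):
            self.out.append(f"</{self.stack.pop()}>")
        self.stack.append(tag)
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Ignore end tags that match nothing; otherwise close any
        # still-open descendants before closing the named element.
        if tag not in self.stack:
            return
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)


def normalize(markup):
    parser = OmissionInferrer()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    # Whatever is still open at end of input gets closed here.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


print(normalize("<ul><li>a<li>b</ul>"))
print(normalize("<p>one<p>two"))
```

In this sketch the second `<li>` closes the first one, and `</ul>` closes whatever is still open inside the list. A grammar-driven parser derives the same inferences from the content models instead of a hand-written table.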