> Arguing that the second is a problem is much harder. Lenient HTML acceptance has been hugely advantageous to the adoption of the web.
Wait, that's a completely different point. The argument here is that it caused a security vulnerability, and it did. If the lenient HTML parser didn't try to salvage HTML out of what is most certainly not valid HTML, then it wouldn't be a security vulnerability.
> This seems amazingly short-sighted and pointless. How was this issue not seen, and was this even solving a problem for someone?
You could ask the same of most cases of lenient HTML parsing. It's amazing the lengths browser vendors have gone to in order to turn junk into something they can render.
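To see that leniency in miniature, here's a small sketch using Python's standard-library HTML parser, which is itself deliberately forgiving. It's a stand-in for a browser's parser, not a claim about any particular browser's behaviour. The markup below is nowhere near valid, yet it still comes out as usable tags, including a live event-handler attribute:

```python
from html.parser import HTMLParser

class ShowWhatGetsSalvaged(HTMLParser):
    """Print every construct the lenient parser manages to recover."""
    def handle_starttag(self, tag, attrs):
        print("start tag:", tag, attrs)
    def handle_data(self, data):
        print("text:     ", repr(data))

# Unquoted attributes, unclosed elements, no doctype -- junk by any strict
# reading of the spec, but the parser still salvages every tag, including
# the onerror handler an attacker cares about.
ShowWhatGetsSalvaged().feed(
    "<p class=broken><b>bold<i>and nested <img src=x onerror=alert(1)>"
)
```

Nothing errors out; every scrap gets salvaged into something renderable.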
> So the issue here is incompetent input sanitisation.
No, it isn't. That code should not execute JavaScript. The real issue is that sanitising code is an extremely error-prone endeavour because of browser leniency: you don't just have to sanitise dangerous code, you also have to sanitise code that should be safe but is actually dangerous, because some browser somewhere is really keen to automatically turn safe code into potentially dangerous code.
Take Netscape's less-than-sign handling. No sane developer would think to "sanitise" what is supposed to be a completely harmless Unicode character. It would make it through any whitelist you could put together; even extremely thorough sanitisation routines that have been refined for years would miss it. It became dangerous through an undocumented, crazy workaround some idiot at Netscape came up with because he wanted to be lenient and parse what must have been a very broken set of HTML documents.
This is not a problem with incompetent sanitisation. It's a problem with leniency.
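To make the failure mode concrete, here's a minimal sketch in Python. The fullwidth less-than sign (U+FF1C) stands in for whichever character Netscape actually mishandled, and NFKC normalisation stands in for the browser's lenient character folding; both are illustrative assumptions, not the historical behaviour:

```python
import unicodedata

def naive_sanitise(user_input: str) -> str:
    """Escape the characters every sanitiser knows are dangerous."""
    return (user_input
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&quot;"))

# The attacker avoids ASCII '<' and '>' entirely and uses the fullwidth
# forms, which look completely harmless and sail through the escaping above.
payload = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"

sanitised = naive_sanitise(payload)
print(sanitised)  # unchanged: there was no ASCII '<' or '>' to escape

# A lenient consumer that "helpfully" folds compatibility characters to
# ASCII (simulated here with NFKC normalisation) turns the safe string
# back into executable markup.
rendered = unicodedata.normalize("NFKC", sanitised)
print(rendered)   # <script>alert(1)</script>
```

The sanitiser escaped every character it had any reason to distrust; it's the lenient folding downstream that turns its output back into markup.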
You have some compelling examples of problems from leniency. I think in some cases the issues are definitely magnified by other poor designs (bad escaping/filtering) but you've demonstrated that well-intentioned leniency can encourage and even directly cause bugs.