My ex-wife managed the security team at MySpace from about 2006 to 2008. The really wild part was when she went online to the MySpace hacker forums to see how the day's work had gone. The insistence on allowing users to put HTML onto the site was a huge problem. These days, I think the solution would be to do a proper parse of the HTML input and remove forbidden attributes and tags, but back then it was handled with regex insanity.
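For illustration, here's a minimal sketch of that parse-based approach in Python (my own toy example, not anything MySpace actually ran), using the standard library's HTMLParser with a small hypothetical allowlist of tags and attributes:

    from html import escape
    from html.parser import HTMLParser

    # Hypothetical allowlist; a real one would be larger and more careful.
    ALLOWED_TAGS = {'b', 'i', 'em', 'strong', 'p', 'a'}
    ALLOWED_ATTRS = {'a': {'href'}}
    DROP_WITH_CONTENT = {'script', 'style'}  # drop these tags AND their bodies

    class Sanitizer(HTMLParser):
        def __init__(self):
            super().__init__()
            self.out = []
            self.skip = 0  # >0 while inside a script/style element

        def handle_starttag(self, tag, attrs):
            if tag in DROP_WITH_CONTENT:
                self.skip += 1
            elif tag in ALLOWED_TAGS and not self.skip:
                kept = ''.join(
                    f' {name}="{escape(value or "")}"'
                    for name, value in attrs
                    if name in ALLOWED_ATTRS.get(tag, set())
                    and not (value or '').strip().lower().startswith('javascript:')
                )
                self.out.append(f'<{tag}{kept}>')

        def handle_endtag(self, tag):
            if tag in DROP_WITH_CONTENT:
                self.skip = max(0, self.skip - 1)
            elif tag in ALLOWED_TAGS and not self.skip:
                self.out.append(f'</{tag}>')

        def handle_data(self, data):
            if not self.skip:
                self.out.append(escape(data))  # everything else is escaped

    def sanitize(dirty: str) -> str:
        parser = Sanitizer()
        parser.feed(dirty)
        parser.close()
        return ''.join(parser.out)

    print(sanitize('<p onclick="evil()">hi<script>alert(1)</script> <b>bold</b></p>'))
    # -> <p>hi <b>bold</b></p>

This sketch doesn't balance tags or handle every parser quirk; in practice today you'd reach for a maintained sanitizer library rather than rolling your own.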
It used to be that the only programs capable of somewhat correctly parsing HTML were web browsers; each of them produced different results, most weren't open source, and none were reusable as libraries. If you wanted to parse HTML in... looks up what MySpace was written in... ColdFusion, you were out of luck. Since then, people have spent years developing specifications and writing libraries, so now it's not a big deal.
You could identify that as an invalid tag in a single pass and know that you should escape its < and >.
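Something like this sketch (mine, with a toy allowlist of just <b> and <i>): any < that doesn't start a recognized tag gets escaped immediately, so a fragment like <scr can never recombine into <script> on a later pass:

    import re

    # Only these exact tags survive; everything else is escaped in one pass.
    ALLOWED = re.compile(r'</?(?:b|i)>')

    def escape_invalid_angles(s: str) -> str:
        out, pos = [], 0
        while pos < len(s):
            ch = s[pos]
            if ch == '<':
                m = ALLOWED.match(s, pos)
                if m:  # a recognized tag: keep it whole
                    out.append(m.group())
                    pos = m.end()
                    continue
                out.append('&lt;')  # stray '<': neutralize on the spot
            elif ch == '>':
                out.append('&gt;')
            elif ch == '&':
                out.append('&amp;')
            else:
                out.append(ch)
            pos += 1
        return ''.join(out)

    print(escape_invalid_angles('<scr<script>ipt> <b>ok</b>'))
    # -> &lt;scr&lt;script&gt;ipt&gt; <b>ok</b>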
For the implementation, all real HTML tags should be generated by the formatter itself and never originate from the input. When formatting, the recognized valid tags are removed from the input and replaced with formatter-generated ones, and everything else is properly HTML-escaped.
As a primitive example, imagine that the only HTML tags the formatter is able to output are <b> and </b>, alongside HTML-escaped text. That makes it impossible for a script tag ever to be output by the formatter.
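A rough sketch of that idea in Python (my own, assuming the input marks bold spans with literal <b>...</b>): the output tags are emitted by the formatter itself, and every other character goes through html.escape, so no input text can ever become markup:

    import html
    import re

    # The formatter recognizes only <b> and </b> in the input.
    TOKEN = re.compile(r'</?b>', re.IGNORECASE)

    def format_bold_only(source: str) -> str:
        out, pos = [], 0
        for m in TOKEN.finditer(source):
            out.append(html.escape(source[pos:m.start()]))  # escape plain text
            # The tag is regenerated by the formatter, not copied from input.
            out.append('</b>' if m.group().startswith('</') else '<b>')
            pos = m.end()
        out.append(html.escape(source[pos:]))
        return ''.join(out)

    print(format_bold_only('hi <b>bold</b> & <script>alert(1)</script>'))
    # -> hi <b>bold</b> &amp; &lt;script&gt;alert(1)&lt;/script&gt;

A real formatter would also balance the tags it emits; the point is just that nothing from the input reaches the output unescaped.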
I wonder how many passes it would need. I mean, if you nest <scr<scr<scr<script>ipt>ipt>ipt> as many times as possible, you'll end up with an XSS. Removing < and > entirely would be the safest solution.
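For instance, with a naive one-pass delete filter (my sketch), each pass splices the surrounding fragments back together into the tag it just removed:

    import re

    def naive_strip(s: str) -> str:
        # Delete literal <script> tags in a single pass.
        return re.sub(r'<script>', '', s, flags=re.IGNORECASE)

    payload = '<scr<scr<script>ipt>ipt>'
    for i in range(1, 4):
        payload = naive_strip(payload)
        print(f'after pass {i}: {payload!r}')
    # after pass 1: '<scr<script>ipt>'  <- deletion reassembled a nested copy
    # after pass 2: '<script>'          <- a one- or two-pass filter ships this
    # after pass 3: ''

Escaping doesn't have this problem: once a < becomes &lt; it can never reassemble into a tag, so a single pass is enough.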