
To me it all seems to be the work of the same spammer(s). In such a case, do some manual investigation and wrap it up. It won't scale to all forms of spam, but if a simple regex can uncover 250k+ results in 10 minutes, a manual spam fighter can still block millions of pages a day (and warn the webhost, get these flaky ads removed from their networks, etc.).

No doubt the recent machine learning hype has given spammers more advanced tools to avoid detection.



False positives are far more problematic than false negatives...


If you remove pages from the index... sure. But for that URL that I posted, do you think there is even a single false positive in there?



