
I'd imagine they're in a dodgy copyright situation and so guard against it by being conservative wrt robots.txt.

The robots.txt shows a positive assertion that parts of a site should be excluded from being used by automated systems.

In most cases I imagine WBM does not have the owner's permission to keep a duplicate of the site; that's certainly tortious under UK law.

Sites that don't change their robots.txt probably correlate highly with sites that don't sue for the infringement.




