
Well yes, the whole point of robots.txt is that ignoring it is impolite, and a crawler that gets caught ignoring it might get banned from the site.
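
For anyone curious what "following it" looks like from the crawler's side, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URLs are made up for illustration; nothing here is enforced by the server, the crawler just chooses to ask first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("archive.org_bot", "https://example.com/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/private/notes"))
```

A polite crawler would run a check like this before every request and skip anything disallowed. An impolite one simply never calls it, which is why the only real consequence is getting noticed and banned.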



That's a really good point. I'm not a fan of Internet Archive ignoring robots.txt, but if I'm really unhappy about it I can block their robot in Apache using .htaccess rules (as long as they continue using archive.org_bot as their user agent).
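
To make the caveat in the parentheses concrete, here is a minimal Python sketch of the matching logic such a block relies on. This is not Apache configuration; `should_block`, `BLOCKED_AGENT_SUBSTRINGS`, and the sample User-Agent strings are hypothetical names and values chosen for the example.

```python
from typing import Optional

# Substrings to refuse, matched case-insensitively against the User-Agent header.
# "archive.org_bot" is the user agent named above; the tuple itself is an assumption.
BLOCKED_AGENT_SUBSTRINGS = ("archive.org_bot",)


def should_block(user_agent_header: Optional[str]) -> bool:
    """Return True if the request's User-Agent matches a blocked substring."""
    if not user_agent_header:
        # A missing header matches nothing in the list, so it is let through.
        return False
    ua = user_agent_header.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENT_SUBSTRINGS)


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    print(should_block("Mozilla/5.0 (compatible; archive.org_bot)"))
    print(should_block("Mozilla/5.0 (compatible; SomeRenamedBot)"))
```

The second call is the weak spot: the block only works while the crawler keeps announcing itself under the same name, since the header is entirely self-reported.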



