
Robots.txt is respected retroactively?



Yes, but only temporarily. As long as your robots.txt file excludes some URLs, those URLs will 1) not be crawled by the Internet Archive crawler and 2) not be shown in the Wayback Machine. Pages that were already crawled remain in the archive, hidden from view, and reappear once robots.txt no longer excludes them.
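To make the behavior concrete, here is a minimal Python sketch of the rule as described in this comment: visibility is decided by the site's current robots.txt at lookup time, and stored snapshots are never deleted. This is only an illustration, not the Internet Archive's actual code. The `archive` dict, the `lookup` and `is_visible` helpers, the example URL, and the `ia_archiver` user-agent string are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical store of already-crawled snapshots, keyed by URL.
archive = {"https://example.com/private/page.html": "<html>snapshot</html>"}


def is_visible(current_robots_txt, url, user_agent="ia_archiver"):
    """Return True if the current robots.txt allows this URL.

    The user-agent name is an assumption for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def lookup(archive, current_robots_txt, url):
    """Return the stored snapshot, or None if it is missing or hidden.

    The snapshot is never removed from the archive; an exclusion in
    robots.txt only hides it for as long as the exclusion is in place.
    """
    if url not in archive:
        return None
    if not is_visible(current_robots_txt, url):
        return None
    return archive[url]


url = "https://example.com/private/page.html"
blocking_rules = "User-agent: *\nDisallow: /private/\n"

hidden = lookup(archive, blocking_rules, url)  # excluded: snapshot is hidden
restored = lookup(archive, "", url)            # rule removed: snapshot is back
```

With the `Disallow` rule in place, `lookup` returns nothing even though the snapshot is still stored. Once the rule is removed, the same snapshot is served again.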



