Meh. I'm pretty ambivalent about voluntary restrictions like robots.txt. As far as I'm concerned, it's mostly useful as a way for site operators to flag endless dynamic content, or requests that are prohibitively expensive to serve (but not so expensive that they actually restrict access to them).

I figure if it's on the web and a human can read it, my computer ought to be able to read it too.
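By "flag" I mean something like the sketch below (the paths are made up, obviously); it's just a hint to crawlers about which corners of the site aren't worth their time:

  User-agent: *
  # auto-generated calendar pages go on forever; nothing to index there
  Disallow: /calendar/
  # search results are expensive to render and effectively infinite
  Disallow: /search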




Yes, but should your computer also be allowed to disseminate that content without the original author's permission?


A robots.txt file is not any sort of license (anti-license?). Its existence has no bearing on the question of whether the IA should be allowed to do what it does. It is only intended to provide helpful information to web crawlers.

* http://www.robotstxt.org/norobots-rfc.txt


Well yes, the whole point of robots.txt is that it's impolite to refuse to follow it, and doing so and getting caught might get you banned from the site.


That's a really good point. I'm not a fan of Internet Archive ignoring robots.txt, but if I'm really unhappy about it I can block their robot in Apache using .htaccess rules (as long as they continue using archive.org_bot as their user agent).
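For what it's worth, a rough sketch of what that rule could look like (assuming Apache 2.4 with mod_setenvif enabled, and that they keep sending that user agent):

  # Tag any request whose User-Agent contains "archive.org_bot"
  SetEnvIfNoCase User-Agent "archive\.org_bot" block_ia_bot
  # Allow everyone else, deny tagged requests
  <RequireAll>
      Require all granted
      Require not env block_ia_bot
  </RequireAll>

Of course, that only works for as long as they announce themselves honestly.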


How is this any different from a human doing the same? The internet is meant to be open; it's free information, after all. If you don't like it, put your stuff behind a login.


I see it as a subtle difference of automation and scale, plus the fact that the Internet Archive is not just saving these copies, but also making them available.

Imagine standing on the public road and taking a picture of your neighbor's home (or face) for your own use. Is that the same as a large company taking pictures of all homes (or faces) of the world, and making them available to the entire world, forever?


Tons of companies take satellite views of the entire earth and they're freely available online.



