
If a site allows crawling by at least one public-Internet spider, does it have any legal protection against other crawlers that choose to ignore robots.txt and crawl anyway? Because I feel like that's exactly what Google would do here, as long as it wasn't literally illegal for them to do so.


It's kinda unclear at the moment, but it's working its way through the courts! See HiQ Labs v. LinkedIn [1], in which HiQ was scraping public profiles and LinkedIn blocked it from doing so. The case went to court and the 9th Circuit ruled that HiQ was allowed to scrape, but SCOTUS later vacated that decision and sent it back for reconsideration. So: murky!

[1] https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn


You can still block Google's IPs manually.
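
For what it's worth, here's a rough sketch of what a manual IP block could look like at the application level, using Python's standard `ipaddress` module. The ranges below are NOT Google's; they're reserved documentation ranges used as placeholders, and `BLOCKED_RANGES` and `is_blocked` are names made up for illustration. You'd have to substitute whatever ranges the crawler operator actually publishes.

```python
import ipaddress

# Placeholder ranges only (reserved documentation blocks), NOT real crawler IPs.
# Assumption: you would replace these with the ranges you actually want to block.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network differ in IP version,
    # so mixing IPv4 and IPv6 ranges in one list is safe.
    return any(addr in net for net in BLOCKED_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "198.51.100.7", "2001:db8::1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

In practice you'd probably do this at the firewall or reverse proxy rather than in app code, and a hand-maintained list goes stale whenever the ranges change.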



