Hacker News

Isn't web crawling essentially "training data"?


Just because something is published on the web, it doesn't mean it's free of copyright protections.


Nope. That's just indexing, i.e. making an index.

That said, Google does cache other people's content and walks a fine line in doing so; there is also that AMP thing. They probably haven't been sued because they don't hurt those websites' ad revenue, though I don't know how that works.


AMP is opt-in, isn't it? Presumably the website has agreed to their data being distributed in that way.



