Isn't web crawling essentially "training data"?

jonny_eh · on Oct 5, 2023

Just because something is published on the web, it doesn't mean it's free of copyright protections.

TerrifiedMouse · on Oct 5, 2023

Nope. That's just indexing, i.e. making an index.

That said, Google does cache other people's content and walks a fine line doing so - there is also that AMP thing. They probably haven't gotten sued because they don't affect those websites' ad revenue - I don't know how that works.

sebzim4500 · on Oct 5, 2023

AMP is opt-in, isn't it? Presumably the website has agreed to their data being distributed in that way.