The problem to me is rather whether Bing and Google will also limit their bots to site indexing. IMHO it just doesn't make sense to run multiple bots, but robots.txt has no syntax, AFAIK, to restrict the purpose a crawl is used for.
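For what it's worth, the de facto workaround today seems to be separate user-agent tokens rather than any purpose field in robots.txt itself, e.g. OpenAI's GPTBot for training crawls and Google's Google-Extended token for AI training, alongside the normal Googlebot for indexing. A rough sketch of what that looks like:

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: Googlebot
    Allow: /

So the "purpose" ends up encoded in which token you block, which only works as long as vendors keep publishing distinct tokens per use case.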
This is particularly odd because the EU text-and-data-mining directive, which got us into this mess inside the EU, seems to treat robots.txt as a valid machine-readable way to reserve your rights against data mining (there is no 'fair use' fallback in the EU otherwise). Are there other machine-readable standards? I also don't quite understand how EU copyright applies to a model trained outside the EU and then used within it again (probably the biggest enforcement gap).