If we want to use data owned by others and make money with it, we can do one of two things:
(1) just grab the data
(2) ask the content owners
I think what is fair is closer to (2) than to (1), especially since the data was originally intended for human consumption. What you call "training" is what another person might call "mechanized processing", which would not fall within fair use of the data.
I'm honestly at a loss here. I can't figure out what your position is.
> If we want to use data owned by others and make money with it [...] ask the content owners
So is it "no commercial use without permission" you're arguing for?
> mechanized processing
Or are you arguing that training should fall under the existing mechanical license provisions for songs? I don't think you are, because those licenses are compulsory, and you seem to want an element of choice for the copyright holder.
Ok, put the chatbots aside for the moment. If [brand new use] for a book is invented, and I buy a copy of that book and want to do [that new thing] with it, should the copyright holder of that book be able to block me?
This would outlaw any new use of information (e.g. music sampling) by default.
If all novel uses were banned from the outset, cultural progress would suffer immeasurably.