I honestly don't understand how people can reject the idea that some parts of a book most certainly can be reviewed with the help of a computer. There wasn't a "how good is this book" score, presumably because a computer can't tell that yet; I don't understand the issue in looking at the number of adverbs in a book with the help of a computer.
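For what it's worth, that kind of counting is about the simplest automated analysis there is. Here is a minimal sketch using NLTK's pretrained part-of-speech tagger (my own illustration, not the tool being discussed; in the Penn Treebank tagset, RB, RBR, and RBS are the adverb tags):

```python
import nltk

# One-time model downloads for the tokenizer and tagger.
# (Resource names can vary slightly across NLTK versions.)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def adverb_stats(text: str) -> tuple[int, int]:
    """Return (adverb_count, total_word_count) for a piece of text.

    Penn Treebank tags RB, RBR, and RBS all denote adverbs,
    so any tag starting with "RB" is counted.
    """
    tokens = nltk.word_tokenize(text)
    words = [t for t in tokens if t.isalpha()]  # ignore punctuation
    tagged = nltk.pos_tag(words)
    adverbs = sum(1 for _word, tag in tagged if tag.startswith("RB"))
    return adverbs, len(words)

sample = "She quietly closed the door and walked away very slowly."
adverbs, total = adverb_stats(sample)
print(f"{adverbs} adverbs out of {total} words")
```

The tagger isn't perfect, but this kind of analysis doesn't need to be. It says nothing about whether the book is good; it just counts.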
Software people are intensely susceptible to the McNamara Fallacy:
“But when the McNamara discipline is applied too literally, the first step is to measure whatever can be easily measured. The second step is to disregard that which can't easily be measured or given a quantitative value. The third step is to presume that what can't be measured easily really isn't important. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.”
— Daniel Yankelovich, "Interpreting the New Life Styles", Sales Management (1971)
Is there a counter-fallacy along the lines of "if we can't measure everything, we shouldn't measure anything"? It can still be an interesting, fun, and informative exercise, as long as we keep in mind that it may be limited.
> I don't understand the issue in looking at the number of adverbs in a book with the help of a computer.
I think authors see an issue with their IP being fed to an algorithmic black box that will certainly make a lot of money, none of which the authors will see, and about which engineers say "don't worry, it is not stealing your IP, just let us make money the way we want and shut up".
What if OpenAI could prove that their algorithms are not stealing IP? I think the answer is simple: they just can't. They don't even formally know what their algorithms can and cannot do.
If I were an author or an artist, I would most definitely want some new kind of legal guarantee that if I don't want my work used as training data, then it most definitely isn't. Of course, that's not really possible, because who can audit the algorithmic black box and see whether my work was used in the training?
How was the tool in question, the tool the book authors were outraged about, "stealing their IP"? It's also unclear how it would make any money at all -- indeed, the tool's author even said it made no money.
I believe the authors are not fighting against that tool in particular. They are fighting against generative AIs being trained on their material without their consent.
Which is totally legitimate to me. Maybe this tool is just collateral damage in a much bigger debate; I don't know. The fact is that in the bigger debate, it does feel like the engineers don't care much about the artists. Why would the artists care about individual engineers and individual use cases?