> The AI companies seek to train models in order to compete with the authors of the content used to train the models.
When I read someone else’s essay I may intend to write essays like that author. When I read someone else’s code I may intend to write code like that author.
AI training is no different from any other training.
> If a court determines that the AI output you've used is close enough to be considered a derivative work, it's infringement.
Do you mean the output of the AI training process (the model), or the output of the AI model? If the former, yes, sure: if a model actually contains within it copies of data, then it’s a copy of that work.
But we should all be very wary of any argument that the ability to create a new work which is identical to a previous work is itself derivative. A painter may be able to copy van Gogh, but neither the painter’s brain nor his non-copy paintings (even those in the style of van Gogh) are copies of van Gogh’s work.
If you as an individual recognizably regurgitate the essay you read, then you have infringed. If an AI model recognizably regurgitates the essay it trained on then it has infringed. The AI argument that passing original content through an algorithm insulates the output from claims of infringement because of "fair use" is pigwash.
> If an AI model recognizably regurgitates the essay it trained on then it has infringed.
I completely agree — that’s why I explicitly wrote ‘non-copy paintings’ in my example.
> The AI argument that passing original content through an algorithm insulates the output from claims of infringement because of "fair use" is pigwash.
Sure, but the argument that training an AI on content is necessarily infringement is equally pigwash. So long as the resulting model does not contain copies, it is not infringement; and so long as it does not produce a copy, it is not infringement.
> So long as the resulting model does not contain copies, it is not infringement
That's not true.
The article specifically deals with training by scraping sites. That necessarily involves copying content from the server to the machine(s) doing the scraping and training. If the site’s TOS incorporates robots.txt or otherwise denies a license for such activity, it is arguably infringement. SourceHut’s TOS, for example, specifically denies the use of automated tools to obtain information for profit.
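As an aside on the mechanics: robots.txt is machine-readable, and a crawler has no excuse for ignoring it. A minimal offline sketch using Python's standard library — the bot name, rules, and URLs here are made up for illustration:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Parse an example robots.txt inline rather than fetching it over the
# network, so the sketch runs offline. The agent names are hypothetical.
rp.parse("""\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines())

# The hypothetical training bot is barred from the whole site...
print(rp.can_fetch("ExampleTrainingBot", "https://example.com/essay.html"))  # False
# ...while other agents are only barred from /private/.
print(rp.can_fetch("OtherBot", "https://example.com/essay.html"))    # True
print(rp.can_fetch("OtherBot", "https://example.com/private/data"))  # False
```

Whether ignoring such a rule is infringement or "merely" a TOS breach is exactly the legal question at issue, but the technical signal itself is unambiguous.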
I'm curious how this can be applied with the inevitable combinatorial exhaustion that will happen with musical aspects such as melody, chord progression, and rhythm.
Will it mean longer and longer clips are "fair use", or will we just stop making new content because it can't avoid copying patterns of the past?
> I'm curious how this can be applied with the inevitable combinatorial exhaustion that will happen with musical aspects such as melody, chord progression, and rhythm.
They did this in 2020. The article points out that "Whether this tactic actually works in court remains to be seen", and I haven't been following the story since, so I don't know its current status.
More germane is that there will be a smoking gun for every infringement case: whether or not the model was trained on the original. There will be no pretending that the model never heard the piece it copied.
> AI training is no different from any other training.
Yes, it is. One is done by a computer program, and one is done by a human.
I believe in the rights and liberties of human beings. I have no reason to believe in rights for silicon. You, and every other AI apologist, are never able to produce anything to back up what is largely seen as an outrageous world view.
You cannot simply jump the gun and compare AI training to human training like it's a foregone conclusion. No, it doesn't work that way. Explain why AI should have rights. Explain if AI should be considered persons. Explain what I, personally, will gain from extending rights to AI. And explain what we, collectively, will gain from it.