The Times's lawsuit does allege, in part, that (1) the training data was not licensed and therefore OpenAI has committed copyright infringement, and (2) the resultant model is a copy or derivative work of The Times's body of copyrighted articles.
The regurgitation is merely evidence of those two claims, so putting a filter on the output explicitly does not resolve the case.
The question is whether, legally, you need a license merely to view copyrighted material. Training doesn't copy anything; I think this is where people are confused. People assume that is how training works because they have a false intuition about how LLMs must work.
LLMs are not data archives; I don't know how many times this has to be repeated.
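To make that concrete, here is a minimal sketch of a single next-token training step, in PyTorch, using a toy stand-in model and random placeholder token ids (illustrative only, not anyone's actual training pipeline). The text enters only as a gradient signal and is discarded after the step; what persists is the updated set of float parameters.

```python
# Illustrative only: one next-token training step.
# A toy model and random token ids stand in for a real transformer and corpus.
import torch
import torch.nn as nn

vocab_size, dim = 50_000, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, dim),   # token ids -> vectors
    nn.Linear(dim, vocab_size),      # vectors -> next-token logits
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

tokens = torch.randint(0, vocab_size, (1, 128))   # placeholder for a text chunk
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()        # the text influences the weights only through this gradient
optimizer.step()       # parameters get nudged ...
optimizer.zero_grad()
# ... and the token batch itself is discarded; the model keeps only updated
# weights, not a stored copy of the document.
```

Whether those weights nonetheless end up memorizing enough of particular articles to count as a copy is, of course, what the regurgitation examples are being used to argue.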
> Defendants’ generative artificial intelligence (“GenAI”) tools rely on large-language models (“LLMs”) that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.
They need to be confused because they need the judge to be confused too.
NYT isn't suing because LLMs will print out NYT articles for free. They are suing because LLMs are poised to be better/more favorable news reporters than them. It's a long-term survival case, not a copyright one (despite copyright being the weapon used in the fight).
I agree. That said, if AI puts journalism out of business, then AI will quickly run out of content to train on and to report. I think this is a situation where technology has gotten way out ahead of the law.
So if they fix this problem, you’d then be OK with generative AI?
BTW, plenty of humans have memorized copyrighted material, such as song lyrics. Do you think that should be prohibited? Maybe the difference isn’t as great as you think.
This is it, isn’t it? That’s why all the effort pours in to make these machines produce Rembrandt and Tolstoy copies. And why I still have to do my taxes by hand rather than the machine handling them with speed and accuracy.
It’s the core of it all: jealousy of a creative spirit.
Artists are not machines, but living souls.
If you were remotely open enough to see for yourself, you wouldn’t struggle to engage with the world in a creative manner, and you wouldn’t feel that jealousy but rather encouragement at what you see pouring out of your fellow humans as a reflection of each other.
No machine will grant you that understanding; you just have to engage directly.
It will never succeed in supplanting it, no matter the billions of dollars burned to try.
AI doesn't make art, it makes images; it's like a camera in this way. Art is in the composition, the message, and the aesthetic of the person using the tool to create an image.
Using an AI tool to create an image of a painting betrays the person who seeks to be an “artist” by short-circuiting the practice that leads the prospect down their path of enlightenment through mastery.
In our constricted 3D world, there is no circumstance in which an algorithmically generated image of a painting can equally serve the prospective artist in the procedural, internal work that the practice performs on them. There is no other pursuit in art, and anyone pursuing it will come to that conclusion in any number of ways, but always through submission to the course of mastery (for which there is no shortcut).
Worse, the companies at the helm of this side of the technology are pushing it in order to stand as middlemen to humankind’s modus operandi: to create.
Keep your mind open to perspectives beyond the software industry.
I do think a new way of thinking about copyright is needed for AI. Allow tech firms to train on all material, but there should be an AI tax that serves as compensation back to the public commons for what was taken from it and privatized.
The status quo favors large players who can navigate the legal system.
Training on copyrighted content as research is entirely fair use. Just require that the fair-use defense hinge on that research being public, i.e. that it's only a defense of open-weight models.
A tax on AI is stupid because the big players can dodge taxes well and have the ear of power now, so any regulation would favor them. It would only serve to prevent challengers to their dominance.
The controversy here is that LibGen doesn't legally distribute its content. Mass-scale training on pirated content is... legally murky, to say the least.