As one person in the case said, all of the AI training companies out there, including Amazon's darling, Anthropic, are wholesale breaking copyright law.
If an individual downloaded Library Genesis to train an AI, there would be huge fines. OpenAI has admitted to training on a dataset that is scraped from LibGen.
> If an individual downloaded Library Genesis to train an AI, there would be huge fines.
Let’s be honest: Individuals are downloading Library Genesis to train AI already and not facing fines. It’s unrealistic to claim that these people are being fined for downloading LibGen or other copyrighted data for training AI.
> OpenAI has admitted to training on a dataset that is scraped from LibGen.
If they admitted this in the way you claim, surely publishers are gearing up for lawsuits.
Nobody is going to ignore a massively funded company breaking a law and instead pursue tiny fines against individuals.
So a massive copyright infringement suit against the big corporations is looming just beyond the horizon?
Either this causes copyright law to be re-written entirely or the complete dissolution of copyright law.
Wonder if there are any firms with enough stones to take that case lol. The amount of dirty laundry you can extract from these companies just in discovery would be huge.
> Either this causes copyright law to be re-written entirely or the complete dissolution of copyright law.
It's tempting to make the big case of the day "the most important of all time," but most likely this will just pile some fair-use carve-outs for commercial LLMs onto the 1998 copyright legislation (the DMCA and the Copyright Term Extension Act), which itself piled onto the Copyright Acts of 1976, 1909, 1870, 1831, and 1790.
I don't think that much is going to change, but I'm curious about the arguments to the contrary.
The act of training the AI on the material may be fair use, but the unauthorized access is a different story. That's probably why the Authors Guild lawsuit against OpenAI focuses on the downloading.
Most of the arguments I've seen around this bullshit are that the AI is just like a human, so it should be able to train on anything it can see. I don't think many understand what you've touched on here: the actual problem is all the content piracy for explicitly commercial purposes, not the training of AI.