The question is whether legally you need a license to view copyright. Training doesn't copy anything, I think this is where people are confused. People assume this is how training works, because they have a false intuition about how LLMs must work.
LLM's are not data archives, I don't know how many times this has to be repeated.
> Defendants’ generative artificial intelligence (“GenAI”) tools rely on large-language models (“LLMs”) that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.
They need to be confused because they need the judge to be confused too.
NYT isn't suing because LLMs will print out NYT articles for free. They are suing because LLMs are poised to be better/more favorable news reporters than them. It's a long term survival case, not a copyright one (despite that being the weapon used in the fight)
I agree. That said, if AI puts journalism out of business, then AI will quickly run out of content to train on and report. I think this is a situation where technology has gotten way out ahead of the law.
LLM's are not data archives, I don't know how many times this has to be repeated.