
The problem there is outputting the data, not inputting it.

OAI can put a dumb IP filter on ChatGPT output and resolve the case. Training plays no part here.
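For illustration, such a filter could be as simple as an n-gram overlap check against an index of the protected articles. A minimal sketch, assuming a hypothetical protected_articles corpus and an arbitrary match threshold (not anything OpenAI actually runs):

    # Block a response if it shares too many consecutive
    # words (n-grams) with any protected article.

    def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def build_index(protected_articles: list[str], n: int = 8) -> set[tuple[str, ...]]:
        index: set[tuple[str, ...]] = set()
        for article in protected_articles:
            index |= ngrams(article, n)
        return index

    def is_regurgitation(output: str, index: set[tuple[str, ...]],
                         n: int = 8, max_matches: int = 3) -> bool:
        # Flag the output if more than max_matches 8-grams
        # appear verbatim in the protected corpus.
        return len(ngrams(output, n) & index) > max_matches

A real deployment would need fuzzier matching (paraphrase-tolerant hashing, embedding similarity), but the point stands: the check lives at the output, not in training.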



The Times's lawsuit does allege, in part, that (1) the training data was not licensed and therefore OpenAI has committed copyright infringement, and (2) the resultant model is a copy or derivative work of The Times's body of copyrighted articles.

The regurgitation is merely evidence of those two claims, so putting a filter on the output explicitly does not resolve the case.


The question is whether you legally need a license merely to view copyrighted material. Training doesn't copy anything; I think this is where people are confused. People assume copying is how training works because they have a false intuition about how LLMs must work.

LLMs are not data archives; I don't know how many times this has to be repeated. (See the sketch below.)
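To make that concrete, here's a toy sketch of what one training step does, with PyTorch and a deliberately simplified stand-in model (an illustration, not OpenAI's actual pipeline): the text is tokenized, used once to compute a next-token prediction loss, and then discarded.

    import torch
    import torch.nn.functional as F

    # Stand-in "model": maps a 128-dim hidden state to logits over a 50k vocab.
    model = torch.nn.Linear(128, 50_000)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    def train_step(hidden_states: torch.Tensor, next_token_ids: torch.Tensor) -> float:
        logits = model(hidden_states)                   # predict next tokens
        loss = F.cross_entropy(logits, next_token_ids)  # compare to the real text
        optimizer.zero_grad()
        loss.backward()                                 # gradients w.r.t. weights
        optimizer.step()                                # nudge the weights
        return loss.item()                              # the text itself is gone

    # Fake batch standing in for tokenized article text:
    loss = train_step(torch.randn(4, 128), torch.randint(0, 50_000, (4,)))

Nothing in that loop stores the article; it only adjusts numeric weights so the next-token predictions get slightly better.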


The NYT and their lawyers are confused, then:

> Defendants’ generative artificial intelligence (“GenAI”) tools rely on large-language models (“LLMs”) that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.

https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

This case is still ongoing a year later.


They need to be confused because they need the judge to be confused too.

NYT isn't suing because LLMs will print out NYT articles for free. They are suing because LLMs are poised to be better, more favorable news reporters than they are. It's a long-term survival case, not a copyright one (despite copyright being the weapon used in the fight).


I agree. That said, if AI puts journalism out of business, then AI will quickly run out of content to train on and report. I think this is a situation where technology has gotten way out ahead of the law.



