
The problem there is outputting the data, not inputting it.

OAI can put a dumb IP filter on ChatGPT output and resolve the case. Training plays no part here.
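For illustration, such a filter could be as simple as an n-gram overlap check against an index of the protected articles. A minimal sketch, assuming a hypothetical protected_articles corpus and an arbitrary match threshold (not anything OpenAI actually runs):

    # Block a response if it shares too many consecutive
    # words (n-grams) with any protected article.

    def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def build_index(protected_articles: list[str], n: int = 8) -> set[tuple[str, ...]]:
        index: set[tuple[str, ...]] = set()
        for article in protected_articles:
            index |= ngrams(article, n)
        return index

    def is_regurgitation(output: str, index: set[tuple[str, ...]],
                         n: int = 8, max_matches: int = 3) -> bool:
        # Flag the output if more than max_matches 8-grams
        # appear verbatim in the protected corpus.
        return len(ngrams(output, n) & index) > max_matches

A real deployment would need fuzzier matching (paraphrase-tolerant hashing, embedding similarity), but the point stands: the check lives at the output, not in training.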



The Times's lawsuit does allege, in part, that (1) the training data was not licensed and therefore OpenAI has committed copyright infringement, and (2) the resultant model is a copy or derivative work of The Times's body of copyrighted articles.

The regurgitation is merely evidence of those two claims, so putting a filter on the output explicitly does not resolve the case.


The question is whether you legally need a license merely to view copyrighted material. Training doesn't copy anything; I think this is where people are confused. People assume copying is how training works because they have a false intuition about how LLMs must work.

LLMs are not data archives; I don't know how many times this has to be repeated. (See the sketch below.)
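To make that concrete, here's a toy sketch of what one training step does, with PyTorch and a deliberately simplified stand-in model (an illustration, not OpenAI's actual pipeline): the text is tokenized, used once to compute a next-token prediction loss, and then discarded.

    import torch
    import torch.nn.functional as F

    # Stand-in "model": maps a 128-dim hidden state to logits over a 50k vocab.
    model = torch.nn.Linear(128, 50_000)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    def train_step(hidden_states: torch.Tensor, next_token_ids: torch.Tensor) -> float:
        logits = model(hidden_states)                   # predict next tokens
        loss = F.cross_entropy(logits, next_token_ids)  # compare to the real text
        optimizer.zero_grad()
        loss.backward()                                 # gradients w.r.t. weights
        optimizer.step()                                # nudge the weights
        return loss.item()                              # the text itself is gone

    # Fake batch standing in for tokenized article text:
    loss = train_step(torch.randn(4, 128), torch.randint(0, 50_000, (4,)))

Nothing in that loop stores the article; it only adjusts numeric weights so the next-token predictions get slightly better.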


The NYT and their lawyers are confused, then:

> Defendants’ generative artificial intelligence (“GenAI”) tools rely on large-language models (“LLMs”) that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.

https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

This case is still ongoing a year later.


They need to be confused because they need the judge to be confused too.

NYT isn't suing because LLMs will print out NYT articles for free. They are suing because LLMs are poised to be better, more favorable news reporters than they are. It's a long-term survival case, not a copyright one (despite copyright being the weapon used in the fight).


I agree. That said, if AI puts journalism out of business, then AI will quickly run out of content to train on and report. I think this is a situation where technology has gotten way out ahead of the law.



