Sometimes I just feel like these people overestimate how much they are actually ...

logifail · 2025-01-19T18:19:40 1737310780

> It’s trained on 15T tokens. So how many did you provide that were genuinely novel?

Are we suggesting that we should ditch creators' rights and instead value intellectual property along the lines of "I should be able to copy all your stuff as long as long as I copy lots of other stuff too, and give it all away for free or almost free...?"

marssaxman · 2025-01-19T18:24:42 1737311082

That sounds like a pretty good deal to me - but I've always believed that the entire concept of "intellectual property" does overall more harm than good.

logifail · 2025-01-19T18:32:53 1737311573

> I've always believed that the entire concept of "intellectual property" does overall more harm than good

It's fairly broken, but on balance it seems the creators are the ones getting screwed.

I did years of research in a scientific lab which resulted in <drum roll> 2 (yes two ... count them) peer-reviewed papers.

My colleagues and I did the work, wrote up the damned papers, yet to get them published we had to sign over copyright to what I'd now suggest is essentially a rent-seeking scientific publishing mafia.

All a long time ago, but I never had (and still don't have) the ability to either legally download or legally redistribute my own work...

llm_trw · 2025-01-19T18:22:57 1737310977

The question if training an llm is fair use is one that will have to be answered by the courts.

yencabulator · 2025-01-19T18:47:44 1737312464

The point (often) is to stop the practice, not to ask money for it.

Like, you get fined for speeding, but if you keep speeding you'll get you're license revoked, and if you keep driving after that you get jail time. The payment required is punitive, but the point is to stop you.

llm_trw · 2025-01-19T18:53:09 1737312789

Again, that an opinion that will need to be tested in the courts.

You will have to explain why Hunter Thompson copying every word of every Hemingway novel isn't copyright infringement, but a computer doing the same is.

yencabulator · 2025-01-19T19:02:52 1737313372

Three things:

1. Humans are not machines; arguments saying because a human can learn LLMs must be allowed to copy is not interesting.

2. Did Thompson publish the work? It sounds like you're referring to an activity Thompson did in private, to improve his skills as an author. Meanwhile, lawsuits are alleging that LLM services reproduce copyrighted materials.

3. What can be fair use at small scale is no longer fair use at large scale.

llm_trw · 2025-01-19T19:09:02 1737313742

If you think that pretraining is copying then your opinions are irrelevant.

yencabulator · 2025-01-19T19:14:36 1737314076

If you think courts follow your personal opinion and technical definition, you're silly. The reality is we don't yet know how the courts will decide.

logifail · 2025-01-19T19:05:24 1737313524

> Hunter Thompson copying every word of every Hemingway novel isn't copyright infringement

I make a high-res photo of a banknote. I can print that out at home. The bad stuff starts at the step after that...

llm_trw · 2025-01-19T19:07:38 1737313658

Try importing it in photoshop and report your findings back.

logifail · 2025-01-19T19:26:52 1737314812

No-one would possibly think of using GIMP instead...

https://www.reddit.com/r/graphic_design/comments/ah9s8n/trie...

ripped_britches · 2025-01-19T18:21:30 1737310890

No not at all, just that their damages are going to be measurably fairly low

scotty79 · 2025-01-19T18:25:44 1737311144

Of course. The alternative is that creators dictate the price for any of the infinite number of zero cost copies which is and always has been ridiculous.

jncfhnb · 2025-01-19T18:25:01 1737311101

Yes. As long as you don’t reproduce other people’s stuff specifically.

latexr · 2025-01-19T18:34:29 1737311669

Which they do. That’s what the New York Times lawsuit is about. And in Meta’s case, they went specifically out of their way to remove the copyright notices to hide their actions.

jncfhnb · 2025-01-20T14:58:14 1737385094

“They” might be doing that. But this is not intrinsic to LLM usage

latexr · 2025-01-20T18:23:43 1737397423

The conversation is specifically about OpenAI and Meta.

jncfhnb · 2025-01-20T23:04:05 1737414245

I disagree. This thread seems to be about LLMs on a fundamental level.

latexr · 2025-01-21T14:40:24 1737470424

The submission is about Meta, and the comment that started the thread specifically mentioned OpenAI and Meta and no other LLMs or providers.

jncfhnb · 2025-01-21T15:47:25 1737474445

And the thread is about how LLM training interacts with copyright. Not whether OpenAI or Meta coincidentally blatantly copied other works.

tomrod · 2025-01-19T18:32:05 1737311525

I think the statistical arguments cover this.

skulk · 2025-01-19T18:25:47 1737311147

> I personally hope we can all get on the same team with AI and treat its advancement as scientific research for the betterment of humanity.

s/AI/capital/.

It's painfully obvious that this is going to make material conditions worse for most people who use their minds to work instead of their hands. to these people, the "betterment of humanity" is a cruel joke.

idunnoman1222 · 2025-01-19T18:30:15 1737311415

Yeah, just like Google and stack overflow

wat10000 · 2025-01-19T18:35:15 1737311715

The normal way to figure this out is to negotiate. We’ll either come to a mutually agreeable amount, or they’ll decide it’s not worth the cost to use my stuff. If I think I deserve $5 from OpenAI, then I’d suggest that, and they’d accept or come back with a counteroffer or tell me I’m nuts and move on. Probably that last one.

But for some reason, these companies think they don’t need to bother, and can just use everyone’s stuff.

Wait, I phrased that wrong. For a very good reason based on long precedent, these companies know that IP law is a tool to be used by big companies against individuals and sometimes other big companies, but never by individuals against company, so they know they don’t have to bother.

latexr · 2025-01-19T18:29:58 1737311398

> Sometimes I just feel like these people overestimate how much they are actually owed from these training runs.

It’s not about being paid for including their work, it’s about being compensated for having done so without permission. For crying out loud, they went out of their way to remove copyright notices from the pirated work.

> It’s trained on 15T tokens. So how many did you provide that were genuinely novel?

Then they can just take it out. And go ahead and take out every thing you didn’t have permission to include. What’s that? The model is now significantly worse? Yeah, these things compound.

> And how much money do you want? Like $5 from OpenAI? And $0 from meta since it’s open source?

No, they would’ve wanted for the work to not have been included without permission in the first place. Do you understand the world you’re advocating for? You’re arguing it’s OK for rich people to do whatever they want if they throw some scraps on the floor for you. Not everything is about money. Unfortunately there’s no other reasonable way (legal and non violent) to punish these infringers.

> I personally hope we can all get on the same team with AI and treat its advancement as scientific research for the betterment of humanity.

What you’re expressing is “I hope everyone will stop arguing and agree with me”. These moguls care about themselves, it is incredibly naive to believe they give a rat’s ass about “the betterment of humanity”.

forgetfulness · 2025-01-19T18:33:22 1737311602

LLM companies aren't being funded by the hundreds of billions because investors expect science to be advanced by text and image generators.

I find it very unlikely that the commodification of knowledge work will be for the betterment humanity, I don't know if people are expecting here that just because the value of more people's labor becomes zero that we will do, what, do away with money? No, it will just mean that fewer people will have the chance to earn the right to use space and resources in a meaningful way.

inetknght · 2025-01-19T18:34:48 1737311688

> I find it very unlikely that the commodification of knowledge work will be for the betterment humanity

There's no law to force it. So of course it won't be.

Even if there were a law to force it, how would you enforce it?

Hammershaft · 2025-01-19T18:21:52 1737310912

Hard to treat llms training aon your data at your expense as research for the betterment of humanity when it is specifically the private company who is imposing that cost on you that profits.

fourside · 2025-01-19T18:24:56 1737311096

Does this goes both ways? Can I infringe of Disney’s IP on the grounds that their stories are so derivative that they aren’t actually that new?

The betterment of humanity seems to involve some parties making a ton of money while the people who provided the data apparently just need to be grateful.

jncfhnb · 2025-01-19T18:26:05 1737311165

You can absolutely use the story frameworks that Disney has done.

You cannot make a story featuring Simba.

giantrobot · 2025-01-19T18:34:01 1737311641

> You cannot make a story featuring Simba.

Just use Kimba the White Lion.[0]

[0] https://12tomatoes.com/kimba-similarity-lion-king/

jsheard · 2025-01-19T18:54:12 1737312852

The fact they they were willing to risk significant legal exposure in order to use this dataset suggests it's worth considerably more than 5 dollars to them. Zuck isn't putting his ass on the line for a Big Mac.

sensanaty · 2025-01-20T00:36:19 1737333379

So if the works they're stealing aren't worth anything, why do they need it so badly?

papercrane · 2025-01-19T18:16:38 1737310598

Assuming the works have registered with the copyright office they're eligible for statutory damages.

The range for that is huge though, it can be in the hundreds of dollars per work, or if the infringement is shown to be wilful then a judge can award up to $150,000 per work.

ripped_britches · 2025-01-19T18:22:07 1737310927

Fair point, seems willful here

lm28469 · 2025-01-19T18:23:49 1737311029

There is absolutely no logical pathway between the current flavor of hardcore free for all individualistic capitalism and what you describe here

TruffleLabs · 2025-01-19T20:38:57 1737319137

Stealing is still illegal

blibble · 2025-01-19T18:16:32 1737310592

well US copyright law statutory damages are $30,000 per work infringed, and $150,000 if done deliberately

so I think $150,000 per copyrighted work ingested is fair