Pirating and paying the fine is probably a hell of a lot cheaper than individually buying all these books. I'm not saying this is justified, but what would you have done in their situation?
Saying "they have the money" is not an argument. It's about the amount of effort needed to individually buy, scan, and process millions of pages. If that's already been done for you, why re-do it all?
The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.
I'm against Anthropic stealing teachers' work and discouraging them from ever writing again. Some teachers are already saying this (though probably not in California).
> The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.
I think this is a fantasy. My father cowrote a Springer book about physics. For the effort, he got like $400 and 6 author copies.
Now, you might say he got a bad deal (or the book was bad), but I don't think hundreds of thousands of authors do significantly better. The reality is, people overwhelmingly write because they want to, not because of money.
No, you don't see where I'm coming from. And my father was a university professor. I am certainly not opposed to authors being fairly remunerated for their work; that's why I brought up that example.
My point is, the controversy is not an AI corporation vs 10^5 ordinary teachers. It's a battle of two corporations, or business models, if you will. But regardless of the result, most book authors will continue to get screwed; maybe only the means will change. It will not stop them from writing, either. So I don't see any mass writers' protests coming, sorry.
I also don't think Anthropic's AI is going to be any less intelligent if, instead of reading a modern fiction book, it only reads a Wikipedia summary of it. Stories and myths are a human way of understanding the world; machines probably don't need them. And for non-fiction books, there really aren't that many irreplaceable high-profile authors out there. If it can't read, say, Feynman's Lectures on Physics, it can learn the same from hundreds of other physics textbooks. Maybe they are slightly worse organized, but why should a superintelligence care?
Training a generative model on a book is the mechanical equivalent of having a human read the book and learn from it. Is it stealing if a person reads the book and learns from it?
Well, right, but that's different from "can reproduce the original work". I "can" start typing out song lyrics but it doesn't mean that I stole the songs I've listened to.
That will be sad, although there will still be plenty of great people who will write books anyway.
When it comes to a lot of these teachers, I'll say: copyright works hand in hand with college and school course-book mandates. I've seen plenty of teachers making crazy money off students' backs due to these mandates.
A lot of the content taught in undergrad and school hasn't changed in decades or even centuries. I think we have all the books we'll ever need in certain subjects already, but copyright keeps enriching people who write new versions of these.
Yeah, people will still want to write. They might need new ways to monetize it. That being said, even if people still want to write, they may not consider it a viable path without those new monetization models.
They won't be needed anymore, once singularity is reached. This might be their thought process. This also exemplifies that the loathed caste system found in India is indeed in place in western societies.
There is no equality, and seemingly there are worker bees who can be exploited, and there are privileged ones, and of course there are the queens.
> They won't be needed anymore, once singularity is reached.
And it just so happens that that belief says they can burn whatever they want down because something in the future might happen that absolves them of those crimes.
Note: My definition of singularity isn't the one they use in San Francisco. It's the moment founders who stole the life's work of thousands of teachers finally go to prison, and their datacentres get seized.
When the rich and powerful face zero consequences for breaking laws and ignoring the social contracts that keep our society functioning, you wind up with extreme overcorrections. See Luigi.
How extreme is that, really? Not to justify murder: that is clearly bad. But "killing one man" is evidently something we, as a society, consider an "acceptable side-effect" when a corporation does it -- hell, you can kill thousands and get away scot-free if you're big enough.
Luigi was peanuts in comparison.
“THERE were two “Reigns of Terror,” if we would but remember it and consider it; the one wrought murder in hot passion, the other in heartless cold blood; the one lasted mere months, the other had lasted a thousand years; the one inflicted death upon ten thousand persons, the other upon a hundred millions; but our shudders are all for the “horrors” of the minor Terror, the momentary Terror, so to speak; whereas, what is the horror of swift death by the axe, compared with lifelong death from hunger, cold, insult, cruelty, and heart-break? What is swift death by lightning compared with death by slow fire at the stake? A city cemetery could contain the coffins filled by that brief Terror which we have all been so diligently taught to shiver at and mourn over; but all France could hardly contain the coffins filled by that older and real Terror—that unspeakably bitter and awful Terror which none of us has been taught to see in its vastness or pity as it deserves.”
> 150K per work is the maximum fine for willful infringement
No, it's not.
It's the maximum statutory damages for willful infringement, which this has not been adjudicated to be. It is not a fine; it's an alternative basis of recovery to actual damages plus the infringer's profits attributable to the infringement.
Of course, there's also a very wide range of statutory damages; the minimum (if it is not "innocent" infringement) is $750/work.
> 105B+ is more than Anthropic is worth on paper.
The actual amount of 7 million works times $150,000/work is $1.05 trillion, not $105 billion.
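A quick sanity check of that arithmetic, using the figures from the thread (7 million works at the $150,000 willful-infringement statutory maximum):

```python
works = 7_000_000
max_statutory_per_work = 150_000  # statutory maximum per work for willful infringement

total = works * max_statutory_per_work
print(f"${total:,}")  # $1,050,000,000,000 -- i.e. $1.05 trillion, not $105 billion
```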
> It's the maximum statutory damages for willful infringement, which this has not been adjudicated to be. It is not a fine; it's an alternative basis of recovery to actual damages plus the infringer's profits attributable to the infringement.
Yeah, you’re probably right, I’m not a lawyer. The point is that it doesn’t matter what number the law says they should pay, Anthropic can afford real lawyers and will therefore only pay a pittance, if anything.
I’m old enough to remember what the feds did to Aaron Swartz, and I don’t see what Anthropic did that was so different, ethically speaking.
Even if they don't qualify for willful infringement damages (let's say they have a good-faith belief their infringement was covered by fair use), the standard statutory damages for copyright infringement are $750-$30,000 per work.
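For scale, the same back-of-the-envelope math with that standard (non-willful) statutory range, applied to the 7-million-works figure mentioned elsewhere in the thread:

```python
works = 7_000_000
low, high = 750, 30_000  # standard statutory damages range per work

print(f"${works * low:,} to ${works * high:,}")
# $5,250,000,000 to $210,000,000,000 -- i.e. $5.25B to $210B
```

Even the statutory minimum across that many works would dwarf most judgments; of course, the actual number of works at issue in any trial may be far smaller.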
Isn't "pirating" a felony with jail time, though? That's what I remember from the FBI warning I had to see at the beginning of every DVD I bought (but not "pirated" ones).
A court just ruled on Anthropic and said an LLM response wasn't a form of counterfeiting (ie, essentially selling pirate books on the black market). Although tbf that is the most radical interpretation still being put forward by the lawyers of publishers like NYTimes, despite the obvious flaws.
What someone at Anthropic did was download libgen once; then Anthropic figured "wait a minute, isn't that illegal?", so instead they went and bought 7 million books for real and cut them up to scan them.
Turns out this doesn't quite mitigate downloading them first. (Though frankly, I'm very much against people having to buy 7 million books when someone has already scanned them)
Just downloading them is of course cheaper, but it is worth pointing out that, as the article states, they did also buy legitimate copies of millions of books. (This includes all the books involved in the lawsuit.) Based on the judgement itself, Anthropic appears to train only on the books legitimately acquired. Used books are quite cheap, after all, and can be bought in bulk.
Buying a book is not license to re-sell that content for your own profit. I can't buy a copy of your book, make a million Xeroxes of it and sell those. The license you get when you buy a book is for a single use, not a license to do what ever you want with the contents of that book.
Yes, of course! In this case, the judge identified three separate instances of copying: (1) downloading books without authorisation to add to their internal library, (2) scanning legitimately purchased books to add to their internal library, and (3) taking data from their internal library for the purposes of training LLMs. The purchasing part is only relevant for (2) — there the judge ruled that this is fair use. This makes a lot of sense to me, since no additional copies were created (they destroyed the physical books after scanning), so this is just a single use, as you say. The judge also ruled that (3) is fair use, but for a different reason. (They declined to decide whether (1) is fair use at this point, deferring to a later trial.)
If you wanted to be legit with zero chance of going to court, you would contact the publisher and ask to pay for a license to access their catalog for training, and negotiate from that point.
This is what every company using media does (think Spotify or Netflix, but also journals, ad agencies, ...). I don't know why people on HN are giving AI companies a pass for this kind of behavior.
> I don't know why people on HN are giving AI companies a pass for this kind of behavior.
As mentioned in The Fucking Article, there's a legal difference between training an AI that largely doesn't repeat things verbatim (a la Anthropic) and redistributing media wholesale (a la Spotify, Netflix, journals, ad agencies).
The paradigm is that teachers will teach life skills like public speaking and entrepreneurship. Book smarts that can be more effectively taught by AI will be, once schools catch up.
I agree. The world is changing fast, but we need to make the transition less painful for everyone. The way things are going now only benefits big tech.
This is not about paying for a single copy. It would still be wrong even if they have bought every single one of those books.
It is a form of plagiarism. The model will use someone else's idea without proper attribution.
Legally speaking, we don't know that yet. Early signs are pointing at judges allowing this kind of crap because it's almost impossible for most authors to point out what part of the generated slop was originally theirs.
But should the purchase be like a personal license? Or like a commercial license that costs way more?
Because, for example, if you buy a movie on disc, that's a personal license: you can watch it yourself at home. But you can't play it at a large public venue that sells tickets to watch it. You need a different, more expensive license to make money off the content in a larger capacity like that.
The judge appears to disagree with you on this. They found that training and selling an LLM are fair use, based on the fact that it is exceedingly transformative, and that the copyright holders are not entitled to any profits thereof due to copyright. (They also did get paid — Anthropic acquired millions of books legally, including all of the authors in this complaint. This would not retroactively absolve them of legal fault for past infringements, of course.)
The trial is scheduled for December 2025. That's when a jury will decide how much Anthropic owes for copying and storing over seven million pirated books.
Yes, that would be an interesting trial. But it is only about six books, and all claims regarding Claude have been dismissed already. So only the internal copies remain, and there the theory for them being infringing is somewhat convoluted: you have to argue that they are not just for purposes of training (which was ruled fair use), and award damages even though these other purposes never materialised (since by now, they have legal copies of those books). I can see it, but I would not count on there being a trial.
The fallacy in the 'fair use' logic is that a person acquires a book and learns from it, but a machine incorporates the text. Copyright does not allow one to create a derivative work without permission. Only when the result of the transformation resembles the original work could it be said to be subject to copyright. Don't regard either of those legal issues as set in concrete yet.
Both a human and a machine learn from it. You can design an LLM that doesn’t spit back the entire text after annealing. It just learns the essence like a human.
Morally maybe, but AFAIK machines "learning" and creating creative works on their own is not recognized legally, at least certainly not the same way as for people.
And the crazy thing is that might be cheaper when you consider the alternative is to have your lawyers negotiate with the lawyers for the publishing companies for the right to use the works as training data. Not only is it many many billable hours just to draw up the contract, but you can be sure that many companies would either not play ball or set extremely high rates. Finally, if the publishing companies did bring a suit against Anthropic they might be asked to prove each case of infringement, basically to show that a specific work was used in training, which might be difficult since you can't reverse a model to get the inputs. When you're a billion dollar company it's much easier to get the courts to take your side. This isn't like the music companies suing teenagers who had a Kazaa account.