Hacker News | past | comments | ask | show | jobs | submit | nancyminusone's comments

"such profanity" is 3 "god damns" to you?

Computers read my code, so I don't mind upsetting their feelings.

But why would anyone use AI to write documents or articles? Do you really respect your recipients so little that you can't be bothered to share your own thoughts?

I might as well get an AI to call my own mother on Mother's Day.


I think the specific case of having a long conversation with an agent about what you're trying to achieve and why, and then having it update a README or a skill based on that conversation, is a useful thing to do. It captures the context of the conversation without you having to essentially write the same thing again.

>screenshot with Minion profile pic

My god, they really did recreate Facebook


>How can we possibly hope to separate the wheat from the chaff?

Categorize, curate, and share. The war is only for your attention. I have favorite creators now, and they would cease to be favorites if they suddenly started sloppin' it up. The best of them recommend cool things made by other people, who in turn recommend more things, and so on.

If instead you peddle bullshit, it won't take long to be identified as a bullshit vendor, even if you have 1000x the bullshit of the next leading brand.

Not everyone will get the message, especially if you mainly consume algorithmic feeds - we all seem to have that relative who thinks you would enjoy being sent an AI Jesus image every other week.


No one is querying human savants about what they've read a million times per day.

Suppose they did, and some guy was regularly filling stadiums to hear him recite an entire audiobook. That would probably get the attention of someone's lawyers.


I don't see your point. The problem is producing the copyrighted work, not processing it beforehand.

If it's illegal for AIs it should be illegal for humans, too. Is that really what you're arguing? It should be illegal for savants to read books?


I don't think anyone is arguing that the consumption is illegal. It's the reproduction that is illegal.

Read a book, that's fine. Write a book, that's fine. Read a book and then write a book that is 99.9% the same as the book that you read and sell it for profit without a license from the original author, that's infringement.


No, if you read the article, the point is in the training, not the reproduction.

That's what all these lawsuits are about - it's the training not the reproduction. I already agreed in my first comment that the reproduction is off limits.

In this case, it appears that Meta torrented illegal copies of the work to do the training. Obviously that's bad. But conflating that with training itself doesn't follow.


The point of these lawsuits is the piracy. My parent comment was about the general situation, not this specific article.

Pirating content is illegal, regardless of whether it is to train an LLM.

Usage of LLMs trained on unlicensed content (basically all of them) might or might not be illegal.

Using any method to reproduce a copyrighted work by using that original as input in a way that supplants the market value of the original is probably illegal.

At least that is my rudimentary understanding.


Well - maybe so. But the common belief is that training itself is a violation of copyright, no matter how it's done. That's the argument I'm countering here.

The issue is that the trainers have not sought licenses for the data and instead outright pirated it.

I don't think anyone believes that training is a copyright violation when all the training data is licensed. For example, an LLM trained on CC0 content would be fine with basically everyone.

The problem is that training happens on data that is not licensed for that use. Some of that data also is pirated which makes it even clearer that it is illegal.


But why should separate licensing be required at all? A search engine reads and indexes every word of every page it crawls. No one argues that requires licensing, only that the outputs must respect copyright. Why should training be different?

When Google started outputting summaries, people asked the same questions.

If you supplant the value of the original with the original as input then you probably have some legal questions to answer.


But that's about the output, not the training. We agree: outputs that supplant the original are the problem. A model constrained to produce only fair use outputs causes no such harm — regardless of what it was trained on.

Sharing copyrighted material is illegal. Presumably, if Meta blocked all seeding on the torrents they downloaded, they wouldn't have broken copyright, right?

If copyright law doesn't extend to the works being used for training, why should it extend to the model that is produced as a result? AI model creators have set up an ethical scenario where the right thing to do is ignore copyright laws when it comes to AI, which includes model use. It might never be legal, but it has become ethical to pirate models, distill them against ToS, etc.

I'm not sure I follow. Can you say it a different way?

I think the parent is basically saying that if you can legally pirate a book to train an LLM, why can't you legally pirate the LLM itself?

It's a "rules for thee and not for me" argument.


AH. Thank you.

Training requires making copies. Even if Meta had purchased each work, they'd have had to make copies of it to distribute around the training cluster.

Does it though? If they bought a copy for each machine?

Then no copying happened, so they'd be on firmer legal ground.

Good, we're agreed. My only point here is that training is not inherently a copyright violation.

>The problem is producing the copyrighted work, not processing it beforehand.

The distinction isn't particularly clear cut with an open source model. If it is able to reproduce copyright-protected work with high fidelity, such that the works produced would be derivative, that's like trying to get around laws against distribution of protected works by handing them to you in a zip file.

It's a kind of copyright washing to hand you the data as a binary blob and an algorithm to extract them out of it. That wouldn't really fly with any other technology.

And that's really where a lot of the value is, mind you: these models are best thought of as lossily compressed versions of their input data. Otherwise Facebook ought to be perfectly happy to train them on public domain data.


I tend to agree - but you assume that it would not be possible to create a model that trains on copyrighted work yet only outputs text that would be considered fair use.

That seems very possible to me, and undermines the "training is copyright violation" argument. It's not the training, it's the output.


I think the algorithms have categorized us correctly, my green-named friend.

I believe it is still a meme stock in the dark, "lose money quickly" corners of the internet.

Well, as someone who has tried to build at least a couple of small robot arms, I think we are probably closer to 20-50 years away. Neither the power nor the dexterity is there.

Right now, only a human can both push over a boulder and pick up a tiny speck from the floor using the same actuator.


>Shouldn't have to trade privacy for safety.

You shouldn't have to, and yet...

https://www.ftc.gov/news-events/news/press-releases/2026/01/...


Lane keeping assistance is optional on any vehicle. I don't believe there is any current production vehicle in which you can't opt out of lane keeping assistance.


Isn't it mandatory in the EU if the car supports it? Mandatory as in it's opt-out and will re-enable itself every time you turn on the car.


> will re-enable itself every time you turn on the car

I think that's only for the speed limit alarms. Wouldn't have that if people would stick to limits, I guess...


Not that I’ve seen. Every time I rent a recent model year, they have the lane keeping assist feature but it only works when you enable adaptive cruise control.

But maybe that’s what you meant?

