
Even though he’s right, Rob Bonta is going to get himself fired, while Scott Wiener will write a bill legalizing training on non-expressly-licensed data.


Fun fact: Germany's copyright law has a provision (§ 44b UrhG, on text and data mining) that allows AI training by default on everything that's reachable on the Internet, as long as the website operator hasn't published a "nope" in machine-readable form (e.g. robots.txt).

[1] https://www.gesetze-im-internet.de/urhg/__44b.html


If you read the law, it says it’s only allowed if the rights holder doesn’t disallow it in machine-readable form. I would argue robots.txt qualifies as machine-readable.
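
For illustration, a minimal machine-readable reservation could be a robots.txt entry like the following (GPTBot is the user-agent string OpenAI has published for its crawler; whether any given crawler honors it is up to that crawler's operator):

    User-agent: GPTBot
    Disallow: /

And a crawler that wants to respect such a reservation could check it with Python's standard urllib.robotparser before fetching anything. A minimal sketch, with example.com as a placeholder domain:

    from urllib.robotparser import RobotFileParser

    # Download and parse the site's robots.txt.
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # False means the operator has disallowed this URL for the
    # "GPTBot" user-agent, i.e. the machine-readable "nope".
    print(rp.can_fetch("GPTBot", "https://example.com/article.html"))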


So to legally copy a website, all you need to do is pass it through an AI filter, and then you can publish the rip-off?


No. Data mining copyrighted material and republishing copyrighted material under your own name are two very different things.


Passing data through an AI filter for "training" is a different thing (legally and ethically) from publishing the output.


>Rob Bonta is going to get himself fired

The Attorney General of California is an elected position. He could be recalled but not fired by the Governor.


A recall is firing.


Sort of? But it is very different from the federal cabinet, where members serve at the pleasure of the president. The recall process is slow, expensive, and rarely successful.

>Since 1913, there have been 181 recall attempts of state elected officials in California. Eleven recall efforts collected enough signatures to qualify for the ballot and of those, the elected official was recalled in six instances.

https://www.sos.ca.gov/elections/recalls/recall-history-cali...


But a firing by the electorate, not a firing by the governor of CA.


Sure, but it's still getting fired. Also, the governor can support the recall (and Gavin should).


I'm not sure how... likely that is? Like, is the California electorate particularly enamoured of the LLM-flogging mega-corps, such that they would do their bidding in this way?

Like, if the reasoning is "we should recall them because they were mean to the lovely companies :(" then I'd expect the average person to say, broadly, "good" and vote against recall. 'AI' is not particularly popular with the public.


He co-authored Senate Bill 239, which lowered the penalty for exposing someone to HIV without their knowledge and consent from a felony to a misdemeanor.

That Scott Wiener? How does he still have a job?


Everyone qualified to speak on such things pretty much universally agreed that the previous law was increasing the spread of HIV rather than decreasing it, since its primary effect was that sex workers would refuse to get tested.


>will write a bill legalizing training on non-expressly-licensed data.

Which should be assumed to be legal already, even without the expressly written bill. Copyright maximalism is anti-human.


Expecting megacorporations to play by the same rules they want us to follow when it comes to their rights is pretty far from copyright maximalism. Anti-human is giving corporations more rights than humans.


> "The law, in its majestic equality, permits rich and poor alike to massively-plagiarize anything they want after investing at least $100,000,000 on a computational pipeline to statistically launder its origins and details."

-- Cyberpunk Anatole France

____

If I were to steel-man your comment, it would be something like: "Scraping and training must be fair-use because people can be building all sorts of systems with ethical and valuable purposes. What you generate from a trained system can easily infringe, but that's a separate thing."

Also, where does the GNU General Public License fall in terms of "anti-human copyright maximalism"? Is it bad because it uses fire, or is it good because it fights fire with fire?


>it would be something like: "Scraping and training must be fair-use because

It wouldn't be "fair use". It makes no copies. "Fair use" is the horseshit the courts dreamt up so they could pretend copyright wasn't broken when a copy absolutely needed to be made.

This makes no copies, so it doesn't even need "fair use". Instead, there are people who believe that because they made something long ago that they and their descendants into the far future are entitled to tax everyone who might ever come across that thing let alone actually want copies of the thing.

Your argument must sound intelligent to you, but it starts from a premise of "of course copyright is the only non-lunatic policy people could ever imagine", and goes from there. You can't even think in any other terms.

> Also, where does the GNU General Public License fall in terms of "anti-human copyright maximalism"? Is it bad because it uses fire, or is it good because it fights fire with fire?

Stallman is clever to twist the rules a little to get a comparatively sane result from them, but there are others who aren't clever enough to even recognize that that's what he's doing. So, in their minds "what about the gnu license" seems like a gotcha. I won't name those people, but their username starts with Terr and ends with an underscore.


> Your argument must sound intelligent to you, but [...] You can't even think in any other terms.

> others who aren't clever enough [...] I won't name those people, but their username starts with Terr and ends with an underscore.

https://news.ycombinator.com/newsguidelines.html

____________

> It wouldn't be "fair use". It makes no copies.

Incorrect: the real-world behavior we're discussing involves unambiguous copies. LLM companies scrape and retain the data in a huge training corpus, since they want to train a new iteration of the model whenever they adjust the algorithms.

That accumulation is analogous to photocopying books and magazines that you borrow or buy before returning or selling them again, and stocking a clubhouse or company break-room with your new copies. Such a thing is not usually considered "fair use."

In a hypothetical world where all content is merely streamed into a model, then the question of whether model-weights can be considered a copy with a special form of lossy compression is... separate, and much trickier.

> Your argument [...] starts from a premise of "of course copyright is the only non-lunatic policy people could ever imagine"

Nope, it's just the context of the discussion, because it's the status quo we're living with and the one we're faced with incrementally changing. If you're going to rage-post about it, at least stop and direct that rage appropriately.

> Stallman is clever to twist the rules a little to get a comparatively sane result from them, but [you don't] recognize that that's what he's doing.

I already described the GPL as "fighting fire with fire", I don't understand how the idiom didn't make sense to you.


This is the way. We need to restrict IP protection, especially temporally.


Stealing is more anti-human.


Surprisingly, when it comes to stealing software, music, and movies, we find that stealing requires one party to lose something; but when it comes to OpenAI, stealing is happily defined colloquially. What an interesting curiosity.


When I steal a thing from you, you no longer have the thing.

When I steal a dance you just invented, you're very butthurt about it and run crying to mommy "make him stop copying me!". Then you grow up and bribe Congress to make it illegal. Except for the "growing up" part, that never happened.


I don't think the memo mentions training data sources; it's about usage and impact.


He’s more likely to get rounded up with Sheng Thao, Andre Jones, Bryan Azevedo, and his wife, Mia Bonta.



