Even though he’s right, Rob Bonta is going to get himself fired, while Scott Wiener will write a bill legalizing training on data that isn't expressly licensed.
Fun fact: Germany's copyright law (§ 44b UrhG) has a provision that allows AI training by default on everything that's reachable on the Internet, as long as the website operator hasn't published a "nope" in machine-readable form (e.g. robots.txt).
If you read the law, it only allows training if the rights holder hasn't disallowed it in machine-readable form. I would argue robots.txt qualifies as machine-readable.
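For illustration, a minimal machine-readable reservation along those lines might look like this in robots.txt (GPTBot and Google-Extended are real crawler tokens published by OpenAI and Google; whether robots.txt counts as an effective opt-out under § 44b UrhG is exactly the contested question):

    # Sketch of an AI-training opt-out expressed in robots.txt.
    # GPTBot is OpenAI's training crawler; Google-Extended controls
    # Google's use of a site's content for AI training.
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /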
Sort of? But it is very different from the federal cabinet, where officials serve at the pleasure of the president. The recall process is slow, expensive, and rarely successful.
>Since 1913, there have been 181 recall attempts of state elected officials in California. Eleven recall efforts collected enough signatures to qualify for the ballot and of those, the elected official was recalled in six instances.
I'm not sure how... likely that is? Like, is the California electorate particularly enamoured of the LLM-flogging mega-corps, such that they would do their bidding in this way?
Like, if the reasoning is "we should recall them because they were mean to the lovely companies :(" then I'd expect the average person to say, broadly, "good" and vote against recall. 'AI' is not particularly popular with the public.
He co-authored Senate Bill 239, which lowered the penalty for exposing someone to HIV without their knowledge and consent from a felony to a misdemeanor.
Everyone qualified to speak on the subject was in near-universal agreement that the previous law was increasing the spread of HIV rather than decreasing it: its primary effect was that sex workers would refuse to get tested.
Expecting megacorporations to play by the same rules they want us to follow when it comes to their rights is pretty far from copyright maximalism. What's anti-human is giving corporations more rights than humans have.
> "The law, in its majestic equality, permits rich and poor alike to massively-plagiarize anything they want after investing at least $100,000,000 on a computational pipeline to statistically launder its origins and details."
-- Cyberpunk Anatole France
____
If I were to steel-man your comment, it would be something like: "Scraping and training must be fair use because people can be building all sorts of systems with ethical and valuable purposes. What you generate from a trained system can easily infringe, but that's a separate thing."
Also, where does the GNU General Public License fall in terms of "anti-human copyright maximalism"? Is it bad because it uses fire, or is it good because it fights fire with fire?
>it would be something like: "Scraping and training must be fair use because
It wouldn't be "fair use". It makes no copies. "Fair use" is the horseshit the courts dreamt up so they could pretend copyright wasn't broken when a copy absolutely needed to be made.
This makes no copies, so it doesn't even need "fair use". Instead, there are people who believe that because they made something long ago, they and their descendants far into the future are entitled to tax everyone who might ever come across that thing, let alone anyone who actually wants a copy of it.
Your argument must sound intelligent to you, but it starts from a premise of "of course copyright is the only non-lunatic policy people could ever imagine", and goes from there. You can't even think in any other terms.
> Also, where does the GNU General Public License fall in terms of "anti-human copyright maximalism"? Is it bad because it uses fire, or is it good because it fights fire with fire?
Stallman is clever to twist the rules a little to get a comparatively sane result from them, but there are others who aren't clever enough to even recognize that that's what he's doing. So, in their minds, "what about the GNU license" seems like a gotcha. I won't name those people, but their username starts with Terr and ends with an underscore.
Incorrect: the real-world behavior we're discussing involves unambiguous copies. LLM companies scrape and retain the data in a huge training corpus, since they want to train a new iteration of the model whenever they adjust the algorithms.
That accumulation is analogous to photocopying books and magazines that you borrow or buy before returning or selling them, then stocking a clubhouse or company break-room with the new copies. Such a thing is not usually considered "fair use."
In a hypothetical world where all content is merely streamed into a model, then the question of whether model-weights can be considered a copy with a special form of lossy compression is... separate, and much trickier.
> Your argument [...] starts from a premise of "of course copyright is the only non-lunatic policy people could ever imagine"
Nope, it's just the context of the discussion, because it's the status quo we're living with and the one we're faced with incrementally changing. If you're going to rage-post about it, at least stop and direct that rage appropriately.
> Stallman is clever to twist the rules a little to get a comparatively sane result from them, but [you don't] recognize that that's what he's doing.
I already described the GPL as "fighting fire with fire"; I don't understand how the idiom didn't make sense to you.
Surprisingly, when it comes to software, music, and movie stealing, we find that stealing requires one party to lose something, but when it comes to OpenAI, "stealing" is happily defined colloquially. What an interesting curiosity.
When I steal a thing from you, you no longer have the thing.
When I steal a dance you just invented, you're very butthurt about it and run crying to mommy: "make him stop copying me!" Then you grow up and bribe Congress to make it illegal. Except the "growing up" part never happened.