I can't say how strongly I disagree with the ethical/copyright concerns raised here.
The idea that intelligences - whether they be human, artificial or alien - should be forbidden from learning from code freely shared on the internet goes against everything I like about open source.
I think it's fair that no one should be able to use reproduced copyrighted code verbatim, whether that be by a human memorizing something or a computer copying it.
But I take the complete opposite view on the ethics of letting machines learn from that work. I think this should be encouraged.
> The idea that intelligences [...] should be forbidden from learning from code freely shared on the internet goes against everything I like about open source.
Love it or lump it, this has been the battleground for most free software distribution. Just having a licensed copy of code does not permit you to do anything with it; even the BSD license has restrictive terms for the user. I'm an enormous Copyleft advocate, but arguably Open Source is only enforced by stopping people from using it illegally. If ChatGPT's fate is to turn into a GPL-licensed code launderer, then projects have a great basis for banning it among their contributors.
> I think it's fair that no one should be able to use reproduced copyright code verbatim
Then I don't see how you can be upset at Open Source projects for adopting basic standards. They also want to protect their own license and community, with the main difference being that they're not in it for the money. Again - there was never any point where "Open Source" was synonymous with "do whatever you want with the code unconditionally".
Agreed. Although it is sometimes difficult to imagine any sensible application of copyright to code in the first place, apart from code that for one reason or another contains data.
There exist copyrighted algorithms for certain applications, especially AI applications these days, but reengineering should be possible and similarities should be handled as liberally as possible.
Therein lies the rub. Most of the discourse and debate around LLMs and copyright revolve around the central question of what it means to learn.
Virtually everyone agrees that a human learning from reading code doesn't violate copyright (by somehow copying the knowledge into one's brain), because the human brain is some kind of copyright laundering machine, maybe? I don't really know whether there is any argument for that that can't be reduced to an appeal to common sense.
On the other hand, a DL algorithm learning from processing tokens of code scraped from public sources such as GitHub doesn't present the same kind of obviousness. My personal belief is that it's also learning and shouldn't be forbidden, but I can't deny the negative consequences of that. We're already seeing a lot of bad things come out of the democratization of GPTs.
> Virtually everyone agrees that a human learning from reading code doesn't violate copyright
I don't think this is as universally agreed upon as you think, or at least the implications of learning from copyrighted material aren't. This is why projects like ReactOS and WINE strictly prohibit contributors from reading leaked Windows source code, in case they learn a little bit too much and accidentally reproduce copyrighted material.
> because the human brain is some kind of copyright laundering machine, maybe
Absolutely not. Music is full of legal cases where someone learned and copied a bit too much and went to court over it.
> Virtually everyone agrees that a human learning from reading code doesn't violate copyright (by somehow copying the knowledge into one's brain), because the human brain is some kind of copyright laundering machine, maybe? I don't really know whether there is any argument for that that can't be reduced to an appeal to common sense.
However, note the case law example of the NEC V20 which found in NEC's favor:
> While NEC themselves did not follow a strict clean room approach in the development of their clone's microcode, during the trial, they hired an independent contractor who was only given access to specifications but ended up writing code that had certain similarities to both NEC's and Intel's code. From this evidence, the judge concluded that similarity in certain routines was a matter of functional constraints resulting from the compatibility requirements, and thus were likely free of a creative element.
> because the human brain is some kind of copyright laundering machine
I once came up with the idea of a physical copyright laundering machine. It had three CD-R drives (which shows how long ago I had the idea). You’d insert a CD-ROM to launder, and two blank CD-R discs. To one of the CD-Rs, it would write a one-time pad; to the other it would write the input CD-ROM XOR the one-time pad. A hardware RNG (I wanted to use a quantum process such as radioactive decay for more emphatic indeterminism) generates the one-time pad. It also generates a single random bit which determines which output CD-R gets the key and which one gets the ciphertext. That bit is never revealed to the user (or recorded in any way). The end result is two CD-Rs, one containing random data, the other a copyrighted work encrypted with random data, but it is impossible to know which is which.
I never actually built one of these machines. I wanted to patent it, but gave up when I realised how much patents cost. I also eventually realised that my machine would never work, because it was approaching the law with the mind of a developer not the mind of a judge - I doubt any judge would actually be convinced by my copyright laundering machine, they’d find a way to rule against it, whatever exact way that might be. The law and computing are both systems of rules, but the rules in the former involve far more discretion and flexible interpretation.
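For what it's worth, the scheme itself is just a one-time pad with a secret coin flip over the outputs. A toy sketch in Python, with the `secrets` module standing in for the hardware RNG (the function name `launder` and the byte-string interface are my own invention, not anything from the original design):

```python
import secrets

def launder(data: bytes) -> tuple[bytes, bytes]:
    """Simulate the 'laundering machine': return two equal-length
    byte strings, one a random one-time pad and the other the input
    XORed with that pad, in a random, unrecorded order."""
    pad = secrets.token_bytes(len(data))  # stands in for the radioactive-decay RNG
    cipher = bytes(a ^ b for a, b in zip(data, pad))
    # A single secret bit decides which "disc" gets the key and which
    # gets the ciphertext; it is discarded, never returned or logged.
    return (pad, cipher) if secrets.randbits(1) else (cipher, pad)

disc_a, disc_b = launder(b"copyrighted work")
# XORing the two discs together always recovers the original, yet
# each disc on its own is indistinguishable from uniform random bytes.
recovered = bytes(a ^ b for a, b in zip(disc_a, disc_b))
assert recovered == b"copyrighted work"
```

The legal flaw the author concedes is visible right in the code: the two outputs are only useful together, so a judge would presumably treat the pair as a copy, regardless of what either disc looks like in isolation.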
> the human brain is some kind of copyright laundering machine, maybe? I don't really know whether there is any argument for that that can't be reduced to an appeal to common sense.

Because it's an end, not a means. The concept is central in philosophy, law, ethics, and education.
It's actually somewhat poignant to me that this is not an obvious point for many circling around this. Accelerationism (in its true pseudo-Deleuzian/Nick Landian form, not the subreddit) is coming for us one way or another.
Maybe all we can hope for is consolation. Like, if the tech elite is successful in this campaign of soft-dehumanization ("we are all LLMs anyway") it will open up some new cultural pathways to greater respect for animals and the environment.
What is a cow but some kind of copyright laundering machine anyway?
Subjectivity as we know it is quite contemporary, articulated in part by things like Kant's kingdom of ends. I think things will start to change again. The world of love, art, and particularly human passions might fade away into something different. You can almost feel people hungry for this. It doesn't have to be good or bad; it's not for our subjectivity to understand, after all. But I will say it does feel like we are leaving summertime now, heading towards a colder future.
I think it's inevitably bad, because dehumanisation ends in war. Dehumanisation, for me, is synonymous with violence. It isn't just technology but the terrifying post-modern subjectivity that allows 8 billion people to co-exist in relative peace.

"Accelerationism" looks set to build still more and better weapons, fewer ways of resisting using them, and less capacity to care, so I fear the colder future you speak of will be a nuclear winter. Of course, as you say, it's like some people want that.