> One can use Photoshop to create heinous things which would be highly illegal to sell, or even show. Should we ban Photoshop? I think we agree, that no, that would be silly.
This is a different matter, just like saying "we don't make cars illegal even though cars can be used illegally". IMO it is off topic in the context of my post above (or you need to elaborate).
> Let's assume I train a model on public domain texts only, zero copyrighted material has gone into it.
Do you genuinely not see the difference, from the point of view of the copyright holders? If OpenAI could build a version of ChatGPT without using any copyrighted material at all, I agree with you: it should not be made illegal. But illegal uses of it should be. I guess we agree here, but to me that is quite different from training ChatGPT with copyrighted material. The first difference being... well... good luck training ChatGPT without copyrighted material (it is probably too late now anyway, because weights of models trained on copyrighted material are all over the Internet).
> Also, let's keep in mind that these models are not archives, that contain the original data verbatim. They are effectively lossy compression algorithms, that capture the essence.
Say I creat excutable dat do som kind of loss cmpression tht captred essnce.
First, do you agree that the sentence above is "some kind of lossy compression that captures the essence"? If yes, would you consider it legal for me to use that algorithm on famous books and sell them under my name, or would you think that I abused the copyright of the original material?