Fundamentally you are making an argument of scale. Clearly, people should be able to police their own local area networks, and lease dedicated network lines.
Today’s large models are derived from large amounts of public data that the people who trained them did not properly license.
They’re certainly prohibited by existing copyright law (there is at least one instance of copyright infringement in the ChatGPT training set, and there is no practical way to remove the infringing source data).
However, the courts have chosen not to enforce that part of the law.
So, one could easily argue that any model trained at that scale, by definition, only exists via a special grant (analogous to an easement, but non-exclusive) and is therefore in the public domain, and available for unrestricted use by the public.
In fact, there is case law around “sweat of the brow” works, like phone books, which are already treated with weaker copyright protection than other works. In particular, aggregating a pile of facts does not give you a copyright on the facts.
I don’t think scale (alone) is sufficient. All products and services exist somewhere along a scale, from niche products (which still benefit from public infrastructure, like the roads that let them be delivered) up to utilities (which can only exist by being granted something approaching monopoly rights and easements).
But I have been significantly persuaded by your point that the vast body of cultural work is its own (more abstract) sort of landscape… a “socio-human natural resource” is maybe a not-awful way to put it.
In that case I still don’t think the comparison to ISPs quite applies, but I do think we need a new category, and a new body of social norms and laws, to deal with this.
Or at least that’s my first-pass take after reading your comment, which has, again, persuaded me to revise my views on these issues. Thanks for casting things in that light.