Copyright is not for "documents"; it is for works that have creativity in them. The legal bar for that level of creativity is low, so low that it is easy to come away thinking that anything that can be cast as a "document" must be copyrightable, but the bar is in fact not zero.
In particular, taking other documents and shoving them through a process that generates a lot of other numbers, with no human or creative interaction, is exactly the kind of thing I'd be concerned the courts would judge not sufficiently creative to be copyrightable. The process itself would certainly consist of copyrightable code, but the output doesn't necessarily. This is somewhat similar to the observation that there is no copyright to be had in a big table of files and their MD5 hashes (or other hashes), such as a Linux distro might use for integrity checking. There is plenty of copyright in the original file contents, and copyright is available on the code that produces these tables, but the tables themselves would likely be ruled not copyrightable, as there is no creativity in that output.
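To make the hash-table analogy concrete, here is a minimal Python sketch of such a table. It assumes an in-memory dict of file names to contents stands in for files on disk, and `md5_manifest` is a hypothetical helper, not any distro's actual tooling; the line format just mimics `md5sum` output (digest, two spaces, name). The point is that every entry is fully determined by the inputs and the fixed procedure, so nobody makes a creative choice anywhere in the output.

```python
import hashlib


def md5_manifest(files: dict[str, bytes]) -> list[str]:
    """Build an md5sum-style table: one "<hexdigest>  <name>" line per file.

    Hypothetical helper for illustration. Files are sorted by name so the
    result depends only on the inputs, never on anyone's choices.
    """
    lines = []
    for name in sorted(files):
        digest = hashlib.md5(files[name]).hexdigest()
        lines.append(f"{digest}  {name}")
    return lines


# Assumed stand-in for real files on disk.
files = {"README": b"hello\n", "empty.txt": b""}
manifest = md5_manifest(files)

# Re-running the process on the same inputs reproduces the identical table.
assert manifest == md5_manifest(files)

for line in manifest:
    print(line)
```

Anyone running the same process over the same files gets the same table, byte for byte, which is roughly the intuition behind "no creativity in that output."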
Note that this has absolutely nothing to do with the question of whether AI output is copyrightable; this is about whether the huge table of numbers that makes up the neural net weights is copyrightable. (Though it would be an interesting question for the legal system to grapple with: how could a non-copyrightable set of numbers then produce something copyrightable? Call it a philosophical variation on the "copyright washing" argument: can copyright spring from a non-copyrightable source other than a human brain, thus somehow "flowing uphill"? Would a human brain be copyrightable? Stay tuned for those questions, I guess; if not you, then your grandchildren.)
Per your other comments: "work" is not the bar; "creativity" is. "Size" is not the bar either. Merely being a much larger table of numbers than a list of hashes or a phone book doesn't change the question. No human is in that table of numbers creatively saying "no, wait, this neural weight should be -1.5 instead of 2.0 to produce this creative effect". No human is even capable of working in the medium of neural net weights in a creative manner.
If you want to go the "novel legal theory" route, you could play with claiming creativity in the selection of input material and claim that the resulting neural weights have a copyright in compilation: https://en.wikipedia.org/wiki/Copyright_in_compilation That's a long way from a slam dunk, though. Way out on a legal limb there. It isn't entirely clear to me what exact rights would result from such a claim, either. It would be a landmark copyright court case for sure.
IANAL, but I suspect that the "novel legal theory" in your last paragraph would fail. It might succeed if you gave GPT a hand-curated list of materials; hoovering up the entire internet is not that.