Perhaps what we should be pushing for is a law that would force full disclosure of the training corpus and require a curated version of the training data to be made available. I'm sure a law like that would have all kinds of unintended consequences, but maybe we'd be better off starting from a strong baseline and working out the exceptions from there. While billions have been spent to train these models, the value of the millions of human hours spent creating the content they're trained on should likewise be recognized.