
0. There are already similar lawsuits against OpenAI: https://www.documentcloud.org/documents/23963237-authors-v-o...

1. Because they literally said they used Books3 in the original Llama paper. The provenance of datasets used by other models is not as well documented. Books3 is known to be pirated.

2. Being free to use doesn't mitigate the authors' complaint in any way. (Compare: "I stole your bike, but then I gave it away.") The authors and artists (in the case of image models) want either a) to not have their work included in training sets or b) to be paid for that use via licensing. In either case they must enjoin the trainer of the model.



Your points make sense. Thanks.

For point 1, there are employees at OpenAI who do know the provenance of the datasets used, and I am sure (based on my experience) that they include copyrighted works that were knowingly downloaded and added. Is not a single employee of OpenAI willing to blow the whistle?


Honestly, I doubt anybody cares that much. It's pretty much an open secret predicated on the untested idea that training is fair use, and the stakes aren't really that high in the long run even if it's not.

Losses for the tech companies will just mean training data gets more expensive. They're all spending tens of billions on new data centers, so there's not even a question of whether they can afford it.


Yes, that's all true. Plus, these days the datasets are sanitized of copyrighted works.



