The main issue is that they've had plenty of instances where the LLM output copyrighted content verbatim, as happened with the New York Times and some book authors. And then there's DALL-E, which is baked into ChatGPT and, before all the guardrails came up, was clearly trained on copyrighted content, to the point that it reproduced people's watermarks as well as their styles, just like Stable Diffusion mixes can do (if you don't prompt it out).
Like you said, it's still a somewhat gray area, and I personally have nothing against them (or anyone else) using copyrighted content to train models.
I do find it annoying that they're so closed-off about their tech when it's built on the shoulders of openness and other people's hard work. And then they turn around and throw hissy fits when someone copies their homework, allegedly.