Your wish for a lawsuit misguidedly presupposes they would lose on the grounds of fairness. Being a hypocrite isn't a crime and neither is profiting off of it. Point to any data they trained on that was neither already public domain or they paid royalties for.
I am not presuming that they'd lose. The point is that a lawsuit may bring this issue to light and establish a legal precedence. If OpenAI is allowed to use potentially dubious data to train their models to profit, then others should be allowed to do the same with their models.
If OpenAI wins, and successfully manages to stop alpaca/gpt4all type datasets, then they should be counter-sued for doing the same with upstream data.
At the end of the day proving in a court of law whether a particular LLM was trained from an open ai model derived dataset may also present to be very challenging if not impossible.
I for one would love to see more alpaca like datasets being used for commercial applications.
You can only "establish legal precedence" when the law is ambiguous and thus in need of interpretation to begin with. Calling OpenAI's data / access thereof "dubious" is fundamentally an opinion you have about their methods, not a tangible wrong doing which you can sue over. Unless you believe they did not actually pay royalties on certain data they used (and can prove it). OpenAI has a slam dunk case no matter how hypocritical they are forced to appear in the process.
Situation is more nuanced than you are making it seem. It's difficult to prove whether OpenAI used copyright data, because of the nature of these models. But that applies to people "abusing" OpenAI's terms as well to produce datasets for fine tuning models.
"Public domain" does not mean "available for public consumption." If I publish a blog post, I hold the copyright on that blog post, and reproducing it is a violation of copyright, despite the fact that it was posted publicly.
Public domain means that something can be reproduced without violating copyright, because the work is effectively owned by the public.