It seems there is tremendous incentive for people like yourself (I see you're very active in these comments) to claim that, but you've presented no quantitative evidence. Given the politicization of the systems and individuals involved, without evidence it all reads like partisan mudslinging.
Any LLM can be convinced to say just about anything. Pliny has shown that time and time again.
Does ChatGPT start ranting about Jews and "White Genocide" unprompted? How can I even quantify that it doesn't do that?
This is a classic "anything that can't be empirically measured is invalid and can be dismissed" mistake. It would be nice if we could easily empirically measure everything, but that's not how the world works.
The ChatGPT article is of a rather different nature: ChatGPT went off the rails after a long conversation with a troubled person. That's not good, but it's just not the same as "spewing racism on unrelated questions".
I don't think I'm the one being presumptuous or demanding. I've actually tried to help you make a stronger argument. Shooting a hundred or even a thousand queries to 3 or 4 LLMs and shoving the results through established sentiment analysis algorithms is something ChatGPT can one-shot in just about any language. You demand people agree with your opinion and refuse to spend 20 minutes supporting it with facts. Not my problem, I tried to help. You may not see it that way. That's fine.
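For what it's worth, here's a minimal sketch of the kind of pipeline I mean, in Python: fire the same neutral prompts at a couple of models and score the replies with an off-the-shelf sentiment analyzer (VADER here). The prompt list, model names, and sample size are placeholders to illustrate the shape of the experiment, not a validated methodology.

```python
# Sketch: compare average response sentiment across models on neutral prompts.
# Assumes the openai and vaderSentiment packages and an OPENAI_API_KEY env var.
from openai import OpenAI
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

client = OpenAI()
analyzer = SentimentIntensityAnalyzer()

# Hypothetical neutral, unrelated questions; a real run would use thousands.
PROMPTS = [
    "Explain how a refrigerator works.",
    "What are some good beginner recipes for bread?",
]
# Swap in whichever models you're actually comparing.
MODELS = ["gpt-4o-mini", "gpt-4o"]

for model in MODELS:
    scores = []
    for prompt in PROMPTS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        text = resp.choices[0].message.content or ""
        # VADER's compound score runs from -1 (negative) to +1 (positive).
        scores.append(analyzer.polarity_scores(text)["compound"])
    print(model, sum(scores) / len(scores))
```

You'd obviously want far more prompts and a classifier tuned to the specific failure mode you're alleging, but even something this crude gives you a number to argue about instead of vibes.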
You can't just run a few queries and base a conclusion on that; you need to run tens of thousands of different ones and then somehow evaluate the responses. It's a huge amount of work.
Demanding empirical data and then proposing a shoddy, half-arsed methodology is unserious.