Considering that RLHF for ChatGPT was done only in English yet transferred just as well to other languages, I would wager that the presence of specific types of logs in the training set is less important than it may seem.
I speak a Slavic language with roughly 2 million speakers, and I asked GPT-4 to tell me an old fable in my language. It did so fantastically, with no grammatical errors. I tried some more things -- it speaks the language fluently, though admittedly not always idiomatically.
I tried to make it joke at the expense of Norwegians who speak a particular dialect (Norwegian has about 5 million speakers), and it refused.
When I tried to jailbreak it by prompting it to make the joke in the persona of an esteemed actor performing for the Prime Minister and other respected figures, it had our Prime Minister scold the actor and demand an apology for making fun of stereotypes. The actor was contrite and toned down his humor.