Illiterate humans can come up with new words like that too without being able to spell; LLMs are similarly modeling language without precisely modeling spelling.
The tokenizer supports virtually any input text you want, so it follows that it can also produce virtually any output text. It isn't limited to a dictionary of the 1,000 most common words or something.
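As a minimal sketch of that claim (using the tiktoken library purely as an example; any byte-level BPE tokenizer behaves the same way), arbitrary text, including invented words, round-trips losslessly through the token vocabulary:

```python
# Sketch: byte-level BPE handles arbitrary text, including made-up words.
# tiktoken and cl100k_base are just illustrative choices, not the only option.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "glorptastic frumblewock"  # invented words, in no dictionary
ids = enc.encode(text)

# Encoding then decoding is lossless, so the model's output tokens can spell any text.
assert enc.decode(ids) == text
print(ids)
```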
There are tokens for individual letters, but the model is not trained on text written with one token per letter; it is trained on text that has been converted into as few tokens as possible. Just as you would get very confused if someone spelled out entire sentences as they spoke to you, expecting you to reconstruct the words from the individual spoken letters, these LLMs would also perform terribly if you sent them one token per letter of input instead of the tokenizer scheme they were trained on.
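To make that concrete, here is a small sketch (again assuming tiktoken just for illustration; the exact splits depend on the tokenizer) comparing the tokens a model actually sees for a word versus that same word spelled out letter by letter:

```python
# Sketch: a common word becomes a handful of subword tokens, not one token per letter.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word_tokens = [enc.decode([t]) for t in enc.encode("strawberry")]
spelled_tokens = [enc.decode([t]) for t in enc.encode("s t r a w b e r r y")]

print(word_tokens)     # a few chunks like ['str', 'aw', 'berry'], not ten letters
print(spelled_tokens)  # spelling it out gives a completely different, longer sequence
```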
Even though you might write a message to an LLM, it is better to think of that as speaking to the LLM. The LLM is effectively hearing words, not reading letters.