
This is still a hand-wavy argument, and I'm not fully in tune with the nuts and bolts of how these tools are implemented (both the LLMs themselves and the infrastructure on top of them), but here is the intuition I have for why these kinds of hallucinations are likely to be endemic:

Essentially, these tools seem to take a two-level approach. First, the model generates a "structure" for the output, and then it fills in the details as it guesses the next word of the sentence, kind of like Mad Libs, just... a lot, lot smarter than Mad Libs. If the structure is correct, if you're asking it about something it knows, then things like citations and other minor elements tend to pop up as the most likely words for that slot. But if it picks the wrong structure--say, trying to make a legal argument with no precedential support--then it's still going to look for the most likely words, except now those words are essentially random noise, and out pops a hallucination.
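To make the "filling in the details" half of that concrete, here's a toy sketch (not any real system's code, and the vocabulary is invented): sampling from a softmax always returns *some* token, whether the distribution is sharply peaked (the model "knows" the citation) or nearly flat (noise that still looks like a citation).

    # Toy sketch (no real model's code; vocabulary is invented) of how
    # next-token sampling always produces *something*, even when the
    # model's probability distribution over continuations is nearly flat.
    import math
    import random

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    vocab = ["Smith v. Jones", "Doe v. Roe", "Acme v. Widget", "(no citation)"]

    # Case 1: the model "knows" the citation -- one option dominates.
    confident_logits = [8.0, 1.0, 0.5, 0.2]
    # Case 2: no real support exists -- the distribution is close to flat,
    # but sampling still returns a citation-shaped string.
    flat_logits = [1.1, 1.0, 0.9, 0.2]

    for name, logits in [("confident", confident_logits), ("flat", flat_logits)]:
        probs = softmax(logits)
        pick = random.choices(vocab, weights=probs)[0]
        print(name, [round(p, 2) for p in probs], "->", pick)

Either way the decoder hands back the most probable continuation; in the flat case that continuation just isn't grounded in anything.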

I suspect this is amplified by a training bias: the training data largely consists of questions that get answered, so if you ask a question that objectively has no factual answer, the model will tend to hallucinate a response instead of admitting there isn't one, because the training set pushes it to give a response, any response, rather than give up.
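A crude way to see that bias (with made-up numbers): if refusals are rare in the corpus, even the simplest count-based estimate of "what follows a question" gives them negligible probability, so the model is pushed toward an answer-shaped response.

    # Made-up numbers: in a corpus where questions are almost always
    # followed by an answer, a count-based (maximum-likelihood) estimate
    # gives "I don't know" almost no probability mass.
    from collections import Counter

    training_responses = ["<some answer>"] * 998 + ["I don't know"] * 2
    counts = Counter(training_responses)
    total = sum(counts.values())
    for response, n in counts.items():
        print(f"P({response!r}) = {n / total:.3f}")
    # -> answering is ~500x more likely than admitting ignorance,
    #    whether or not a correct answer actually exists for the question.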



