My guess (and this is only a guess) is that LLMs hallucinate on legal questions because they expect things to make sense, but the law does not make sense. Lawyers have a vested interest in keeping the law an indecipherable and self-contradicting mess so that people have to pay them to get answers and help. This is the same reason (corrupt) judges treat attorneys better than people who represent themselves and refuse to point out what the law says in court.
LLMs hallucinate on legal questions because they hallucinate on everything.
Hallucination isn't a special, weird form of error: it's the entire process by which LLMs work. Proponents and enthusiasts just call the errors "hallucinations" because it sounds better than admitting "this technology makes no distinction between correct and incorrect, and will often incorrectly correct itself when an error is pointed out".
It's probably best to take a step back and ask what you actually want an LLM to do for you. Using RAG and just having it raise references to consider is probably the best you can expect.
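A minimal sketch of that "raise references to consider" mode, assuming a toy token-overlap scorer in place of a real embedding retriever and a hypothetical placeholder corpus (the "Example Act" citations are made up). The function hands back citations for a person to read and weigh. It never produces an answer about what the law says.

```python
import re
from collections import Counter

# Hypothetical mini-corpus of (citation, passage) pairs. A real system would
# pull these from a vector store; these entries are placeholders only.
CORPUS = [
    ("Example Act 2000, s. 1", "A person who files an application first is entitled to the grant."),
    ("Example Act 2000, s. 2", "An application must disclose the invention clearly and completely."),
    ("Example Regulations 2001, reg. 5", "Fees are payable on filing and on renewal."),
]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for an embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest_references(question: str, k: int = 2) -> list[str]:
    """Return up to k citations whose passages share the most tokens with the question."""
    q = Counter(tokenize(question))
    scored = []
    for citation, passage in CORPUS:
        p = Counter(tokenize(passage))
        overlap = sum((q & p).values())  # size of the multiset intersection
        scored.append((overlap, citation))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only references come back; nothing here claims to state the law.
    return [citation for score, citation in scored[:k] if score > 0]


print(suggest_references("Who is entitled to the grant if two people file an application?"))
```

Swapping the overlap scorer for embeddings changes the ranking quality, not the contract: the output is still a reading list rather than an answer.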
I have sometimes asked LLMs legal questions (to which I pretty much know the answer), and my impression is that a lot of the time the section of an Act (UK) is the principal node around which a piece of information is anchored. But the anchoring is very loose, and many sections are often mentioned in proximity, leading to poor differentiation of the relevance of a document/sentence/word vector to a particular piece of legislation. It might be fixed by training NER to recognise treaties/acts/conventions and their jurisdiction, and always using that to label references; I suspect the "1" in "section 1" or "17 USC 1", say, is not being tokenised as part of the "section", and this contributes to poor results. Maybe someone has worked on this?
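A minimal sketch of the labelling idea, assuming a hand-written regex rule set as a stand-in for a trained NER model. Each citation is captured as one span, so the number in "section 16" or "17 USC 1" can't drift away from its instrument, and every span carries a jurisdiction label. The patterns, the `LegalRef` type and the jurisdiction strings are hypothetical and cover only three citation shapes. The "UK (assumed)" label is a guess from the "section N of the ... Act YYYY" form, which other common-law jurisdictions also use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalRef:
    span: str          # full citation text, kept together as one unit
    instrument: str    # the Act, code title, or treaty
    provision: str     # section/article number, never split from its instrument
    jurisdiction: str  # label attached to every reference (hypothetical values)


TREATY_JURISDICTION = {"EPC": "EPO", "PCT": "WIPO"}

# "17 USC 1", "17 U.S.C. § 102"
US_CODE = re.compile(r"\b(?P<title>\d+)\s+U\.?S\.?C\.?\s+§?\s*(?P<num>\d+[a-z]?)\b")
# "section 16 of the Copyright, Designs and Patents Act 1988"
UK_ACT = re.compile(
    r"\b[Ss]ection\s+(?P<num>\d+[A-Z]?)\s+of\s+the\s+"
    r"(?P<act>(?:(?:[A-Z][a-z]+|and|of|the),?\s+)+Act\s+\d{4})"
)
# "Article 54 EPC", "Art. 3 PCT"
TREATY = re.compile(r"\bArt(?:icle|\.)\s*(?P<num>\d+)\s+(?P<treaty>EPC|PCT)\b")


def tag_references(text: str) -> list[LegalRef]:
    """Find citations, each as one labelled span, returned in document order."""
    found: list[tuple[int, LegalRef]] = []
    for m in US_CODE.finditer(text):
        found.append((m.start(), LegalRef(m.group(0), f"{m['title']} USC", m["num"], "US")))
    for m in UK_ACT.finditer(text):
        found.append((m.start(), LegalRef(m.group(0), m["act"], m["num"], "UK (assumed)")))
    for m in TREATY.finditer(text):
        found.append((m.start(), LegalRef(m.group(0), m["treaty"], m["num"],
                                          TREATY_JURISDICTION[m["treaty"]])))
    found.sort(key=lambda pair: pair[0])
    return [ref for _, ref in found]


sentence = ("Compare section 16 of the Copyright, Designs and Patents Act 1988 "
            "with 17 USC 1 and Article 54 EPC.")
for ref in tag_references(sentence):
    print(ref)
```

The labelled spans could then be fed to the tokenizer or embedder as single units, which is the part of the suggestion this sketch does not attempt.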
Also, context for the jurisdiction in which a discussion is taking place is often not really given: can the LLM tell that a law firm is talking about the PCT or EPC rather than the USC when it discusses first-to-file in patent law and nothing in the document itself mentions the jurisdiction or any of those three initialisms? How about when the same law firm whose blog mentions these things represents the same clients at WIPO, the EPO and the USPTO? If you're going to fine-tune for that with human question-answer sessions, you're going to need some really skilled workers who know the field well.
You probably then need specific prompt templates for legal questions too.
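One possible shape for such a template, as a sketch: the field names and wording below are hypothetical, not any established legal prompt format. Using `string.Template.substitute` (rather than `safe_substitute`) means a prompt with no stated jurisdiction or forum raises `KeyError` instead of silently going out without one.

```python
from string import Template

# Hypothetical template; the fields and instructions are illustrative only.
LEGAL_PROMPT = Template(
    "Jurisdiction: $jurisdiction\n"
    "Forum: $forum\n"
    "Question: $question\n\n"
    "Answer only from the sources below and cite each one by its label. "
    "If the sources do not settle the question, say so.\n\n"
    "Sources:\n$sources"
)


def build_prompt(question: str, jurisdiction: str, forum: str,
                 sources: list[tuple[str, str]]) -> str:
    """Fill the template; substitute() fails loudly if any field is missing."""
    source_lines = "\n".join(f"[{label}] {text}" for label, text in sources)
    return LEGAL_PROMPT.substitute(
        jurisdiction=jurisdiction, forum=forum,
        question=question, sources=source_lines,
    )


print(build_prompt("Does first-to-file apply here?", "US", "USPTO",
                   [("S1", "placeholder source text")]))
```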
Then they need to layer in the precedence and authority of different courts, distinguish obiter statements from the rest, recognise that summaries of the different parties' positions aren't original statements by the speaker, disregard ultra vires statements, ... simples.
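Even a crude version of the first three steps shows how much structure has to sit on top of the text. This sketch assumes a simplified England and Wales hierarchy (UK Supreme Court, Court of Appeal, High Court, County Court, keyed by hypothetical short codes) and assumes some upstream classifier has already labelled each passage as "ratio", "obiter" or "party_submission"; that classifier is the genuinely hard part and is not shown. Real rules of precedent are far more nuanced than a sort key.

```python
from dataclasses import dataclass

# Simplified court hierarchy for England and Wales; higher number = more authority.
COURT_RANK = {"UKSC": 4, "EWCA": 3, "EWHC": 2, "County Court": 1}


@dataclass(frozen=True)
class Passage:
    court: str  # key into COURT_RANK (unknown courts rank lowest)
    kind: str   # "ratio", "obiter" or "party_submission" (assumed upstream labels)
    text: str


def rank_by_authority(passages: list[Passage]) -> list[Passage]:
    """Drop the parties' summarised positions, then order by court, ratio first."""
    court_statements = [p for p in passages if p.kind != "party_submission"]
    return sorted(
        court_statements,
        # Higher court first; within the same court, ratio before obiter.
        key=lambda p: (COURT_RANK.get(p.court, 0), p.kind == "ratio"),
        reverse=True,
    )


ranked = rank_by_authority([
    Passage("EWHC", "ratio", "a"),
    Passage("UKSC", "obiter", "b"),
    Passage("UKSC", "ratio", "c"),
    Passage("EWCA", "party_submission", "d"),
])
print([p.text for p in ranked])
```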
People's hatred of lawyers isn't going to distract them from their hatred of LLMs, it seems, so you get downvoted. People's context window is as short as an LLM's lmao