My understanding is that the base model is pretty good at knowing whether it knows something or not; it's the human-feedback training that causes it to lose that signal.
Thanks, but I didn't find any details about performance before and after reinforcement training. I'm looking to understand more about the assertion that hallucinations are introduced by the reinforcement training.
https://arxiv.org/abs/2303.08774
The technical report has before-and-after comparisons. It's a bit worse on some tests, and they pretty explicitly mention the issue of calibration (how well the model's confidence on a problem corresponds to its actual accuracy in solving it).
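To make "calibration" concrete, here's a minimal sketch of how it's commonly measured (expected calibration error): bin answers by the model's stated confidence and compare the average confidence in each bin to the actual accuracy. This is just an illustration of the metric, not code from the report; the function and variable names are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: floats in [0, 1]; correct: 0/1 outcomes for each answer."""
    assert len(confidences) == len(correct)
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Samples whose confidence falls in this bin (put 1.0 in the last bin).
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(idx) / total) * abs(avg_conf - accuracy)
    return ece

# A well-calibrated model's 80%-confidence answers are right about 80% of the
# time, so the gaps (and the ECE) are small; after RLHF the gaps grow.
print(expected_calibration_error([0.9, 0.8, 0.8, 0.6, 0.5],
                                 [1,   1,   0,   1,   0]))
```

The report's calibration plots show roughly this kind of comparison for the pre-trained model versus the post-RLHF model, with the latter's confidence tracking accuracy noticeably less well.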