I think this is a failure of how we fine-tune and evaluate these models during RLHF.
"In theory, the human labeler can include all the context they know with each prompt to teach the model to use only the existing knowledge. However, this is impossible in practice." [1] Therefore causing and forcing some connections that are not all there for the LLM. Extrapolate that across various subjects and types of queries and there you go.
"In theory, the human labeler can include all the context they know with each prompt to teach the model to use only the existing knowledge. However, this is impossible in practice." [1] Therefore causing and forcing some connections that are not all there for the LLM. Extrapolate that across various subjects and types of queries and there you go.
[1] https://huyenchip.com/2023/05/02/rlhf.html