Hacker News

    seen similarly structured sentences
But ChatGPT doesn't generalize the structure of sentences. If this same problem were posed in a different language, or with words in the sentence merely swapped out, the result would be very different, no?


That’s right. If LLMs were really thinking, forming world models, etc., we would expect them to be robust to word choice and phrasing. But in practice, anyone using RAG can tell you that is not the case.

I’m just a practitioner, so my language might be imprecise. When I say "similarly structured sentences," what I mean, based on my experience using agents and LLMs, is that the shape of the context, i.e. the phrasing and word choice, heavily biases the outputs of LLMs.

In my own observations at work, those who interpret LLMs as thinking often build bad agents. LLMs are not good at open-ended questions: if you ask an LLM to "improve this code," you will often get bad results that merely look passable. But if you interpret LLMs as probabilistic models heavily biased by their context, then you will add much more context and more specific instructions to the prompt in order to get the agent to produce the right output.
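To illustrate the contrast, here is a minimal sketch of the same request phrased as an open-ended prompt versus a context-rich one with explicit constraints. The function names and the constraint wording are hypothetical, invented for this example; the point is only that the second prompt biases the model toward a specific, checkable output.

```python
def build_vague_prompt(code: str) -> str:
    # Open-ended ask: the model is free to produce anything passable.
    return f"Improve this code:\n{code}"

def build_specific_prompt(code: str) -> str:
    # Explicit constraints narrow the space of acceptable completions.
    return (
        "You are reviewing a Python function.\n"
        "Constraints:\n"
        "- Keep the public signature unchanged.\n"
        "- Replace the quadratic membership check with a set-based one.\n"
        "- Add a docstring describing inputs and outputs.\n"
        "Return only the revised function, no commentary.\n\n"
        f"Code:\n{code}"
    )

snippet = (
    "def dedupe(xs):\n"
    "    return [x for i, x in enumerate(xs) if x not in xs[:i]]"
)
print(build_vague_prompt(snippet))
print(build_specific_prompt(snippet))
```

Both prompts describe the same task; only the second one tells the model what "improve" means, which in my experience is what separates a usable agent from a flaky one.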

Side note: this is also why I like the AICI approach: https://github.com/microsoft/aici A lot of us think it is silly how phrasing and word choice can produce dramatically different results in RAG applications. Running a program (like AICI) that post-processes the output and picks the next token in a structured way, instead of writing ever more creative prompts, makes a lot more sense to me.
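A toy sketch of what "picking the next word in a more structured way" can mean: a controller masks out any candidate token that would violate the expected output structure before the sampler sees it. This is the general constrained-decoding idea, not AICI's actual API; the vocabulary and scores below are made up.

```python
def constrained_pick(logits: dict[str, float], allowed: set[str]) -> str:
    """Pick the highest-scoring token among those the controller allows."""
    candidates = {tok: score for tok, score in logits.items() if tok in allowed}
    if not candidates:
        raise ValueError("controller rejected every candidate token")
    return max(candidates, key=candidates.get)

# The raw model prefers a hedge word, but the controller only permits
# tokens that keep the output a valid JSON boolean.
logits = {"maybe": 2.1, "true": 1.7, "false": 0.9, "perhaps": 1.9}
allowed = {"true", "false"}
print(constrained_pick(logits, allowed))  # -> true
```

In a real system the `allowed` set would be derived from a grammar or program state at each decoding step, which is what makes the output format robust to however the prompt happened to be phrased.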



