That's a good point. It may be possible to directly upload predicate-type info into an LLM. This could be especially useful if you need to encode tabular data.
Somewhere, someone probably read this and is thinking about how to export Excel or databases to an LLM.
It's encouraging to see people looking inside the black box successfully.
The other big result in this area was that paper which found a representation of a game board inside an LLM after it had been trained to play the game. Any other good results in that area?
The authors point out that LLMs are doing more than encoding predicate-type info. That's just part of what they are doing.
The opposite is also exciting: build a loss function that punishes models for storing knowledge. One of the issues with current models is that they seem to favor lookup over reasoning. If we could punish models during training for remembering, that might push them toward becoming better at inference and logic instead.
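To make that a bit more concrete, here is a minimal, hypothetical PyTorch sketch of what such a penalty could look like: the usual language-modeling cross-entropy plus an auxiliary term that discourages over-confidence on tokens flagged as pure factual recall. The `recall_mask` labeling and the weight `lam` are assumptions made purely for illustration, not anything from this thread or a known training recipe.

```python
import torch.nn.functional as F

def lm_loss_with_recall_penalty(logits, targets, recall_mask, lam=0.1):
    """Hypothetical 'punish memorization' loss sketch.

    logits:      [batch, seq, vocab] model outputs
    targets:     [batch, seq] next-token ids
    recall_mask: [batch, seq] bool mask marking tokens assumed to be pure
                 factual recall (e.g. entity spans) -- an assumed label.
    """
    vocab = logits.size(-1)
    # Standard next-token cross-entropy.
    ce = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))

    # Probability the model assigns to the correct token at each position.
    probs = F.softmax(logits, dim=-1)
    target_prob = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Penalize high confidence on recall-only tokens.
    mask = recall_mask.float()
    penalty = (target_prob * mask).sum() / mask.sum().clamp(min=1)
    return ce + lam * penalty
```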
I believe it would add some spice to the model, but you shouldn't go too far in that direction. Any social system has a rule set, which has to be learnt and remembered, not inferred.
Two examples. (1) Grammar in natural languages. You can see how another commenter here uses "a local maxima", and then how people react to that. I didn't even notice, because English grammar has never been native to me. (2) Prepositions mostly don't have a direct mapping between two languages, no matter how close the languages are. The learner just has to remember them.
Interesting. Reminds me of a sci-fi short I read years ago where AIs "went insane" when they had too much knowledge: they'd spent too much time looking through data and got a buffer overflow.
I know some of the smaller models like Phi-2 are trained for reasoning specifically by training on question-answer sets, though that seems like the opposite approach to me.
It indeed is. An attention mechanism's key and value matrices grow linearly with context length. With PagedAttention[1], we could imagine an external service providing context. The hard part is the how, of course. We can't load our entire database into every conversation, and I suspect there are also challenges around training (perhaps addressed via LandmarkAttention[2]) and around building a service that efficiently retrieves additional key-value matrices.
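As a back-of-the-envelope illustration of that linear growth, here's a rough sketch using assumed Llama-7B-like dimensions (32 layers, 32 heads, head dimension 128, fp16); nothing here is specific to PagedAttention or the paper:

```python
# KV cache grows linearly with the number of tokens in context.
# Model dimensions below are assumptions (Llama-7B-like).
N_LAYERS, N_HEADS, HEAD_DIM = 32, 32, 128
BYTES_PER_ELEM = 2  # fp16

def kv_cache_bytes(context_len: int) -> int:
    # Two tensors per layer (K and V), each [n_heads, context_len, head_dim].
    return 2 * N_LAYERS * N_HEADS * context_len * HEAD_DIM * BYTES_PER_ELEM

for ctx in (1_000, 10_000, 100_000):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 1e9:.2f} GB of KV cache")
```

That works out to roughly half a megabyte per token, which is why you can't just load an entire database in as key-value context and why paging blocks in and out starts to look attractive.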
The external vector-database service would need tight timing to avoid stalling the LLM: at 20 tokens/sec the per-token budget is about 50 ms, and at 50 tokens/sec it shrinks to about 20 ms, so answers must arrive within that window.
And we can't do this synchronously in real time: pausing the transformer when a layer produces a query vector stalls the whole batch. So we need a way to predict queries (or embeddings) several tokens ahead of where they'd be useful, inject the retrieved context when it's needed, and know when to page it out.
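Here's a hypothetical sketch of that "predict ahead, then inject" loop. `predict_future_query`, `kv_store.lookup`, and `decode_next` are made-up interfaces standing in for whatever the model, retrieval service, and decoder would actually expose; the only point is that lookups are issued LOOKAHEAD tokens early so retrieval latency hides behind generation.

```python
import asyncio

LOOKAHEAD = 8  # tokens of lead time given to the retrieval service (assumed)

async def generate_with_prefetch(model, kv_store, prompt, max_tokens=128):
    tokens = list(prompt)
    pending = {}  # step -> in-flight lookup task
    for step in range(max_tokens):
        # 1. Speculatively predict what context we'll want LOOKAHEAD tokens
        #    from now and start the lookup without blocking generation.
        query = model.predict_future_query(tokens, lookahead=LOOKAHEAD)
        pending[step + LOOKAHEAD] = asyncio.create_task(kv_store.lookup(query))

        # 2. If a prefetch issued earlier has matured for this step, inject it
        #    (e.g., as extra key/value blocks) before decoding the next token.
        task = pending.pop(step, None)
        extra_kv = await task if task is not None else None

        tokens.append(model.decode_next(tokens, extra_kv=extra_kv))

    # Cancel prefetches we never consumed (a crude "page it out").
    for task in pending.values():
        task.cancel()
    return tokens
```

The lookahead is the knob that trades prediction accuracy against the 20-50 ms per-token budget mentioned above: the further ahead you predict, the more latency you can hide, but the less likely the prefetched context is what you actually need.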