
“LLMs lack an underlying model” is very obviously incorrect. LLMs have an underlying model of semantics as tokens embedded into a high-dimensional vector space.

The question is not whether or not they have any model at all, the question is whether the model they indisputably have (which is a model of language in terms of linear algebra) maps onto a model of the external universe (a “world model”) that emerges during training.

This is a pretty much unfalsifiable question as far as I can see. There has been research aiming to show it one way or the other, but it doesn’t settle anything, because it never pins down what a “world model” even means once you allow the term to mean anything other than “thinks like we do”.

For example, LLMs have been shown to produce code that can make graphics somewhat in the style of famous modern artists (e.g. Kandinsky and Mondrian) but fail at object-stacking problems (“take a book, four wine glasses, a tennis ball, a laptop and a bottle and stack them in a stable arrangement”). Depending on the objects you choose, the LLM either succeeds or fails (generally in a baffling way). So what does this mean? Clearly the model doesn’t “know” the shape of various 3-D objects (unless the problem is in its training set, which it sometimes seems to be), but on the other hand it has shown some ability to pastiche certain visual styles. How is any of this conclusive? A baby doesn’t understand the 3-D world either. A toddler will try and fail to stack things in various ways. Are they showing the presence or lack of a world model? How do you tell?



I agree that it's probably unfalsifiable in the sense of proving it definitively based on something like static analysis of the model itself.

But that doesn't mean we can't, in theory, give the LLM a battery of tests on which it should perform well (though not perfectly) if it has a world model, and poorly (though not a total failure) if it doesn't.

It's inherently a probabilistic system, so testing it in a probabilistic manner seems perfectly apt. Again: no, this will not produce a definitive result, due to that probabilistic nature—but it can produce an indicative one, and running the same test on multiple related LLMs, or similar tests on the same LLM, should help to smooth out noise in the results.
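For what it's worth, here is a rough sketch of what such a battery could look like as a harness: repeat each prompt many times, score each answer, and compare pass rates across models rather than judging single responses. Everything here (ask_model, the prompts, the checkers, the trial counts) is a made-up placeholder rather than any real API, and the stub just returns random canned answers so the script runs on its own.

    # Minimal sketch of a probabilistic test battery. ask_model() is a
    # placeholder stub; in practice you would wire it to a real model API.
    import random
    from statistics import mean

    def ask_model(model: str, prompt: str) -> str:
        # Stub: returns a random canned answer so the sketch is runnable.
        return random.choice(["stack the book first", "balance it on the ball"])

    # Each test is a prompt plus a checker that scores one answer pass/fail.
    TESTS = [
        ("Stack a book, a bottle and a tennis ball stably. What goes on the bottom?",
         lambda answer: "book" in answer.lower()),
        ("A glass is upside down on a table. Will it hold water poured from above?",
         lambda answer: "no" in answer.lower()),
    ]

    def pass_rate(model, prompt, checker, trials=20):
        # Repeat the same prompt and average: single samples are too noisy.
        return mean(checker(ask_model(model, prompt)) for _ in range(trials))

    def run_battery(models):
        # Pass rates per model across the whole battery; the idea is to look
        # for "well but not perfectly" vs "poorly but not total failure".
        return {m: [pass_rate(m, p, c) for p, c in TESTS] for m in models}

    for model, rates in run_battery(["model-a", "model-b"]).items():
        print(model, [round(r, 2) for r in rates])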

(...of course, this only works if the tests are designed well, and I don't have enough specific understanding of LLMs to know how one would go about doing that in a rigorous manner!)


I don't think it's nearly as cut-and-dried as that. Even if you tried to make tests to differentiate world-model from non-world-model, all you'd end up concluding is:

If the AI has a world model, its world-model doesn't have features that allow it to do what I tested for.


In theory, if you have some people who know what they're doing, they could design enough different kinds of world-model tests that failing across all of them would significantly reduce the likelihood that the LLM has a world model at all.
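As a toy illustration of why many different kinds of tests help (every probability below is invented for the sake of the example, not measured from any real model): treat each independent test family as evidence and update a prior, and a run of failures quickly shrinks the room for "it has a world model, but every test happened to miss it".

    # Toy Bayesian aggregation with invented numbers, purely illustrative.
    # Assume a model WITH a world model passes a given test family with
    # probability 0.8, and one WITHOUT passes with probability 0.3.
    def update(prior, passed, p_pass_if_wm=0.8, p_pass_if_not=0.3):
        like_wm = p_pass_if_wm if passed else 1 - p_pass_if_wm
        like_not = p_pass_if_not if passed else 1 - p_pass_if_not
        return like_wm * prior / (like_wm * prior + like_not * (1 - prior))

    posterior = 0.5  # start agnostic
    for passed in [False, False, True, False, False]:  # five test families
        posterior = update(posterior, passed)
    print(f"P(world model | results) ~= {posterior:.3f}")  # ~0.017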

I think I would word the distinction I'm drawing as: "it is technically unfalsifiable, but it is not untestable."



