> ToM is about being able to model the internal beliefs/desires etc of another p...

dauhak · 2025-02-20T13:54:27 1740059667

Do you work in ML research on LLMs? I do, and I don't understand why people are so unbelievable confident they understand how AI and human brains work such that they can definitely tell what functions of the brain LLMs can also perform. Like, you seem to know more than leading neuroscientists, ML researchers, and philosophers, so maybe you should consider a career change. You should maybe also look into the field of mechanistic interpretability, where lots of research has been done on internal representations these models form - it turns out, to predict text really really well, building an internal model of the underlying distribution works really well

If you can rigorously state what "having a world model" consists of and what - exactly - about a transformer architecture precludes it from having one I'd be all ears. As would the academic community, it'd be a groundbreaking paper.

Arkhaine_kupo · 2025-02-20T15:41:43 1740066103

This prety much seems to boil down to "brain science is really hard so as long as you dont have all the answers then AI is maybe half way there is a valid hypothesis". As more is understood about the brain and more about the limitations of LLMs arch then the distance only grows. Its like the God of the gaps where god is an answer for anythign science cant explain, ever shrinking, but with the LLM ability to have capabilities beyond striking statistical accuracy and local coherance.

You dont need to be unbelievably confident or understand exactly how AI and human brains work to make certain assesments. I have a limited understanding of biology, I can however make an assesment on who is healthier between a 20 year old person who is active and has a healthy diet compared to someone with a sedentary lifestyle, in their late 90s and with a poor diet. This is an assesement we can do despite the massive gaps we have in terms of understanding aging, diet, activity and overall health impact of individual actions.

Similarly, despite my limited understanding of space flight, I know Apollo 13 cannot cook an egg or recite french poetry. Despite the unfathamobly cool science inside the space craft, it cannot, by design do those things.

> the field of mechanistic interpretability

The field is cool, but it cannot prove its own assumption yet. The field is trying to prove you can reverse engineer a model to be humanly understood. Their assumptions such as mapping specific weights or neurons to features has failed to be reproduced multiple times, with the weight effects being way more distributed and complicated than initially thought. This is specially true for things that are equally mystified as the emergent abilities of LLMs. The ability of mimicking nuanced language being unlocked after a critical mass of parameters, does not create a rule as for which increased parameterisation will increase linerly or exponentially the abilities of an LLM.

> it turns out, to predict text really really well, building an internal model of the underlying distribution works really well

yeah, an internal model works well because most words are related to their neighbours, thats the kind of local coherance the model excels at. But to build a world model, the kind a human mind interacts with, you need a few features that remain elusive (some might argue impossible to achieve) to a transformer architecture.

Think of games like chess, an llm is capable of accurately expressing responses that sound like game moves, but the second the game falls outside its context window the moves become incoherent (while still sounding plausible).

You can fix this, with arch that do not have a transformer model underlying it, or by having multiple agents performing different tasks inside your arch, or by "cheating" and using a state outside the llm response to keep track of context beyond reasonable windows. Those are "solutions" but all just kinda prove the transformer lacks that ability.

Other tests abour casuality, or reacting to novel data (robustness), multi step processes and counterfactual reasoning are all the kind of tasks transformers still (and probably always) will have trouble with.

For a tech that is so "transparent" in its mistakes, and so "simple" in its design (replacing the convolutions with an attention transformer, its genius) I still think its talked about in borderline mystic tones, invoking philosophy and theology, and a hope for AGI that the tech itself does not lend to beyond the fast growth and surprisingly good results with little prompt engineering.