I would counterargue with "that's the model's problem, not mine".
Here's a thought experiment: if I gave you 5 boxes and asked "how many balls are there in all of these boxes?" and you answered "I don't know because they are inside boxes", that's a fail. A truly intelligent individual would open them and look inside.
A truly intelligent model would (say) retokenize the word into its individual letters (which I'm optimistic they can do) and then count those. The fact that models cannot do this is proof that they lack some basic building blocks for intelligence. Model designers don't get to argue "we are human-like except in the tasks where we are not".
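To be concrete, the operation I have in mind is trivial once the word is broken into characters. A minimal sketch in plain Python (no particular model or tokenizer assumed; count_letter is just an illustrative name):

```python
# Minimal sketch of the "open the boxes" step: split a word into its
# individual characters and count occurrences of a target letter.
# Nothing here is model-specific; it's just the operation I'd expect
# a genuinely capable model to perform, or delegate to a tool.
def count_letter(word: str, letter: str) -> int:
    return sum(1 for ch in word.lower() if ch == letter.lower())

print(count_letter("strawberry", "r"))  # prints 3
```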
Of course they lack building blocks for full intelligence. They are good at certain tasks, and counting letters is emphatically not one of them. They should be tested and compared on the kinds of tasks they're fit for, and thus the kinds of tasks they will actually be used to solve, not tasks for which they would be misemployed to begin with.
I agree with you, but that's not what the post claims. From the article:
"A significant effort was also devoted to enhancing the model’s reasoning capabilities. (...) the new Mistral Large 2 is trained to acknowledge when it cannot find solutions or does not have sufficient information to provide a confident answer."
Words like "reasoning capabilities" and "acknowledge when it does not have enough information" have meanings. If Mistral doesn't add footnotes to those assertions then, IMO, they don't get to backtrack when simple examples show the opposite.
It's not like an LLM is released with a list of "these are the tasks I really suck at." Right now users have to figure it out on the fly or have a deep understanding of how tokenizers work.
That doesn't even take into account what OpenAI has typically done to intercept queries and cover the shortcomings of LLMs. It would be useful if each model did indeed come out with a chart covering what it cannot do and what it has been tailored to do above and beyond the average LLM.
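For the curious, you can look at the "boxes" directly. A small sketch below, assuming the open-source tiktoken package and its cl100k_base encoding (exact splits vary by tokenizer); the point is just that the model sees opaque subword chunks, not letters:

```python
# Sketch: inspect how a BPE tokenizer chunks a word into subword tokens.
# Assumes the `tiktoken` package is installed; other tokenizers differ,
# but the general picture (chunks rather than letters) is the same.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("strawberry")

# Each id maps to a chunk of bytes, not to individual letters.
chunks = [enc.decode_single_token_bytes(t) for t in token_ids]
print(token_ids)  # a short list of integer ids
print(chunks)     # the byte chunks the model actually "sees"
```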
Sure, if you want to go with wildly theoretical approaches, we can't even be sure if the rock on the ground doesn't have some form of intelligence.
Meanwhile, for practical purposes, there's little arrogance needed to say that some things are preconditions for any form of intelligence that's even remotely recognizable.
1) Learning needs to happen continuously. That's a no-go for now, maybe solvable.
2) Learning needs to require much less data. Very dubious without major breakthroughs, likely on the architectural level. (At which point it's not really an LLM any more, not in the current sense)
3) They need to adapt to novel situations, which requires 1&2 as preconditions.
4) There's a good chance intelligence requires embodiment. It's not proven, but it's likely. For one, without observing outcomes, they have little capability to self-improve their reasoning.
5) They lack long-term planning capacity. Again, this relies on memory, but also on executive planning.
There's a whole bunch more. Yes, LLMs are absolutely amazing achievements. They are useful, they imply a lot of interesting things about the nature of language, but they aren't intelligent. And without modifying them to the extent that they aren't recognizably what we currently call LLMs, there won't be intelligence. Sure, we can have the ship of Theseus debate, but for practical purposes, nope, LLMs aren't intelligent.
4) 'Embodiment' is another term we don't really know how to define. At what point does an entity have a 'body' of the sort that supports 'intelligence'? If you want to stick with vague definitions, 'awareness' seems sufficient. Otherwise you will end up arguing about paralyzed people, Helen Keller, that rock opera by the Who about the pinball player, and so on.
5) OK, so the technology that dragged Lee Sedol up and down the goban lacks long-term planning capacity. Got it.
None of these criteria are up to the task of supporting or refuting something as vague as 'intelligence.' I almost think there has to be an element of competition involved. If you said that the development of true intelligence requires a self-directed purpose aimed at outcompeting other entities for resources, that would probably be harder to dismiss. Could also argue that an element of cooperation is needed, again serving the ultimate purpose of improving competitive fitness.
LLMs are not a tool for modeling intelligence. It's not a function of the dataset; as they are, they're not sufficient. One of the largest shortcomings is the lack of continuous learning, memory, and (likely) forgetting.
Those who develop AI and know anything don't actually describe the current technology as human-like intelligence; rather, it is capable of many tasks which previously required human intelligence.