The fact that it (is suggested / we are led to believe / was recently implied) that the neurons can be explained as doing something like this at the underlying layer still says little about the process of forming the ontological context needed for any kind of syllogism.
Observations are anecdotal. Since many LLMs are non-deterministic due to their sampling step, you could give the same survey to the same LLM many times and receive different results (a quick sketch below illustrates this).
And we don’t have a good measure for emergent intelligence, so I would take any “study” with a large grain of salt.
I’ve read one or two arxiv papers suggesting reasoning capabilities, but they were not reproduced and I personally couldn’t reproduce their results.
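To make the non-determinism point above concrete, here is a minimal sketch, assuming the Hugging Face transformers library and GPT-2 as a stand-in model (both are my choice of example, not something from the thread). With sampling enabled, repeated runs on the identical prompt can come back with different continuations.

    # Minimal sketch: sampling makes repeated runs non-deterministic.
    # Assumes the Hugging Face transformers library and GPT-2 as a stand-in model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of France is"
    inputs = tokenizer(prompt, return_tensors="pt")

    for _ in range(3):
        out = model.generate(
            **inputs,
            do_sample=True,                       # sample from the token distribution instead of greedy argmax
            temperature=0.8,
            max_new_tokens=20,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS to silence the warning
        )
        print(tokenizer.decode(out[0], skip_special_tokens=True))

With do_sample=False (greedy decoding) the three printouts would be identical; that difference is exactly what the "same survey, different results" observation comes down to.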
Go back to the ReAct paper (Reasoning and Acting). This is the basis of most of the modern stuff. Read the paper carefully and reproduce it; I have done so, and it is doable. The paper and the papers it refers to directly address many things you have said in these threads. For example, the stochastic nature of LLMs is discussed at length in the CoT-SC paper (chain-of-thought self-consistency). When you're done with that, take a look at the Reflexion paper.
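For anyone who wants the CoT-SC idea in one screen, here is a hedged sketch (the helpers sample_chain_of_thought and extract_answer are hypothetical stand-ins for your own LLM client and answer parser, not code from the paper): sample several reasoning chains for the same question and take a majority vote over the final answers, which is how the paper deals with the stochasticity of any single sampled chain.

    # Sketch of chain-of-thought self-consistency (CoT-SC): sample several
    # reasoning chains, then majority-vote over the extracted final answers.
    from collections import Counter

    def sample_chain_of_thought(question: str) -> str:
        """Hypothetical stand-in: one sampled reasoning chain ending in 'Answer: <x>'."""
        raise NotImplementedError("plug in your LLM client here")

    def extract_answer(chain: str) -> str:
        # Assumes each chain ends with a line like "Answer: 42".
        return chain.rsplit("Answer:", 1)[-1].strip()

    def self_consistent_answer(question: str, n_samples: int = 10) -> str:
        answers = [extract_answer(sample_chain_of_thought(question)) for _ in range(n_samples)]
        # The vote smooths over the randomness of any individual sample.
        return Counter(answers).most_common(1)[0][0]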
To me it feels that whatever 'proof' you give that LLMs have a model behind them, beyond 'next token prediction', it would not make a difference to people who don't 'believe' it. I see this happening over and over on HN.
We don't know how reasoning emerges in humans. I'm pretty sure multimodality helps, but it is not needed for reasoning: other modalities just mean other forms of input, hence more (if somewhat different) input. A blind person can still form an 'image'.
In the same sense, we don't know how reasoning emerges in LLMs. For me the evidence lies in the results rather than in how it works, and the results are evidence enough.
The argument isn't that there is something more than next token prediction happening.
The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.
> The argument isn't that there is something more than next token prediction happening.
> The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.
Well said! There's a philosophical rift appearing in the tech community over this very issue, semi-neatly dividing people into naysayers, "disbelievers", and believers.
I fully agree. Some people fully disagree, though, on the 'pull of the world' part, let alone the 'intelligence' part, both of which are in fact impossible to define precisely.
The reasoning emerges from the long-distance relations between words picked up by the parallel nature of transformers. That's why they were so much more performant than the earlier RNNs and LSTMs, which used similar tokenization.
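To illustrate the "parallel nature" point, here is a toy sketch of scaled dot-product attention (my own example in PyTorch, with made-up dimensions): every position scores against every other position in a single matrix multiply, so a long-distance relation does not have to survive a step-by-step recurrent state the way it does in an RNN or LSTM.

    # Toy single-head scaled dot-product attention over one sequence.
    import torch
    import torch.nn.functional as F

    seq_len, d_model = 8, 16
    x = torch.randn(seq_len, d_model)      # token embeddings

    Wq = torch.randn(d_model, d_model)
    Wk = torch.randn(d_model, d_model)
    Wv = torch.randn(d_model, d_model)

    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / d_model ** 0.5      # (seq_len, seq_len): every position vs. every other, computed in parallel
    weights = F.softmax(scores, dim=-1)
    out = weights @ v                      # each output mixes information from all positions at once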
As you have said, neither you nor the other guy is qualified to make a technical assessment of this, and we just have to wait for the investigations to play out.
However, what we can assess is the ex-employee's actions and multitude of claims, which, while each seems benign in isolation, together make her look more like a vengeful ex-employee than a credible victim.
I do not think it is reasonable to conclude that, because a report has some mundane findings, the whole report should be dismissed.
I think large chain-of-thought LLMs (e.g. o3) might be able to manage.