Here was my test with ChatGPT 3.5.[0] I made up a novel game, and it figured it out. The test is simple, but it made me doubt absolute arguments that LLMs are not able to reason, in some way.
There is a question at the end of that comment; I would love to hear other options.
How does this prove reasoning? The thread you point to has several questions in it, still unanswered, that ask the same thing. And how is this not entirely derivative too? There are a huge number of these kinds of 3-box "games" (although I don’t really see this as a game), so something very similar to this is probably in the training data a lot. Writing code to factor a number is definitely very common. Variations of this are also very common interview questions for interns (at least they were when I was interviewing).
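To give a sense of how boilerplate that is: a factoring function is essentially textbook trial division. This is just a generic sketch, not whatever code the model actually produced in your thread:

    def factor(n: int) -> list[int]:
        """Return the prime factors of n (with multiplicity) by trial division."""
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:  # whatever remains is itself prime
            factors.append(n)
        return factors

    print(factor(360))  # [2, 2, 2, 3, 3, 5]

Variants of exactly this appear in countless tutorials and interview-prep sites, which is my point about it being in the training data.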
Here is GPT 4o in "reasoning" mode answering the question "Are LLMs capable of reasoning?"
> It depends on how you define "reasoning."
> LLMs like me can perform certain types of reasoning, such as:
> Pattern recognition & analogy – Recognizing relationships between concepts and applying similar patterns in new contexts.
> Deductive reasoning – Applying rules to specific cases (e.g., "If all humans are mortal and Socrates is a human, then Socrates is mortal").
> Inductive reasoning – Generalizing from specific examples (though often with statistical biases).
> Common-sense reasoning (to an extent) – Inferring likely outcomes based on broad training data.
> However, there are limitations:
> No true understanding – LLMs don’t "know" things like humans do; they predict based on probabilities.
> Struggles with multi-step logic – Complex reasoning that requires deep memory or long-term planning is difficult.
> Lack of real-world interaction – Without direct experience, reasoning is limited to text-based patterns.
> Confidence: 90%
> (Strong evidence suggests LLMs can perform certain types of reasoning, but they are not yet comparable to human-level reasoning.)
Would you agree with that analysis? If so, then LLMs are indeed capable of reasoning, in some ways.
It fails at deductive reasoning though. Pick a celebrity with non-famous children who don't obviously share their last name, or something like that. If you ask it "who is the child of <celebrity>", it will get it right, because this is in its training data, probably from Wikipedia.
If you ask "who is the parent of <celebrity-child-name>", it will often claim to have no knowledge about this person.
Yes, sometimes it gets it right, but sometimes it doesn't. Try a few celebrities.
Maybe the disagreement is about this?
Like if it gets it right a good amount of the time, you would say that means it's (in principle) capable of reasoning.
But I say that if it gets it wrong a lot of the time, that means 1) it's not reasoning in situations where it gets it wrong, and 2) it's most likely not reasoning in situations where it gets it right either.
And maybe you disagree with that, but then we don't agree on what "reasoning" means. Because I think that consistency is an important property of reasoning.
I think that if it gets "A is parent of B, implies B is child of A" wrong for some celebrity parents but not for others, then it's not reasoning. Because reasoning would mean applying this logical construct as a rule, and if it's not consistent at that, it's hard to argue that it is in fact applying this logical rule instead of doing who-knows-what that happens to give the right answer some of the time.
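To make that concrete, here is roughly the kind of check I have in mind. This is only a minimal sketch using the official OpenAI Python client; the model name and the celebrity placeholders are stand-ins, not the exact prompts I used:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def ask(question: str) -> str:
        """Send a single-turn question and return the model's text reply."""
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder; use whichever chat model you want to test
            messages=[{"role": "user", "content": question}],
        )
        return response.choices[0].message.content

    # Forward direction: usually answered correctly (likely memorized from training data).
    print(ask("Who is the child of <celebrity>?"))

    # Reverse direction: often answered with a claim of no knowledge,
    # even though it follows logically from the forward fact.
    print(ask("Who is the parent of <celebrity-child-name>?"))

If the second answer comes back wrong or as "I don't know" for some celebrities but not others, that inconsistency is what I'm talking about.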
I was unable to find my exact "game" in Google's index.
Therefore, how does my example not qualify as this, at least:
> Analogical reasoning involves the comparison of two systems in relation to their similarity. It starts from information about one system and infers information about another system based on the resemblance between the two systems.
Is it actually reasoning, though, or just pattern matching? It seems like, in order to compare, one would also have to "know," which your response above indicates they do not.
I guess the real question is “does moving down a stochastic gradient of probabilities suffice as reasoning to you?” and my answer is no, because you don’t need reason to find the nearest neighbor in this architecture. In this case the model is not actively comparing and inferring; it’s simply associating without “knowing.”
> I keep repeating myself because you seem unable to accept information.
I responded to your post with my thoughts and my own reframing of the question to try and open the conversation; you entirely ignored this in your response. Maybe you think this info is new to me, but it’s not, and it doesn’t prove anything, for many more reasons than just the ones I cited in my prior response.
> Not sure if you are trolling now.
Nope, just saying that adding “seem” to your postulation and repeating your point doesn’t make it right.
> I've had more productive conversations with LLMs.
The ad-hominem attacks really just serve to underscore how you have nothing of substance to argue.
Reasoning is not understood even among humans. We only have a black-box definition, in the sense that whatever it is we are doing, it is reasoning.
If an LLM arrives at the same output a human does for a given input, and that output has a sufficiently low probability of happening by random chance or association, then it fits the term "reasoning" to the maximum extent we understand it.
Given that we don't know what's going on, the best bar is simply matching input and output and making sure it's not using memory, pattern matching, or random chance. There are MANY prompts that meet this criterion.
Your thoughts and claims are, to be honest, just flat-out wrong. They’re just made up, because not only do you not know what the model is doing internally, you don’t even know what you or any other human is doing. Nobody knows. So I don’t know why you think your claims have any merit. They don’t, and neither do you.
Not sure why this got so acrid, but I don’t really have any reason to interact with someone saying I have “no merit.”
You might want to look at how bent out of shape you are getting about a rando on the internet disagreeing with you.
Why I would lie about plugging your problem into an LLM, or about solving it, is beyond me; you know I don’t lose anything by admitting you’re right? In fact, I would stand to gain from learning something new. I think you should examine how you approach an argument, because every time you’ve replied it has made you look more desperate for someone to agree, trying to bully people into agreeing by making ad hominem attacks. Despite it all, I think you have merit as a person, even if you can’t make a cogent argument and just chase your tail on this topic.
I’m going to stop engaging with you from now on, but just as a piece of perspective for you: both o3 and Gemini, when asked, pointed out how your problem is derivative; perhaps you are overestimating its novelty. Gemini even cited derivations right out of the gate.
My thread has been voted down and it’s getting stale. The few remaining people are biased towards their point of view and are unlikely to entertain anything that will trigger a change in their established worldview.
Most people will use this excuse to avoid responding to or even looking at your link here. It is compelling evidence.
I’d settle for these things being able to do value comparisons consistently well, play a game of tic-tac-toe correctly more than once, or use a UI after an update without failing horrendously; that would move the needle a little bit for me.
People claiming these things selectively reason, while not being able to explain why, seems a lot like magical thinking to me, rather than entertaining the possibility that you might be projecting onto something that is really damn well engineered to make you anthropomorphize it.
[0] https://news.ycombinator.com/item?id=35442147