We are pretty certain that humans can reason, yet they are sometimes wrong, even if you give them the same problem over and over again with slight variations.
LLMs get things wrong for different reasons than humans do (humans lose focus; LLMs have randomness applied when sampling their responses to improve results). But clearly we have to set the bar somewhat below 100% if we want a test that doesn't conclude that humans are incapable of reasoning.
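To make that parenthetical concrete, here's a minimal sketch of temperature sampling, the step where the randomness enters. The names and numbers are illustrative, not any particular model's API:

```python
import numpy as np

rng = np.random.default_rng()

def sample_next_token(logits, temperature=0.8):
    """Pick the next token id from raw model logits, with temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)  # stochastic draw: same prompt, different runs

# Two calls on identical logits can return different tokens; as temperature
# approaches 0 this collapses toward greedy argmax and the variation disappears.
print(sample_next_token([2.0, 1.5, 0.3]), sample_next_token([2.0, 1.5, 0.3]))
```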
The difference is we _know_ that LLMs are fancy stochastic models; we don't know that they're capable of reasoning, and the null hypothesis is that they're not, because we know what they _are_ - we built them. Any "reasoning" is an emergent property of the system, not something we built them to do. In that case, evidence that they're not reasoning - evidence that they're stochastic parrots putting on a performance of reasoning - weighs heavier, because a performance of reasoning fits what we already know they can do, whereas genuine reasoning would be something new to the model.
There are deeper philosophical questions about what reasoning actually _is_, and LLMs have made those questions sharper: they've shown it's clearly possible for a complex statistical model to generate words that look like reasoning. The question is whether there's a difference between what they're doing and what humans are doing, and evidence that they're _not_ reasoning - evidence that they're just generating words in specific orders - weighs heavily against them.
We haven't coded LLMs to be stochastic models; we coded them to predict text with whatever method gradient descent finds on a transformer architecture. That's not exactly the same.
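For what it's worth, the only objective we actually specify amounts to something like the sketch below (PyTorch-style, where `model`, `tokens`, and `optimizer` are placeholder names rather than anything from a real training stack): we ask for next-token prediction and nothing else, and whatever internal strategy drives that loss down is what gradient descent finds.

```python
import torch
import torch.nn.functional as F

def training_step(model, tokens, optimizer):
    """One gradient-descent step on the next-token prediction objective.

    Assumes `model` maps (batch, seq) token ids to (batch, seq, vocab) logits
    and `tokens` is a batch of token ids; both are illustrative placeholders.
    """
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each token from its prefix
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # we never specify *how* to predict, only that the loss should drop
    optimizer.step()
    return loss.item()
```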
But more importantly, if you want to show that LLMs can't reason, you obviously have to use a test that, when applied to humans, would show that humans can reason. Otherwise your test isn't testing reasoning but something stricter.