In theory you can prove a theorem just by enumerating all possible proofs until you find one for the theorem you want. This is extremely slow, but would you say there's any reasoning involved in doing this?
Of course we don't know whether an LLM is doing something like this or actually reasoning. But that's also the point: we don't know.
If you ask a person a question, you can be confident to some degree that they didn't memorize the answer beforehand, so you can evaluate their ability to "reason" their way to an answer. With an LLM, however, this is incredibly hard to do, because it could simply have memorized it.
> In theory you can prove a theorem just by enumerating all the possible proofs until ...
An interesting hypothesis! I'm neither a mathematical logician nor particularly up to date in that field - is the possibility of this, at least in the abstract, currently accepted as fact?
(Yes, there's the perhaps-separate issue of only enumerating correct proofs.)
It depends on what theory you're working in (at which point deciding whether to use one theory or another becomes more of a philosophical question).
I'm mostly familiar with type theory, of which there are many variants, though the most common ones all share the important characteristics. In particular, they identify theorems with types and proofs with terms, where correct proofs are well-typed terms. The nice thing is that terms are recursively enumerable, so you can list all proofs. Moreover, most type theories have decidable type checking, so you can automatically check whether a term is well-typed (and hence whether the corresponding proof is correct).
This is not just theory: there are already a bunch of tools used in practice to mechanically check mathematical proofs, like Coq, Lean, Agda and more.
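To give a concrete flavor (a toy example of my own for illustration, not taken from any particular library), here is what this looks like in Lean 4. The proposition is the type of `and_swap`, the term after `:=` is the proof, and Lean accepting the definition is precisely its type checker verifying that proof:

```lean
-- The proposition `p ∧ q → q ∧ p` is a type; the lambda term below inhabits it.
-- Lean's type checker accepting this definition is what "checking the proof" means.
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun h => ⟨h.right, h.left⟩
```

If you mangled the term, say by writing `⟨h.left, h.right⟩` instead, the type checker would reject the definition: that's the decidable proof checking at work.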
When I said "in theory" however it's because in practice enumerating all proof terms will be very very slow and will take forever to reach proofs for theorems that we might find interesting.
Since we're on the topic of LLMs, there are efforts to use them to speed up this search, though this amounts to using them as search heuristics. It does help that you get automatic feedback from the aforementioned proof checking tools, meaning you don't need costly human supervision to train them. The hope would be to get something like what Stockfish/AlphaZero is for chess.
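Very roughly, and with every name below made up by me (this is not the API of any real system, just a sketch of the idea), the loop looks something like this: the model proposes candidate proof steps, the proof assistant accepts or rejects them, and any completed, machine-checked proof can be fed back as a training example without a human in the loop.

```python
import heapq

def guided_search(goal, propose, apply_tactic, max_expansions=10_000):
    """Best-first search over proof states, guided by model scores.

    `propose(state)` stands in for an LLM suggesting (tactic, score) pairs,
    and `apply_tactic(state, tactic)` stands in for a proof assistant that
    returns None if the tactic is rejected, or a (possibly empty) list of
    new subgoals otherwise. Both are hypothetical placeholders.
    """
    frontier = [(0.0, 0, [goal], [])]  # (cost, tiebreak, open goals, tactic script)
    pushed = 0
    for _ in range(max_expansions):
        if not frontier:
            return None
        cost, _, goals, script = heapq.heappop(frontier)
        state = goals[0]
        for tactic, score in propose(state):
            result = apply_tactic(state, tactic)
            if result is None:                  # checker rejected the step
                continue
            remaining = result + goals[1:]      # new subgoals replace the current one
            if not remaining:                   # every goal closed: a machine-checked proof,
                return script + [tactic]        # which can double as a training example
            pushed += 1
            heapq.heappush(frontier, (cost - score, pushed, remaining, script + [tactic]))
    return None
```

The key point is that the reward signal comes from the checker, not from a human grader.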