Well if that is the metric, then reasoning has long been solved:
GNU Prolog 1.5.0 (64 bits)
Compiled Jul 8 2021, 09:35:47 with gcc
Copyright (C) 1999-2022 Daniel Diaz
| ?- [user].
compiling user for byte code...
blub << forp.
forp << tworby.
Forp << tworby :- Forp << forp.
user compiled, 3 lines read - 401 bytes written, 6194 ms
yes
| ?- findall(X, X << tworby, X).
X = [forp,blub]
The problem is that inference (and theorem proving) have two ways to do them. Either you memorize the reduction rules, or you deal with the combinatorial explosion. The former is Prolog and the latter is SAT/SMT solvers. People seem to expect that neural networks predict what the result would be if inference had been done - without actually doing the inference. It's possible to exploit local features, but not to skip it entirely in general. Note that inference can use a lot of memory/scratch space also. At that point, why not just use an external tool? I'd seem much smarter if I could query Prolog directly from my brain. Hell I'd sell my left arm to be able to do that.
Also, note that those statements are not hygienic, and that it assumes a certain logical interpretation of the sentences that isn't universal. We can also ask annoying questions like: is 'all' intensional or extensional? If I invented a new thing called swerb, and swerb is a forp now. Is it retroactively a tworby because the definition of being a forp means it is a tworby, or is it just that at the point in time of the original assertion all forps were tworbys (so the swerb wouldn't be)? There are no good ways to resolve this without back and forth and contextual guessing, or using formal languages.
Since there is no One True Logic, the common kernel of reasoning might as well be computation itself.
I think you're missing the point. Of course prolog can reason better than humans, that's what it was designed to do (deterministically).
The point is not to solve reasoning. The question is, can LLMs reason?
LLMs were not designed to reason, reasoning in an LLM is emergent. That should be interesting.
It should also be exciting because the domain over which LLMs can reason is much more unbounded than the domain over which prolog can reason (tokens and relationships you've already supplied it)
Also, note that those statements are not hygienic, and that it assumes a certain logical interpretation of the sentences that isn't universal. We can also ask annoying questions like: is 'all' intensional or extensional? If I invented a new thing called swerb, and swerb is a forp now. Is it retroactively a tworby because the definition of being a forp means it is a tworby, or is it just that at the point in time of the original assertion all forps were tworbys (so the swerb wouldn't be)? There are no good ways to resolve this without back and forth and contextual guessing, or using formal languages.
Since there is no One True Logic, the common kernel of reasoning might as well be computation itself.