
> blub is a forp and all forps are tworby. Is blub tworby or not?

GPT4:

Given the statements:

    Blub is a forp.
    All forps are tworby.
From the second statement, it is clear that every entity that is a forp is also tworby.

Thus, since Blub is a forp (from the first statement), Blub must be tworby.

There are humans that can't do this. So even if it's just a stochastic parrot, the stuff it's parroting does encode some reasoning, so it would be strange if an arbitrarily powerful parrot (not claiming that GPT-4 has reached that level) couldn't pattern match its way to some common kernel of reasoning, small as it may be.




I've had a computer in my home which can remember a million random integers in order, and I've had one for decades. No human can do this. Exceeding what humans can do is in no way the bar for reasoning.


of course not. But exceeding what humans can do in terms of reasoning is not an unreasonable bar for reasoning.

---

> My friend Cathy is capable of memorizing really long lists of random numbers. Does that mean she is good at reasoning?

The ability to memorize long lists of random numbers is a specific cognitive skill related to memory. Cathy's ability demonstrates an impressive short-term or working memory, and perhaps also a good long-term memory if she can recall those numbers over extended periods.

Reasoning, on the other hand, refers to the process of forming conclusions, judgments, or inferences from facts or premises. It's more about logical thinking, problem-solving, and the ability to draw conclusions from information, rather than rote memorization.

While a strong memory can be an asset in many reasoning tasks (because it allows a person to quickly call upon relevant information), it doesn't directly indicate one's reasoning abilities. In other words, just because Cathy is excellent at memorizing numbers doesn't necessarily mean she's also good at reasoning. However, it also doesn't mean she isn't. The two skills can be related in some contexts, but they're not the same thing.

If you wanted to evaluate Cathy's reasoning skills, you'd need to look at her performance in tasks that require logical thinking, problem-solving, and other related abilities, rather than her memory capabilities.

> Cathy appears to be better at reasoning than a random internet commenter 'barkel'. Does that mean Cathy is good at reasoning?

Making comparisons between two individuals, Cathy and "barkel" in this case, doesn't provide an absolute measure of either person's reasoning abilities. If Cathy appears to be better at reasoning than barkel based on specific interactions or examples, it simply means she displayed better reasoning skills in those instances compared to barkel.

To truly determine if Cathy is "good" at reasoning, you'd need a more comprehensive evaluation of her reasoning abilities across various situations and challenges. Comparing her skills to one other individual doesn't provide a full perspective on her overall capabilities.

Remember, reasoning abilities can vary based on the subject matter, context, emotional state, and many other factors. So, while Cathy might be more effective in certain situations or on certain topics, it doesn't necessarily make her universally better at reasoning.

For an objective assessment of Cathy's reasoning skills, you'd need more consistent and broad-based evidence or testing, rather than a comparison with a single individual on the internet.

> If cathy is better than the median person at reasoning, is she good at reasoning?

If Cathy is better than the median person at reasoning, it means she scores higher than at least 50% of a given population in reasoning skills. In a relative sense, this would suggest that she is above average compared to that particular population.

However, the term "good" is somewhat subjective. In this context, being above the median does suggest that Cathy has a higher than average ability in reasoning. But whether that qualifies as "good" is a matter of perspective and can depend on the standards or benchmarks you're using.

If you're evaluating her skills based on general societal expectations, then yes, being above the median would likely be considered "good." However, if you're comparing her to a group of expert logicians or professional problem solvers, the standards might be different.

In summary, Cathy being better than the median person in reasoning suggests she has above-average skills in that area, but whether that is deemed "good" can depend on the context and standards you're applying.


>of course not. But exceeding what humans can do in terms of reasoning is not an unreasonable bar for reasoning.

It isn't, because you presuppose it reasons


You're projecting.

That "it doesn't reason" in your sense seems to depend on unknowable suppositions that amount to a quasi-metaphysical characterisation of the process of reasoning.

I'm defining reasoning from a set of observable outputs. The Chinese room isn't a thing.


You presuppose that it reasons AND humans reason.


There being "no such thing as reasoning" is an amusing cop-out.


That's not my point at all


There are humans who can't add two four-digit numbers without scratch paper, too. So what?


Think about how dramatic a change it was when computers started being able to add those numbers efficiently and got better at it than humans. Now, no matter what an individual's capacity, a huge number of people have a computer in their pocket that can fill in the gaps, and that change was wildly disruptive. Same as when machines started doing tasks that humans couldn't. Any time a relatively accessible technology overtakes humans at something that used to be human-only and matters to civilization, it's been highly disruptive. If GPT's reasoning is good enough to replace even the lowest tier of phone tech support, those are jobs that will be lost.


That "if" is doing a lot of legwork here. Most notably, it's concealing an implication that the LLM may be reasoning in this example, and not merely fitting tokens from its input into a statistical reconstruction of the most basic and thus likely most commonly encountered proof in elementary symbolic logic.

That said, you're not wrong that this seems "good enough" to be wildly disruptive. Part of that disruption, which is already ongoing, is the development of a whole new class of exploits. I expect once companies start firing phone support representatives in favor of LLMs, and similar, this process will accelerate in fascinating ways.


I think GP's point is that you can reach something approximating reasoning by brute force.

In the case of your example, you don't need to encode an algorithm to do 4-digit addition; just remember that 1000+1000=2000, 1000+1001=2001, etc.
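
Something like this Prolog sketch captures the contrast (sum_fact/3 and sum_calc/3 are made-up names, purely for illustration):

  % Brute-force route: store every answer as a fact, no algorithm needed.
  sum_fact(1000, 1000, 2000).
  sum_fact(1000, 1001, 2001).
  % ...and roughly 81 million more facts to cover every pair of 4-digit numbers.
  
  % Algorithmic route: one rule covers every case.
  sum_calc(A, B, S) :- S is A + B.
  
  % ?- sum_calc(1234, 5678, S).
  % S = 6912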


So what? In your context, you would say "computers can math better than humans". I think that's uncontroversial.

Crazy thing is, your difficulty with this syllogism proves my point about the median human.


Whatever their capacity for reason or lack thereof, I'll give LLMs this: I've never yet seen one resort to ad hominem for the lack of any more substantive counterargument.


The counterargument is substantive, and you're committing the fallacy fallacy anyway.


What syllogism of yours am I meant to be having trouble with here?

Would you like to qualify your attempt to impugn my intelligence rather than countering my argument as something other than what I called it? The "fallacy fallacy" is a rather silly thing to cite in this connection; while there's merit in the point that a fallacious argument does not necessarily err in its conclusion, as far as I can tell you have not made an argument here. It is difficult to respond on substance whose presence is not yet evident.


So what not? This is still incredibly impressive technology.


Yes, and it's incumbent upon us all not to let first impressions run away with us to the extent of making unsupportable claims, even by implication.


I'm sick and tired of the generally dismissive attitude towards these LLMs. They're no messiah, but they're for sure not useless.


I think you must have meant to reply to someone else, since I haven't written off LLMs as useless here or indeed, to the best of my recollection, anywhere.

That said, I suspect dismissive attitudes toward LLMs would be less common if messianic claims on their behalf were likewise.


Substituting Blub for Socrates, forp for man, and tworby for mortal is filling in a Mad Lib pattern with made-up words, not reasoning.


This is a critical take. LLMs do not need to match human reasoning by some arbitrary standard to displace millions of workers whose speed and/or effectiveness of reasoning is weak or inconsistent.


Well if that is the metric, then reasoning has long been solved:

  GNU Prolog 1.5.0 (64 bits)
  Compiled Jul  8 2021, 09:35:47 with gcc
  Copyright (C) 1999-2022 Daniel Diaz
  
  | ?- [user].
  compiling user for byte code...
  blub << forp.
  forp << tworby.
  Forp << tworby :- Forp << forp.
  
  user compiled, 3 lines read - 401 bytes written, 6194 ms
  
  yes
  | ?- findall(X, X << tworby, X).
  
  X = [forp,blub]

The problem is that inference (and theorem proving) can be done in two ways: either you memorize the reduction rules, or you deal with the combinatorial explosion. The former is Prolog and the latter is SAT/SMT solvers. People seem to expect neural networks to predict what the result would be if inference had been done, without actually doing the inference. It's possible to exploit local features, but not to skip inference entirely in general. Note also that inference can use a lot of memory/scratch space. At that point, why not just use an external tool? I'd seem much smarter if I could query Prolog directly from my brain. Hell, I'd sell my left arm to be able to do that.

Also, note that those statements are not hygienic, and that this encoding assumes a particular logical interpretation of the sentences that isn't universal. We can also ask annoying questions like: is 'all' intensional or extensional? Suppose I invent a new thing called swerb, and swerb is now a forp. Is it retroactively a tworby, because being a forp means being a tworby by definition, or was the original assertion only about the forps that existed at the time (so the swerb wouldn't be one)? There is no good way to resolve this without back-and-forth and contextual guessing, or without using formal languages.
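
The difference can be sketched in Prolog, reusing the style above but with a made-up is_a/2 predicate instead of the << operator:

  % Intensional reading: "all forps are tworby" is a rule, so it covers
  % anything that is a forp, including things asserted later.
  :- dynamic(is_a/2).
  
  is_a(blub, forp).
  is_a(X, tworby) :- is_a(X, forp).
  
  % ?- assertz(is_a(swerb, forp)), is_a(swerb, tworby).
  % yes
  
  % Extensional reading: the original assertion was only a snapshot of the
  % forps known at the time, i.e. just the fact is_a(blub, tworby), so
  % is_a(swerb, tworby) would fail until someone asserts it explicitly.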

Since there is no One True Logic, the common kernel of reasoning might as well be computation itself.


I think you're missing the point. Of course Prolog can reason better than humans; that's what it was designed to do (deterministically).

The point is not to solve reasoning. The question is, can LLMs reason?

LLMs were not designed to reason; reasoning in an LLM is emergent. That should be interesting.

It should also be exciting, because the domain over which LLMs can reason is far less bounded than the domain over which Prolog can reason (the tokens and relationships you've already supplied it).


The probability that a Stochastic Parrot returns coherent reasoning seems vanishingly small.


Are you saying that GPT is not a stochastic parrot, or that GPT is not returning coherent reasoning?

Because if it's the latter, the evidence is rather against you. People seem to like to cherry-pick examples of where GPT gets reasoning wrong, but it's getting it right millions of times a day, often enough that people keep using it.

And it's not as if humans don't get reasoning wrong. In fact the humans who say GPT can't reason are demonstrating that.


Why do you say that? You don't think stochastic pattern matching can feature reasoning as an emergent property? I do.

A stochastic parrot doesn't just mimic things totally randomly. It reinforces what it's seen.


I keep being surprised at how large a chunk of HN's demographic seemingly struggles with the simple notion that a black box's interface tells you surprisingly little about its contents.

I'm not saying that GPT-4 is reasoning or not, just that discounting the possibility solely based on it interfacing to the world via a stochastic parrot makes no sense to me.


Isn't "reasoning" a functional property though? If from the outside it performs all the functions of reasoning, it doesn't matter what is happening inside of the black box.

Here's a silly example I thought of. We can ask whether a certain bird is capable of "sorting". We can place objects of different sizes in front of the bird, and we observe that the bird can rearrange them in order of increasing size. Does it matter what internal heuristics or processes the bird is using? If it sorts the objects, it is "sorting".
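
A rough Prolog sketch of that black-box stance (behaves_like_sorting/2 and is_ordered/1 are made up for illustration) checks only the observable output, not how it was produced:

  % Judge "sorting" purely by input/output, with no access to internals.
  is_ordered([]).
  is_ordered([_]).
  is_ordered([A, B | T]) :- A =< B, is_ordered([B | T]).
  
  behaves_like_sorting(BlackBox, Input) :-
      call(BlackBox, Input, Output),   % only the output is observed
      permutation(Input, Output),      % same objects, none lost or added
      is_ordered(Output).
  
  % ?- behaves_like_sorting(sort, [3, 1, 2]).
  % yes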

To me, it seems perfectly obvious that GPT-4 is reasoning. It's not very good at it and it frequently makes mistakes. But it's also frequently able to make correct logical deductions. To me this is all stupid semantic games and goalpost-moving.


> Isn't "reasoning" a functional property though? If from the outside it performs all the functions of reasoning, it doesn't matter what is happening inside of the black box.

Yes, that's my point exactly.


Replace forp and tworby.

How common is that pattern? I would expect it to be quite common. So if the model can do some word replacement, it could solve this just by swapping in the right words.
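
For what it's worth, here's a minimal Prolog sketch of that substitution view (conclude/2 is made up for illustration): the conclusion falls out of the sentence shape alone, regardless of what the words mean.

  % Pure pattern matching on the shape "X is a Y ... all Y are Z".
  conclude([X, is, a, Y, and, all, Y, are, Z], [X, is, Z]).
  
  % ?- conclude([socrates, is, a, man, and, all, man, are, mortal], C).
  % C = [socrates,is,mortal]
  
  % ?- conclude([blub, is, a, forp, and, all, forp, are, tworby], C).
  % C = [blub,is,tworby]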



