I don't think this is it. This is not a failure mode for humans. No human (*) would forget, in such a short problem statement, that a vegan wolf cannot be left alone with a cabbage; humans would instead forget that one of the states is invalid after trying some combinations.
GPT's failure mode is only possible if it doesn't understand the meaning of the (apparently nonsensical) combination of words "vegan wolf". Humans don't pattern match like this; humans are capable of understanding the meaning of a combination of words they've never encountered before.
(*) you know what I mean with "no human". Of course, there are deranged people or whimsical people or people who will say just about anything.
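For concreteness, here's a minimal sketch of the riddle as a state-space search, under the reading any human would assume: the vegan wolf eats the cabbage, the meat-eating goat eats the wolf, and the goat ignores the cabbage (the code is purely illustrative, not anything GPT produced):

    from collections import deque

    ITEMS = frozenset({"cabbage", "vegan wolf", "meat-eating goat"})

    # Pairs that cannot be left on a bank without the man
    # (assumed reading: the vegan wolf eats the cabbage,
    #  the meat-eating goat eats the wolf, the goat ignores the cabbage).
    UNSAFE = [{"vegan wolf", "cabbage"}, {"meat-eating goat", "vegan wolf"}]

    def safe(bank):
        # A bank without the man is safe if it contains no unsafe pair.
        return not any(pair <= bank for pair in UNSAFE)

    def solve():
        # State: (items still on the start bank, man's side: 0 = start, 1 = far).
        start, goal = (ITEMS, 0), (frozenset(), 1)
        queue, seen = deque([(start, [])]), {start}
        while queue:
            (left, man), path = queue.popleft()
            if (left, man) == goal:
                return path
            here = left if man == 0 else ITEMS - left
            for cargo in [None, *here]:  # cross alone, or with one item
                new_left = set(left)
                if cargo is not None:
                    (new_left.discard if man == 0 else new_left.add)(cargo)
                new_left = frozenset(new_left)
                # The bank the man leaves behind must stay safe.
                behind = new_left if man == 0 else ITEMS - new_left
                if not safe(behind):
                    continue
                state = (new_left, 1 - man)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [cargo or "nothing"]))

    print(solve())

A breadth-first search over these states finds the usual seven-crossing plan, with the wolf playing the role the goat plays in the classic puzzle. The bookkeeping of invalid states is the part humans sometimes slip on; the meaning of "vegan wolf" is not.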
Consider this problem: a man must cross a river and carry his belongings to the other shore, but the boat can only carry the man and one other item. His belongings are a cabbage, a vegan wolf, and a meat-eating goat.
What are the individual beings in this problem, what are they motivated to do, and how would they behave once left alone?
--
Of course, it doesn't necessarily work because the model is not actually guaranteed to explain itself and may fabricate anything it likes.
But ignoring that, we find that it doesn't think vegan wolves eat cabbages, that the meat-eating goat is an obligate carnivore, or that cabbages are inanimate objects. So it could be solving the problem logically, just under its different assumptions.
GPT-3.5:
--
The cabbage is an inanimate object and does not have any motivations or behaviors.
The vegan wolf is motivated to not harm any living beings, as it does not eat meat. If left alone with the goat, it would not harm the goat and would likely avoid it.
The meat-eating goat is motivated by its natural instinct to eat grass and other vegetation, as well as to protect itself from harm. If left alone with the cabbage, it would likely eat the cabbage.
--
GPT-4:
--
In this problem, there are four individual beings:
Man: He is motivated to cross the river and safely transport his belongings (cabbage, vegan wolf, and meat-eating goat) to the other shore without any harm coming to them.
Cabbage: As a plant, it doesn't have motivations or behavior. It's an inanimate object that the man needs to transport.
Vegan Wolf: The vegan wolf is motivated to eat plant-based food and will not harm the meat-eating goat or the cabbage. However, it is still a wild animal, and it may become agitated if left alone for too long, potentially leading to unpredictable behavior.
Meat-Eating Goat: This unusual goat is motivated by its need to eat meat. It will attempt to eat the vegan wolf if left alone with it. Since it is a meat-eater, it will not be interested in eating the cabbage.
That's cool. This bit shows ChatGPT's lack of understanding, the mistake no human would make:
> Vegan Wolf: The vegan wolf is motivated to eat plant-based food and will not harm the meat-eating goat or the cabbage
It "knows" that a cabbage is a plant, and it "knows" the vegan wolf is "motivated to eat plant-based food", but it doesn't know the wolf will eat the cabbage!
It solves a different problem than the one posed precisely because it doesn't understand the meaning of the words.
but... it understands the meat-eating goat part just fine?
That it hasn't learned enough doesn't show that this approach can never learn, which seems to be the point you're making.
Its input dataset is many orders of magnitude bigger than the model itself - it can't "remember" all of its training data.
Instead, it collects data about how certain tokens tend to relate to other tokens, like learning that "goats" often "eat" "leafy greens" (a toy sketch of that co-occurrence idea follows at the end of this comment). It also learns to group tokens together to create meta-tokens, like understanding that "red light district" carries different connotations than each of those words does individually.
Is this process of gathering connections about the different types of things we experience much different to how humans learn? We don't know for sure, but it seems to be pretty good at learning anything thrown at it. Nobody is telling it how to make these connections, it just does, based on the input data.
A separate question, perhaps, might consider how some concepts would be much harder to understand if you were a general intelligence in a box that could only ever experience the world via written messages in and out, and how some concepts would be much easier (one might imagine that language itself would come faster given the lack of other stimulation). Things like "left" and "right" or "up" and "down" would be about as hard to understand properly as the minutiae of particle interactions (which humans can only experience in the abstract too).
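Purely as a toy illustration of the co-occurrence idea mentioned above (this is not how GPT actually learns - real models learn dense embeddings by gradient descent, and this corpus is made up), something like the following captures "tokens that show up in similar contexts end up related":

    from collections import Counter, defaultdict
    from itertools import combinations
    import math

    # A made-up toy "training corpus" (purely illustrative).
    corpus = [
        "goats eat leafy greens",
        "goats eat cabbage",
        "vegans eat leafy greens and no meat",
        "wolves eat meat",
        "the red light district at night",
        "a red light means stop",
    ]

    # Count how often tokens co-occur within the same sentence.
    cooc = defaultdict(Counter)
    for sentence in corpus:
        for a, b in combinations(sentence.split(), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    def similarity(w1, w2):
        # Cosine similarity between two tokens' co-occurrence vectors.
        v1, v2 = cooc[w1], cooc[w2]
        dot = sum(v1[t] * v2[t] for t in v1)
        norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
        return dot / norm if norm else 0.0

    print(similarity("goats", "vegans"))    # > 0: both co-occur with "eat", "leafy", "greens"
    print(similarity("goats", "district"))  # 0: no shared contexts in this toy corpus

In this toy, "goats" and "vegans" come out related because both co-occur with "eat leafy greens", while "goats" and "district" share no contexts at all.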
I think the fact it correctly uses "meat-eating goat" but misuses "vegan wolf" hints at the core lack of understanding.
Understanding either concept takes the same level of intelligence if you understand the meaning of the words (both a vegan wolf and a meat-eating goat are nonexistent entities outside of possibly bizarre exceptions, yet someone capable of understanding will have no problem with either).
That GPT has no trouble with "meat-eating goat" but struggles with "vegan wolf" hints that the former has some "statistical" property that helps GPT and which the latter lacks. It also hints that GPT doesn't understand either term.
Hence my example: something a human wouldn't fail to understand but GPT does.
We went from not being able to get any sensible output to these riddles at all to discussing partial logical failures while it "got" the overall puzzle. That's a vast simplification and slightly incorrect on a technical level - still, this development increases my confidence that scaling the approach up to the next orders of magnitude of complexity/parameters will do the trick. I wouldn't even be surprised if the thing we call "consciousness" is actually a byproduct of increasing complexity.
What remains right now is getting the _efficiency_ on point, so that our wetware brains (volume, energy usage, ...) can be paralleled by AI hardware demands, instead of needing a comically higher amount of computers to train/run.
I'd be impressed if this was the reasoning GPT provided, e.g. "I don't think this vegan wolf likes cabbage". But when asked to explain itself (see above, the "debugging" comment) it states nothing of the sort.
Also, a reasoning person would understand that in the context of a riddle like this, "vegan wolf" means "a wolf that eats cabbages" even if this isn't spelled out.
GPT could be a contrarian, trying to subvert the terms of the riddle and fight over every word ("it depends on what the definition of 'is' is"), but we know it's not set up to behave like that, so we can rule it out.
> Two humans in this thread just read the solution and thought it was correct.
My guess is that they just skim-read it and missed what ChatGPT actually wrote; it's not that they misunderstood what "vegan wolf" means [1]. On the other hand, you cannot skim-read what you are writing yourself; that's not how the mind works.
The gist of the problem here is that, unlike a human, ChatGPT doesn't understand the words it generates, which leads to hilarious results.
As another example, look at the "debugging" of GPT-4's assumptions someone posted in a sibling comment: it "knows" the vegan wolf will eat plant-based food and it "knows" a cabbage is a plant, yet it "thinks" the wolf "will not harm the cabbage"... which is a misunderstanding no human would make (if they know what "vegan" and "cabbage" mean). This doesn't happen in a long chain of reasoning (where a human can lose the line of thought) but in very short paragraphs, one right after the other! This failure mode requires not understanding the individual assumptions, which prevents GPT from making the connection. I was asked for an error that showed GPT misunderstanding something no person would, and I provided one.
[1] Question for you: did you think the wrong solution was right because you thought a vegan wolf cannot eat the cabbage (let me bet this is NOT what crossed your mind), or because the person who posted it made it look as if it were the right solution and you skim-read it without paying attention, assuming "this person says it's right and is posting it as a rebuttal, so it's likely right" (this is my bet)?
If the latter, this failure mode is not one of misunderstanding what "vegan wolf" means (which is what debugging GPT's process shows), but one of very human laziness/jumping to conclusions. Do note this cannot happen when you write the solution yourself!