I don't think it makes sense to compare human learning to GPT-3 learning: it's a fundamentally different process. The human brain doesn't get just tokens, but also other sensory data, particularly visual.
So I don't think you can conclude that humans learn more efficiently based on just the quantity of data.
It's also worth noting that GPT-3 is trained to emulate _any_ human writing, not the writing of any one person.
For an actual Turing test, one might fine-tune it on text produced by one particular human; then you might get more accurate results.
> I don't think you can conclude that humans learn more efficiently based on just the quantity of data.
You are correct. My main argument was that in-distribution learning is not enough. You can't fix that problem with more data, as many responses to my comment seem to assume.
I think out-of-distribution learning and the small-data requirement are connected. If an agent can understand a concept separately from the chain leading from sensory inputs, it can understand what it's doing in a novel situation even without examples.
The elements used in GPT-3 are capable of transformations such as abstraction (e.g. separating the structure of a syllogism from the concrete nouns) and logic (ReLU can directly implement OR, AND, and NOT, which is sufficient for arbitrary logic).
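As a minimal sketch of that last point (my own toy example, not anything extracted from GPT-3): with inputs restricted to 0/1, each gate is just one or two ReLU units plus a fixed linear combination.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Boolean gates built from ReLU units, assuming inputs are 0.0 or 1.0.
def NOT(a):
    return relu(1.0 - a)          # 0 -> 1, 1 -> 0

def AND(a, b):
    return relu(a + b - 1.0)      # fires only when both inputs are 1

def OR(a, b):
    # two ReLU units combined linearly: the second caps the output at 1
    return relu(a + b) - relu(a + b - 1.0)

# Truth tables check out:
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "NOT a:", NOT(a))
```

Since {AND, OR, NOT} is functionally complete, a deep enough stack of such units can in principle encode arbitrary Boolean logic; whether gradient descent actually finds such circuits is a separate question.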
We can see that it actually uses abstractions and logic in some cases.
E.g. "Bob is a frog. Bob's skin color is ___". Even small GPT-2 models can relate "Bob" to concept "frog" and query "frog" "skin color" attribute. Even basic language modeling requires inference, and GPT-x can inference using transformer blocks.
With more layers, it can go from inferring the meaning of words to doing inference that solves problems.
But the inference it is able to do is limited in scope because of the structure of a language model -- each input token must correspond to one output token. So the model can't take a pause and think about something; it can only think while it produces tokens.
Here's an absolutely insane example of embedding symbolic computation into a story, which lets GPT-3 break the computation into small steps it can handle. The intermediate results become part of the story: https://twitter.com/kleptid/status/1284069270603866113
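The general shape of such a prompt (a made-up toy illustration, not the exact text from the linked tweet) looks something like this:

```python
# Each already-written intermediate result becomes context for the next
# step, so the model never has to do the whole computation "in its head".
prompt = (
    "Alice is adding 123 and 456 one column at a time.\n"
    "First she adds the ones: 3 + 6 = 9.\n"
    "Then she adds the tens: 2 + 5 = 7.\n"
    "Then she adds the hundreds: 1 + 4 = 5.\n"
    "So the final answer is"
)
print(prompt)  # feed this to a text-generation model and let it continue
```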
So I guess one can make a model which is much better at thinking simply by training in a different way or changing the topology. But the building blocks are good enough.
The representations GPT-3 learns are flexible. It can learn very complex tasks, including logic and simple arithmetic. The issue is GPT-3's learning algorithm and its learning capability.
In most cases, the ability to learn << the ability to represent. Universal-approximation-theorem-type results don't say anything about learnability. GPT-3's abilities quickly fade once it gets outside the distribution it was trained on.
Yeah that’s what I was thinking when he started with the nonsensical questions, like “how many eyes does a foot have?” No (non-blind) human would need language input to learn this fact. Makes me wonder if anyone is working on large-scale architectures that can ingest multiple types of data and correlate them to make predictions.