Tried it, and it gave me what I think is a great, pithy answer, better worded than I could have managed:
> Young children typically learn to throw before they learn to catch because throwing is a simpler motor skill that requires less coordination and timing than catching. Throwing involves a relatively simple motion of extending the arm and releasing the object, whereas catching requires the child to track the object with their eyes, position their hands correctly, and time their movements to intercept the object.
> Furthermore, throwing is often a self-initiated action, whereas catching is typically in response to an external stimulus. This means that children can practice throwing at their own pace, while catching requires more reactive movements that can be harder to master.
You also say:
> The thing about GPT is that it cannot think on it's own, it can only deal logically with statements it has seen on the internet.
This is a philosophical assertion, isn't it? Akin to arguing whether free will exists, or whether we are the sum of subatomic interactions set in motion by the big bang. Humans came up with theories built on observations, leading to new theories. If ChatGPT learns old theories from the internet, why couldn't it come up with new ones in the way it came up with that answer above? If it knows x + x = 2 * x, it could come up with y + y = 2 * y just from abstraction.
We know how LLMs work and they auto-complete text. There is no ghost in the shell. It’s not much more philosophical than graph theory is or Markov chains are. Yes, in some way it is, but not in the modern and usual sense of the word.
Human free will is built on many components of human thinking — self-interest, executive capability, understanding contextual information from all senses, symbolic reasoning, pattern recognition and learning, opinions, processing experiences into refined memories, and probably many more.
LLMs can only produce a new theory by attempting to create plausible language. They only “substitute for” the speech part of the human brain, not the rest. If there were no one to read what the LLMs output and interpret it in a brain, no conscious thought would ever result from it.
> We know how LLMs work and they auto-complete text.
This isn't true. See e.g. [1].
It seems to be a common misconception that the training objective "predict the next word" implies something about the capabilities of an LLM.
As [1] shows, they instead perform very sophisticated calculations on their compressed knowledge representation to try to generate acceptable text. These capabilities can be accessed for other purposes too.
[1] "Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers" https://arxiv.org/abs/2212.10559
Hmm, I am not sure I follow. Why does anything about linear feed-forward layers put the capabilities of an LLM beyond auto-completing text?
The article is very interesting, thanks for sharing. But it seems to be about how auto-completion from context works through meta-gradients, rather than showing that GPT's capabilities are something other than predicting words. Am I misunderstanding something?
In light of this, I can only say that I compared LLMs to Markov chains too eagerly. I should have been clearer that LLMs and Markov chain models have a similar overall function (inputs and outputs are related in a superficially similar way), but not a similar implementation.
It means their internal model is updated with new data at runtime, and that this internal representation is flexible enough to build context-sensitive relationships between concepts.
It's this representation that is then used to generate words.
Words are merely the output.
It's like saying our conversation is just text completion. It's done through text, but the fact that there are relationships between concepts I'm trying to convince you of makes it more than that.
I don't think that is substantially different to what an LLM is doing, outside perhaps the motivation.
LLMs are Markov chains; they are just more sophisticated than older text Markov chains. A Markov chain is a statistical concept, not an algorithm for generating text in a specific way.
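For contrast, here is roughly what an older text Markov chain looks like, sketched in Python as a bigram model (the corpus and names are invented for the example): the whole "model" is a table of next-word counts, with nothing learned in between.

```python
import random
from collections import defaultdict, Counter

corpus = "the cat sat on the mat and the cat ran".split()

# Transition table: word -> counts of the words observed to follow it.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def generate(start, length=8):
    word, out = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:
            break
        # Sample the next word in proportion to how often it followed this one.
        words, counts = zip(*followers.items())
        word = random.choices(words, weights=counts, k=1)[0]
        out.append(word)
    return " ".join(out)

print(generate("the"))
```

The statistical concept is the same in both cases (the next token depends only on the current state); what differs is how rich the state is and how the conditional distribution is computed.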
Deep learning models / neural networks are universal approximators. That is a technical term for models that are capable of learning to model ANY finite relationship.
The mathematical requirement is just to have two layers, with enough units in the first layer. And the term finite relationship means any mapping that involves a finite number of discontinuities. (Not a practical limitation, since there is no alternative way to model relationships that cannot be characterized without an infinite amount of information anyway.)
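As a toy illustration of that two-layer claim, here is a minimal numpy sketch, assuming a single hidden layer of tanh units trained with plain gradient descent to fit a 1-D function; widening the hidden layer lets the fit get arbitrarily close.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 256).reshape(-1, 1)   # inputs
y = np.sin(2 * x)                            # target mapping to approximate

hidden = 64                                  # "enough units in the first layer"
W1 = rng.normal(scale=1.0, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)        # layer 1: nonlinear features
    pred = h @ W2 + b2              # layer 2: linear readout
    err = pred - y
    # Backpropagate the mean squared error through both layers.
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"mean squared error after training: {np.mean(err ** 2):.4f}")
```

Nothing here is efficient, which is the point of the later remark about more layers and recurrence; the two-layer result is only about what can be represented.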
So yes, they learn algorithms.
With more layers and recurrent connections, they are vastly more efficient as well.
Deep learning models don't just do associations, or correlations, or statistical things like conditional probability. They learn functional relationships between concepts. Without limit in terms of the complexity of those relationships, given enough data, parameters and computing time.
So no, nothing remotely as limited as Markov chains. Your pessimism about them has some merit. Just not relevant here.
Yes, I got the same response from GPT-4. It's hard not to notice, though, that children learn to walk before they learn to throw, and walking is a much more complex motor skill than throwing (or catching).