Tried it, and it gave me what I think is a great, pithy answer, better worded than I could have managed:
> Young children typically learn to throw before they learn to catch because throwing is a simpler motor skill that requires less coordination and timing than catching. Throwing involves a relatively simple motion of extending the arm and releasing the object, whereas catching requires the child to track the object with their eyes, position their hands correctly, and time their movements to intercept the object.
> Furthermore, throwing is often a self-initiated action, whereas catching is typically in response to an external stimulus. This means that children can practice throwing at their own pace, while catching requires more reactive movements that can be harder to master.
You also say:
> The thing about GPT is that it cannot think on it's own, it can only deal logically with statements it has seen on the internet.
This is a philosophical assertion, isn't it? Akin to arguing whether free will exists, or whether we are the sum of subatomic interactions set in motion by the big bang. Humans came up with theories built on observations, leading to new theories. If ChatGPT learns old theories from the internet, why couldn't it come up with new ones in the way it came up with that answer above? If it knows x + x = 2 * x, it could come up with y + y = 2 * y just from abstraction.
We know how LLMs work and they auto-complete text. There is no ghost in the shell. It’s not much more philosophical than graph theory is or Markov chains are. Yes, in some way it is, but not in the modern and usual sense of the word.
Human free will is built on many components of human thinking — self-interest, executive capability, understanding contextual information from all senses, symbolic reasoning, pattern recognition and learning, opinions, processing experiences into refined memories, and probably many more.
LLMs can only produce a new theory by attempting to create plausible language. They only “substitute for” the speech part of the human brain, not the rest. If there were no one to read what the LLMs output and interpret it in a brain, no conscious thought would ever result from it.
> We know how LLMs work and they auto-complete text.
This isn't true. See e.g. [1].
It seems to be a common misconception that the training objective "predict the next word" implies something about the capabilities of an LLM.
As [1] shows, they instead perform very sophisticated calculations on their compressed knowledge representation to try to generate acceptable text. These capabilities can be accessed for other purposes too.
[1] "Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers" https://arxiv.org/abs/2212.10559
Hmm, I am not sure I follow. Why does anything about linear feed-forward layers put the capabilities of an LLM beyond auto-completing text?
The article is very interesting, thanks for sharing. But it seems to be about how auto-completion from context works through meta-gradients, rather than showing that GPT's capabilities are something other than predicting words. Am I misunderstanding something?
In light of this, I can only say that I compared LLMs to Markov chains too eagerly. I should have been clearer that LLMs and Markov chain models have a similar overall function (inputs and outputs are related in a superficially similar way), but not a similar implementation.
It means their internal model is updated with new data at runtime, and that this internal representation is flexible enough to build context-sensitive relationships between concepts.
It's this representation that is then used to generate words.
Words are merely the output.
It's like saying our conversation is just text completion. It's done through text, but the fact that there are relationships between concepts I'm trying to convince you of makes it more than that.
I don't think that is substantially different to what an LLM is doing, outside perhaps the motivation.
LLMs are Markov chains; they are just more sophisticated than older text Markov chains. A Markov chain is a statistical concept, not an algorithm for generating text in a specific way.
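For contrast, here is roughly what an older text Markov chain looks like, sketched in Python as a bigram model (the corpus and names are invented for the example): the whole "model" is a table of next-word counts, with nothing learned in between.

```python
import random
from collections import defaultdict, Counter

corpus = "the cat sat on the mat and the cat ran".split()

# Transition table: word -> counts of the words observed to follow it.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def generate(start, length=8):
    word, out = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:
            break
        # Sample the next word in proportion to how often it followed this one.
        words, counts = zip(*followers.items())
        word = random.choices(words, weights=counts, k=1)[0]
        out.append(word)
    return " ".join(out)

print(generate("the"))
```

The statistical concept is the same in both cases (the next token depends only on the current state); what differs is how rich the state is and how the conditional distribution is computed.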
Deep learning models / neural networks are universal approximators. That is a technical term for models that are capable of learning to model ANY finite relationship.
The mathematical requirement is just to have two layers, with enough units in the first layer. And the term finite relationship means any mapping that involves a finite number of discontinuities. (Not a practical limitation, since there is no alternative way to model relationships that cannot be characterized without an infinite amount of information anyway.)
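As a toy illustration of that two-layer claim, here is a minimal numpy sketch, assuming a single hidden layer of tanh units trained with plain gradient descent to fit a 1-D function; widening the hidden layer lets the fit get arbitrarily close.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 256).reshape(-1, 1)   # inputs
y = np.sin(2 * x)                            # target mapping to approximate

hidden = 64                                  # "enough units in the first layer"
W1 = rng.normal(scale=1.0, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)        # layer 1: nonlinear features
    pred = h @ W2 + b2              # layer 2: linear readout
    err = pred - y
    # Backpropagate the mean squared error through both layers.
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"mean squared error after training: {np.mean(err ** 2):.4f}")
```

Nothing here is efficient, which is the point of the later remark about more layers and recurrence; the two-layer result is only about what can be represented.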
So yes, they learn algorithms.
With more layers and recurrent connections, they are vastly more efficient as well.
Deep learning models don't just do associations, or correlations, or statistical things like conditional probability. They learn functional relationships between concepts. Without limit in terms of the complexity of those relationships, given enough data, parameters and computing time.
So no, nothing remotely as limited as Markov chains. Your pessimism about them has some merit. Just not relevant here.
Yes, I got the same response from GPT-4. It's hard not to notice, though, that children learn to walk before they learn to throw, and walking is a much more complex motor skill than throwing (or catching).