
We know how LLMs work: they auto-complete text. There is no ghost in the shell. It's not much more philosophical than graph theory or Markov chains are. Yes, in some way it is, but not in the modern and usual sense of the word.

Human free will is built on many components of human thinking — self-interest, executive capability, understanding contextual information from all senses, symbolic reasoning, pattern recognition and learning, opinions, processing experiences into refined memories, and probably many more.

LLMs can only produce a new theory by attempting to create plausible language. They only "substitute for" the speech part of the human brain, not the rest. If there were no one to read what the LLMs output and interpret it in a brain, no conscious thought would ever result from it.
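To make the "auto-complete" framing concrete, here is a minimal sketch of generation as repeated next-token sampling; `model` is a hypothetical stand-in for any trained LLM that returns a probability distribution over the vocabulary, not a real API:

    import numpy as np

    def generate(model, tokens, n_new, rng=np.random.default_rng(0)):
        # Generation is just repeated sampling from p(next | context).
        for _ in range(n_new):
            probs = model(tokens)               # distribution over the vocabulary
            nxt = int(rng.choice(len(probs), p=probs))
            tokens = tokens + [nxt]             # append, then condition on it
        return tokens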



> We know how LLMs work and they auto-complete text.

This isn't true. See, e.g., [1].

It seems to be a common misconception that the training objective "predict the next word" implies something about the capabilities of an LLM.

As [1] shows, they instead perform very sophisticated calculations on their compressed knowledge representation to generate acceptable text. These capabilities can be accessed for other purposes too.

[1] "Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers" https://arxiv.org/abs/2212.10559


Hmm, I am not sure I follow. Why does anything about linear feed-forward layers put the capabilities of an LLM beyond auto-completing text?

The article is very interesting, thanks for sharing. But it seems to be about how auto-completing from context works through meta-gradients, rather than showing that GPT's capabilities are something other than predicting words. Am I misunderstanding something?

In light of this, I can only say that I compared LLMs to Markov chains too eagerly. I should have been clearer that LLMs and Markov chain models have a similar overall function (inputs and outputs relate in a loosely similar way), but not a similar implementation.


It means their internal model is updated with new data at runtime, and that this internal representation is flexible enough to build a context sensitive relationship between concepts.

It's this representation that is then used to generate words.

Words are merely the output.

It's like saying our conversation is just text completion. It's carried out through text, but the fact that there are relationships between concepts I'm trying to convince you of makes it more than that.

I don't think that is substantially different from what an LLM is doing, apart perhaps from the motivation.


Oh, now I get it. I wasn’t focusing on the right part of that article. Thank you.


Appreciate the reply!


LLMs are Markov chains; they are just more sophisticated than older text Markov chains. A Markov chain is a statistical concept, not an algorithm for generating text in a specific way.
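A sketch of that point: "Markov chain" names the statistical structure, a distribution over the next token given the current state, not a particular implementation. A classical text Markov chain computes it from a count table; if you take the state to be the full context window, an LLM provides the same interface with a neural network in place of the table. The helper names here are illustrative.

    from collections import Counter, defaultdict

    def fit_counts(tokens, order):
        # Classical text Markov chain: the transition function is a lookup table.
        table = defaultdict(Counter)
        for i in range(len(tokens) - order):
            table[tuple(tokens[i:i + order])][tokens[i + order]] += 1
        return table

    def next_dist(table, context, order):
        # p(next | state), where the state is the last `order` tokens.
        counts = table[tuple(context[-order:])]
        total = sum(counts.values()) or 1
        return {tok: c / total for tok, c in counts.items()}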


LLMs are nothing like Markov chains.

Deep learning models / neural networks are universal approximators. That is a technical term for models that are capable of learning to model ANY finite relationship.

The mathematical requirement is just two layers, with enough units in the first layer. And "finite relationship" means any mapping with a finite number of discontinuities. (Not a practical limitation, since there is no way to model a relationship that requires an infinite amount of information to characterize anyway.)
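A minimal sketch of that claim: one hidden layer fit to a target with a discontinuity (a step function) using plain gradient descent. The width, learning rate, and iteration count are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(-1, 1, 256).reshape(-1, 1)
    y = (x > 0).astype(float)                  # target with one discontinuity

    H = 64                                     # "enough units" in the first layer
    W1 = rng.normal(size=(1, H)); b1 = np.zeros(H)
    W2 = rng.normal(scale=0.1, size=(H, 1)); b2 = np.zeros(1)

    lr = 0.1
    for _ in range(5000):
        h = np.tanh(x @ W1 + b1)               # layer 1
        pred = h @ W2 + b2                     # layer 2
        err = pred - y
        # Backprop for mean squared error.
        gW2 = h.T @ err / len(x); gb2 = err.mean(0)
        dh = err @ W2.T * (1 - h ** 2)
        gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    print(np.abs(pred - y).mean())             # small, except right at the jump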

So yes, they learn algorithms.

With more layers and recurrent connections, they are vastly more efficient as well.

Deep learning models don't just do associations, correlations, or statistical things like conditional probability. They learn functional relationships between concepts, without limit on the complexity of those relationships, given enough data, parameters, and computing time.

So no, they are nothing remotely as limited as Markov chains. Your pessimism about Markov chains has some merit; it's just not relevant here.



