Hacker News

> if it made no mistakes it would offer strong evidence

I don't see how this has anything to do with the article. TFA's authors weren't trying to make the GPT "know the rules" of Othello, or anything similar. The GPT doesn't even know there are rules, or that a game is being played, so the whole question is neither here nor there.

Rather, TFA is asking: given that the GPT learned to predict token sequences that correspond to Othello moves, did it do that with or without inferring the presence of a gameboard and simulating the on/off states of that board? The answer to that question has nothing to do with whether the GPT predicts valid token sequences with 100% or 99% or 95% accuracy; the issue is whether or not it is simulating the state of the Othello pieces.
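To make "simulating the on/off states of that board" concrete, here is a minimal sketch of what such a world model computes: each move token like "d3" places a disc and flips the bracketed run of opposing discs. This is an illustrative helper, not the paper's code; the names and representation are my own.

```python
# Illustrative Othello state tracker: tokens toggle square states.
EMPTY, BLACK, WHITE = 0, 1, -1
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def initial_board():
    b = [[EMPTY] * 8 for _ in range(8)]
    b[3][3] = b[4][4] = WHITE   # standard starting position
    b[3][4] = b[4][3] = BLACK
    return b

def apply_move(board, token, player):
    """Place `player`'s disc at a token like 'd3' and flip bracketed discs."""
    col = ord(token[0]) - ord('a')
    row = int(token[1]) - 1
    board[row][col] = player
    for dr, dc in DIRS:
        r, c = row + dr, col + dc
        run = []
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
            run.append((r, c))
            r += dr
            c += dc
        # flip the run only if it is capped by the player's own disc
        if run and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            for fr, fc in run:
                board[fr][fc] = player

b = initial_board()
apply_move(b, "d3", BLACK)   # a standard opening move; flips the disc on d4
```

The question TFA investigates is whether something functionally equivalent to this state update emerges inside the network from token prediction alone.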




This is not what they claim. The central claim is that OthelloGPT has developed a "causal model", "an understandable model of the process producing the sequences" (for example: "are they picking up the rules of English grammar?"). I.e., does OthelloGPT know the rules of Othello?

They make a clear distinction between world representation ("board state"/"internal activation"), and world model ("game engine"/"LLM Layers"), see their diagram [1].

That state exists is not interesting or new: an LLM necessarily transforms input sequences into a form of state in order to predict the next word in a sentence, and OthelloGPT necessarily transforms the move sequence into an internal state in order to predict the next move.

The only interesting question is what mechanism picks the next move: simply following the patterns (the crow, calling out the next move), or an understanding of the rules of the game (the crow, analyzing a fresh board).

Except that's a false dichotomy; the analogy doesn't work, because the crow "analyzing a fresh board" is still capable of just doing pattern recognition.

[1]: https://lh6.googleusercontent.com/zvQObJtkqyFey9TD2Ibzx5K9s5...


The crow never analyzes any board - that's the point of the analogy. It only hears a sequence of tokens, and it learns to predict future tokens, and TFA sets out to answer whether it does that by "surface statistics" (i.e. which tokens are more likely at which times), or with a "world model" (by laying out a bunch of seeds and flipping some of them over after each token in order to simulate the state of the game).

Edit: from other replies, I think I see the issue here. In TFA, "world model" is referring to the insight that "D4, E5, B3" are more than just tokens in a grammar - that they represent graph nodes with state, and that when each token occurs it toggles the states of other graph nodes. TFA is asking whether the LLM has learned that insight, as opposed to merely detecting patterns in the token sequences without any model of what they represent.

You seem to be taking "world model" to mean something quite different - like "a rule-based understanding of game rules that isn't statistical" or such. TFA doesn't discuss or investigate anything along those lines, just whether the LLM is simulating the on/off state of Othello pieces.
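For context, the way TFA checks whether the LLM is tracking piece state is by training probes that decode a board-state readout from the network's internal activations. Here is a toy sketch of that decoding step only, with synthetic random "activations" standing in for the real model's, and a simple least-squares linear probe (the article itself uses nonlinear probes); every shape and name here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for transformer activations: each 64-entry "board state"
# (+1 / -1 / 0 per square) is linearly embedded into a 256-dim activation
# vector plus noise. In the real experiment these come from the model.
n, d_state, d_act = 2000, 64, 256
W = rng.normal(size=(d_state, d_act))
states = rng.choice([-1.0, 0.0, 1.0], size=(n, d_state))
acts = states @ W + 0.1 * rng.normal(size=(n, d_act))

# Linear probe: least-squares map from activations back to square states.
probe, *_ = np.linalg.lstsq(acts, states, rcond=None)
decoded = np.sign(np.round(acts @ probe))
accuracy = (decoded == states).mean()
```

If a probe like this decodes board state well above chance, and editing the decoded state causally changes the model's move predictions (the article's intervention experiment), that is the evidence for "simulating the on/off state of Othello pieces".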


Ok, I've read the original paper.

I agree with you fully on the claims made in the paper, and agree those claims are broadly substantiated by the paper.

I completely disagree that those claims are what is represented in the opening paragraphs of the article they have written, which directly analogizes the task to learning the grammatical rules of English and the syntax of C++, and which describes the model as understanding "the process producing the sequences".

Demonstrating that the model processes the input sequence into a representative state, and that manipulating that state has a causal effect on the output, is a very different ball game than "our model understands the rules behind this sequence".


> is a very different ball game than "our model understands the rules behind this sequence".

The authors state plainly what they mean by "understands the rules". That the phrase could also mean other things is no reason to dispute their conclusions.



