Hacker News

"Emergent World Representations"

The weasel word here is "emergent": it means the representations are implicit.

The representations of the Othello board that exist in that model are not explicitly constructed. They just happen to align with the model a human Othello player would likely use to represent the game.

That work showed that, given an example sequence of valid Othello game states (as training corpus) and a valid "fresh" Othello game state (as a prompt), the system can hallucinate a sequence of valid Othello game states.

The system does not know what Othello is, what a turn is, or what playing is. It only has a model of game states progressing chronologically.
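A toy sketch of that claim (not the paper's transformer, just a bigram model over hypothetical move tokens): it has no notion of a board, a turn, or a player, only of which token tends to follow which.

```python
from collections import defaultdict

def train_bigrams(games):
    """Count next-move frequencies over a corpus of move sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for game in games:
        for prev, nxt in zip(game, game[1:]):
            counts[prev][nxt] += 1
    return counts

def continue_game(counts, prompt):
    """Greedily extend a prompt with the most frequent observed successor."""
    seq = list(prompt)
    while seq[-1] in counts:
        successors = counts[seq[-1]]
        nxt = max(successors, key=successors.get)
        if nxt in seq:          # crude stop to avoid cycling forever
            break
        seq.append(nxt)
    return seq

# Hypothetical corpus: opening fragments written as square names.
corpus = [
    ["d3", "c5", "f6", "f5"],
    ["d3", "c5", "f6", "e6"],
    ["d3", "c3", "c4", "e3"],
]
model = train_bigrams(corpus)
print(continue_game(model, ["d3", "c5"]))   # ['d3', 'c5', 'f6', 'f5']
```

The continuation looks like a game being played, but nothing in the code knows Othello exists; it only models tokens progressing chronologically.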

When we look objectively at that model, we can see that it aligns closely with the game rules. Of course it does! It was trained on literally nothing else. A valid Othello game progression follows those rules, and that is what was provided.

But the alignment is imperfect: some prompts still produce invalid game progressions. The model is not a perfect match for the explicit rules.

In order for all prompts to result in valid progressions, the training corpus must have enough examples to disambiguate. It doesn't need every example: plenty of prompts will stumble into a valid progression.
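The explicit rules, by contrast, can be checked mechanically. A minimal sketch of an Othello legality check one could run over a generated progression to catch the invalid ones (illustrative names throughout; it checks flip legality only and ignores turn alternation and passing, which a full checker would also need):

```python
# Board: 8x8 grid of '.', 'B', 'W'. A move is (player, row, col).
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def flips(board, r, c, player):
    """Squares flipped if `player` moves at (r, c); empty list if illegal."""
    if board[r][c] != '.':
        return []
    opponent = 'W' if player == 'B' else 'B'
    flipped = []
    for dr, dc in DIRS:
        run, rr, cc = [], r + dr, c + dc
        while 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == opponent:
            run.append((rr, cc))
            rr, cc = rr + dr, cc + dc
        if run and 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == player:
            flipped.extend(run)
    return flipped

def start_board():
    board = [['.'] * 8 for _ in range(8)]
    board[3][3], board[4][4] = 'W', 'W'
    board[3][4], board[4][3] = 'B', 'B'
    return board

def is_valid_progression(moves):
    """Replay moves from the start position; reject any non-flipping move."""
    board = start_board()
    for player, r, c in moves:
        f = flips(board, r, c, player)
        if not f:
            return False        # no flips => not a legal Othello move
        board[r][c] = player
        for rr, cc in f:
            board[rr][cc] = player
    return True

print(is_valid_progression([('B', 2, 3)]))   # d3 as an opening: True
print(is_valid_progression([('B', 0, 0)]))   # a corner at the start: False
```

The point of the contrast: the learned model approximates this predicate; the checker *is* it.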

The next thing to recognize: a "valid" progression isn't a "strategic" progression. These are being constructed from what is known, not what is chosen. Given a constrained set of Othello strategies in the example corpus, the system will not diverge from those strategies. It won't even diverge from the example strategies when the rules of Othello demand it.

GPT doesn't play the game. It plays the plays.
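That last point is easy to make concrete with a frequency model (a hypothetical corpus, standing in for the learned distribution): a continuation absent from the corpus gets probability zero, even when the rules would allow it.

```python
from collections import Counter

# Hypothetical corpus of opening fragments. After Black's d3, White's
# legal replies in real Othello are c3, c5, and e3 -- but this corpus
# only ever shows c5 and c3.
corpus = [["d3", "c5"], ["d3", "c5"], ["d3", "c3"]]

after_d3 = Counter(game[1] for game in corpus if game[0] == "d3")
total = sum(after_d3.values())

print(after_d3["c5"] / total)   # 2/3: a continuation seen in the corpus
print(after_d3["e3"] / total)   # 0.0: legal in Othello, but never seen here
```

A move the corpus never contains is a move the model can never make: it plays the plays.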


