
>Also, can you reference where they used a pre-trained GPT model?

The trite answer is that the "P" in GPT stands for "Pre-trained."

>I think they conclusively show the answer to that is yes, right?

Sure, but what's interesting about world models is their ability to extrapolate. Without that, you're just saying "this magic backsolving machine backsolved into something we can understand, which is weird, because usually that's not the case."

That result is cool in and of itself, but it's not the takeaway a lot of people are getting from this.

>What does overfitting to the rules of othello have to do with it, I don’t follow?

Again, I'm just saying that under extreme circumstances, the parameters of LLMs can end up looking like rules-based algorithms if you use the right probing tools. We've seen the same thing in very small neural nets trained on multiplication. That's not to say GPT-4 is a patchwork of rules-based algorithms that humans could understand (that would be bad, in fact! We aren't that good at noticing or pattern matching).
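For anyone unfamiliar with what "probing" means here, a minimal sketch in the spirit of the Othello-GPT experiments: train a small linear classifier to read the board state out of the model's hidden activations. The dimensions, layer choice, and label encoding below are placeholder assumptions, not the paper's exact setup (the original paper actually used nonlinear probes; linear ones worked in follow-up work with a different board encoding).

    # Hypothetical sketch of a linear probe over a transformer's
    # hidden states. Shapes/hyperparameters are assumptions.
    import torch
    import torch.nn as nn

    HIDDEN_DIM = 512   # width of the residual stream (assumed)
    N_SQUARES = 64     # Othello board positions
    N_STATES = 3       # empty / current player / opponent

    probe = nn.Linear(HIDDEN_DIM, N_SQUARES * N_STATES)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def probe_step(activations, board_labels):
        # activations: (batch, HIDDEN_DIM) hidden states from one layer
        # board_labels: (batch, N_SQUARES) long tensor of square states
        logits = probe(activations).view(-1, N_SQUARES, N_STATES)
        loss = loss_fn(logits.reshape(-1, N_STATES),
                       board_labels.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

If a simple probe like this decodes the board from the activations far above chance, that's the evidence that the model "backsolved" into something board-shaped. Note it says nothing about how well that representation extrapolates, which was my point above.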


