> All I'm saying is that since there are mistakes, we know for a fact there isn't an accurate world model at play.
But I'm saying something different: even if a system has a world model that it believes is accurate, it still should be making intentional "mistakes" to test the world model in unusual circumstances, just like we continue to stress test thermodynamics, electromagnetism, gravity, etc.
So the evidence that GPT still makes mistakes is not necessarily evidence that it doesn't have an accurate world model. It could be evidence that it has a very rational approach to handling evidence (you should never have 100% confidence!) and is continuously testing world models.
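To make that concrete, here's a minimal Python sketch of the idea, in the spirit of epsilon-greedy exploration; the `model` object and its `legal_moves`/`all_moves` methods are hypothetical stand-ins, not anything from the OP's setup. An agent like this holds a learned model of the rules yet still occasionally plays a move its model says is illegal, precisely to check that the environment rejects it as predicted.

```python
import random

def choose_move(model, state, epsilon=0.01):
    """Pick a move using a learned rule model, but occasionally probe
    outside it to test that the model's predictions still hold.
    `model` is a hypothetical object exposing legal_moves() and all_moves()."""
    believed_legal = model.legal_moves(state)  # moves the learned model says are legal
    probes = [m for m in model.all_moves(state) if m not in believed_legal]
    if probes and random.random() < epsilon:
        # Deliberate "mistake": play a move the model believes is illegal,
        # to see whether the environment rejects it as the model predicts.
        return random.choice(probes)
    return random.choice(believed_legal)  # otherwise exploit the model
```

Observed from the outside, those probes look exactly like mistakes, even though they come from an agent that has, and is actively testing, a rule model.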
> even if a system has a world model that it believes is accurate, it still should be making intentional "mistakes" to test the world model in unusual circumstances
I have no idea how to address this.
Stockfish is an accurate world model of chess. It doesn't make mistakes; it cannot make mistakes. That's what an accurate world model is.
A system that can intentionally make mistakes is likely not a world model of anything at all. If it is a world model of something, it is a world model of something other than Othello, which has no rules for "stress testing".
The OP is claiming that they have created an accurate world model for Othello, so I think discussion of intentional mistakes and thermodynamics is outside the scope.
Under your definition, humans don’t have world models either. If you just gave someone the move sequences, and no other knowledge of Othello, they too would have no way to be 100% sure that other moves won’t show up. It’s not a very useful definition.
> The OP is claiming that they have created an accurate world model for Othello, so I think discussion of intentional mistakes and thermodynamics is outside the scope.
It's not out of scope, because these all fall under the problem of induction, which is what I mentioned in my first post. There is no such thing as achieving "certainty" in scenarios where you don't have direct access to the underlying model; there is only quantified uncertainty. This was all formalized under Solomonoff Induction.
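For reference, the formalization I mean assigns a prior over all programs consistent with the observations; roughly, in its standard form (nothing here is specific to this thread):

```latex
% Solomonoff's algorithmic prior: sum over all programs p that make a
% universal prefix machine U output a string beginning with x,
% weighted by program length \ell(p).
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
```

Because infinitely many programs remain consistent with any finite observation sequence, the posterior weight on any single candidate model never reaches exactly 1; you only ever get quantified uncertainty.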
So I'm making three points:
1. Your requirement that "knowledge" means "100% certainty" is the wrong way to look at these scenarios, because such certainty isn't possible here, even in principle, even for humans who are clearly capable of world models: swap a human in for GPT and they too will have a non-zero error rate, even if they try to build a model of the rules. There is no well-defined, quantifiable threshold at which "quantified uncertainty" becomes "certainty", and thus what you define as "knowledge". Therefore "knowledge" cannot be equated with "certainty" in these domains. The kind of "knowledge" you want is only possible when you have direct access to the model to begin with, like being told the rules of Othello.
2. Even if you do happen upon an accurate model, you'd never know it, so you have to continuously retest it by trying to violate it; you therefore cannot infer the lack of an internal model from the existence of an error rate. The point I'm making here is that your argument that "a non-zero error rate entails a lack of an accurate world model" is invalid, not that GPT necessarily has an accurate world model in this case.
3. I also dispute the distinction you're trying to make between statistical "pattern matching" and "understanding". "Understanding" must have some mechanistic basis, and that basis will involve something like this kind of pattern matching. I assume you agree that a formalization of Othello's rules in classical logic would qualify as a world model. Bayesian probability theory with all probabilities pinned to 0 or 1 reduces to classical logic (sketched below). Therefore an inferred statistical model that asymptotically approaches that classical-logic model, which is all we can do in these black-box scenarios, is arguably operating on an inferred world model, with some inevitable uncertainty as to precisely which world it's inhabiting.
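Here is the reduction from point 3, sketched in LaTeX under the usual Cox/Jaynes reading of probability as extended logic; nothing here is specific to Othello:

```latex
% With all probabilities pinned to {0,1}, the product rule reproduces
% classical conjunction, and conditioning reproduces modus ponens:
P(A \wedge B) = P(A)\,P(B \mid A) \in \{0,1\}
% and if P(A) = 1 and P(B \mid A) = 1 (i.e. A holds and A \to B holds), then
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
     = 1 \cdot 1 + P(B \mid \neg A) \cdot 0 = 1.
```

Relax the pins away from exactly 0 and 1 and you get the kind of statistical model an observer can actually infer from move sequences alone.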
Sure, but my point is simply that evidence of mistakes does not necessarily entail a lack of an accurate world model. When underlying models can only be inferred, there can never be 100% certainty that any inferred model is truly accurate, even when it is. Humans are only certain of Othello's rules because we have direct access to those rules.
If you put a human being in GPT's place in this exact same scenario, they too would make comparable mistakes. Humans are clearly capable of world models, so those mistakes are not an indication that world models are not in use.
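To put a number on that "quantified uncertainty", here's a standard textbook illustration (Laplace's rule of succession with a uniform prior, not anything from the OP's paper): an observer who has seen n moves, all consistent with their inferred rule, should still reserve some probability for the next one breaking it.

```latex
% Rule of succession: after n observations, all consistent with the
% inferred rule, the probability that the next one is also consistent is
P(\text{consistent}_{\,n+1} \mid n \text{ consistent so far}) = \frac{n+1}{n+2},
% which approaches 1 as n grows but never reaches it for finite n.
```

So a small but non-zero expectation of error is exactly what a rational inferrer, human or otherwise, should carry.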