I think you misinterpret my point. The goal of your post is distinct from how people will interpret it. Plenty of times people intend one thing and get a different thing. That's life.
> In Simon's original post, people were claiming that o3 doesn't have those capabilities, and we were fooled by a chain of thought that was just rationalizing the EXIF data. It only had the _appearance_ of capability.
And this is the key part!
The people questioning o3's capabilities were concerned with cheating. Any mention of EXIF is a guess as to how it was cheating, but the underlying suspicion is still that it was cheating. That's the critique!
If you had framed the title as "O3 Does Not Need EXIF Data To Beat A Master-Level GeoGuessr" then I wouldn't have made my comment. That claim is much more specific and reflects the results of your post. You did in fact show that it doesn't need EXIF data to do what it does! BUT by framing it as "Beats a Master-Level", there is an implicit claim that both of you are playing the same game. The fact that you weren't is the issue.
Look at it this way. If I said I beat Tiger Woods at golf and then casually slipped in that I was playing with a handicap, wouldn't you feel a bit lied to? You'd think "Did Godelski really beat Tiger Woods?", and you would mean without the handicap. You'd have every right to be suspicious! And you'd have every right to dismiss me.
Most importantly, take a second here. My whole point is that you can make a much stronger claim! One where there wouldn't be a significant divergence between title and content. I get that it is frustrating to receive criticism, but even if you believe I'm wrong to criticize, isn't it more effective to show me up by just redoing it without search? If you do that, you only end up with a stronger claim. But by disagreeing and arguing here, you're just not convincing me. Even if you disagree with my interpretation of the title, you know full well that it is a valid interpretation. Given the pushback in other comments, I don't think you can claim it's an unexpected one. So the only way to resolve this is to either change the title or change the data. Besides, you replied to the top comment that it was a fair criticism. All I've done is explain why the criticism was made in the first place!
And yes, it still undermines the result, because the strength of the result depends entirely on (the interpretation of) the claim that was made. Your results are still valid, but they only satisfy a weaker claim.
FWIW, I think the updated post is better. My comment here would only be that you could add clarity by showing the non-search scores (especially in the final table). In fact, the "study" being done with and without search makes a stronger post than had it only been one way. So kudos!
You've clearly thought this through, and I agree that had I been more precise at the start it would have avoided some confusion. I'm glad you like the updated post.