
If you use any of the conventional theory-of-mind tests (most famously the Sally-Anne test [1], but also the others), then SOTA reasoning models will get near 100%. Even if you come up with similar questions that you expect are not in the training set, they will still get them right.

In the absence of any evidence to the contrary, I find this convincing.

[1] https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test
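To be concrete, this is roughly how I'd run such a check myself with a freshly written false-belief scenario. This is only a minimal sketch using the openai Python client; the model name, the scenario wording, and the crude string check are placeholders I made up, not any standard benchmark:

  # Sketch: pose a novel Sally-Anne-style false-belief question to a chat
  # model and inspect the answer. Requires the `openai` package and an API
  # key in OPENAI_API_KEY.
  from openai import OpenAI

  client = OpenAI()

  scenario = (
      "Maya puts her keys in the blue drawer and leaves the room. "
      "While she is gone, Tom moves the keys to the red box. "
      "Maya comes back. Where will Maya look for her keys first? "
      "Answer with just the location."
  )

  resp = client.chat.completions.create(
      model="gpt-4o",  # placeholder; swap in whichever reasoning model you test
      messages=[{"role": "user", "content": scenario}],
  )

  answer = resp.choices[0].message.content.strip().lower()
  # Passing the false-belief test means naming the *original* location
  # (the blue drawer), not where the keys actually are now.
  print(answer, "-> pass" if "blue" in answer else "-> fail")

Obviously a single hand-written scenario proves nothing on its own; the point is that you can generate many variations that are unlikely to be verbatim in the training data and score them the same way.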




The same source you linked says that your 100% figure is not the consensus:

"... GPT-4's ability to reason about the beliefs of other agents remains limited (59% accuracy on the ToMi benchmark),[15] and is not robust to "adversarial" changes to the Sally-Anne test that humans flexibly handle.[16][17] While some authors argue that the performance of GPT-4 on Sally-Anne-like tasks can be increased to 100% via improved prompting strategies,[18] this approach appears to improve accuracy to only 73% on the larger ToMi dataset."


In basically every case, a claim like that is already obsolete by the time the paper is published, and ancient history by the time you use it to try to win an argument.


My point is merely that if you are going to make an argument using a source, the source should support your argument. If you say "the accuracy of an LLM on task 1 is 90% [1]", but [1] actually says the accuracy is 50%, that some authors claim better prompting can raise it to 90%, and that performance drops to 70% when extended to a larger dataset for task 1, then quoting only the highest number is misleading.


We are talking about frontier models, not GPT-4.


Yes, but I am using the same source the commenter used to back up their figure, merely pointing out that the source doesn't say what they claim it does.

If they wanted to talk about frontier models, they should have cited a source discussing frontier models' performance.


Maybe having a theory of mind isn't the big deal we thought it was. People are so conditioned to expect such things only from biological lifeforms, where theory of mind comes packaged with many other abilities that robots currently lack, that we reflexively dismiss the robot.



