If you use any of the conventional theory-of-mind tests (most famously the Sally-Anne test [1], but also the others), SOTA reasoning models will score near 100%. Even if you try to come up with similar questions that you expect not to be in the training set, they will still get them right.
In the absence of any evidence to the contrary, this is convincing evidence in my opinion.
That same source you link says that your view of 100% is not the accepted consensus:
"... GPT-4's ability to reason about the beliefs of other agents remains limited (59% accuracy on the ToMi benchmark),[15] and is not robust to "adversarial" changes to the Sally-Anne test that humans flexibly handle.[16][17] While some authors argue that the performance of GPT-4 on Sally-Anne-like tasks can be increased to 100% via improved prompting strategies,[18] this approach appears to improve accuracy to only 73% on the larger ToMi dataset."
In basically every case, a claim like that is obsolete by the time the paper is published, and ancient history by the time you use it to try to win an argument.
My point is merely that if you are going to make an argument using a source, the source should support your argument. If you say "the accuracy of an LLM on task 1 is 90% [1]", and when you go to [1] it says the accuracy of an LLM on task 1 is 50%, but that some sources claim better prompting can raise it to 90%, and that on a larger dataset for task 1 performance drops to 70%, then quoting only the highest number is misleading.
Maybe having a theory of mind isn't the big deal we thought it was. People are so conditioned to expect such things only from biological lifeforms, where theory of mind comes packaged with many other abilities that robots currently lack, that we reflexively dismiss the robot.
[1] https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test