

nopinsight 6 months ago | on: Qwen3: Think deeper, act faster

Current models are still far from human-level physical reasoning (paper below). Upcoming models trained on world simulation will probably do much better.

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
https://phybench-official.github.io/phybench-demo/

horhay 6 months ago

This is more of a physics math aptitude test. You can already see that the best model in math is halfway to saturating it. A high score might not indicate usefulness in actual physical reasoning, or at the very least, that seems like a bit of a stretch.
