Current models are still quite far from human-level physical reasoning (paper below). Upcoming models trained on world simulation will probably do much better.

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

https://phybench-official.github.io/phybench-demo/

This is more of a physics/math aptitude test. You can already see that the best math model is halfway to saturating it. It may not say much about usefulness for actual physical reasoning; at the very least, that seems like a bit of a stretch.
