A human will give different answers to the same question, so I’m not sure why it’s fair to hold an LLM to a higher bar. Or rather, I’m not sure how you would design this test so that humans would pass and the best LLM would fail.
