beering | 4 months ago | on: Jagged AGI: o3, Gemini 2.5, and everything after
A human will give different answers to the same question, so I’m not sure why it’s fair to set a higher bar for an LLM. Or rather, I’m not sure how you would design this test in a way where humans would pass and the best LLM would fail.