A human will give different answers to the same question, so I’m not sure why it’s fair to hold an LLM to a higher bar. Or rather, I’m not sure how you would design this test so that humans would pass and the best LLM would fail.
