This + some benchmarks are shitty thus rational model should be allowed to not a... | Hacker News


machiaweliczny on Dec 6, 2023 | on: Gemini AI

This + some benchmarks are shitty, so a rational model should be allowed to not answer them but ask clarifying questions instead.

belval on Dec 6, 2023

Yes, a lot of those have pretty egregious annotation mistakes. Once you get into the high percentages, it's often worth going through your dataset alongside your model's predictions and comparing the two. Obviously you can't do that on academic benchmarks (though some papers still do).
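
For illustration only (not from the comment above), here is a rough sketch of what such a comparison might look like: flag examples where the model disagrees with the annotation and is confident about it, then review those by hand. The function name `flag_suspect_labels`, the `min_confidence` threshold, and the toy data are all hypothetical.

```python
def flag_suspect_labels(labels, predictions, confidences, min_confidence=0.9):
    """Return indices where a confident prediction disagrees with the annotation.

    Hypothetical helper: a disagreement is only a hint that the label may be
    wrong, so every flagged example still needs a manual look.
    """
    suspects = [
        i
        for i, (label, pred, conf) in enumerate(zip(labels, predictions, confidences))
        if pred != label and conf >= min_confidence
    ]
    # Put the most confident disagreements first, since they are the
    # likeliest annotation mistakes.
    suspects.sort(key=lambda i: confidences[i], reverse=True)
    return suspects


# Toy data, made up for this example.
labels = ["cat", "dog", "cat", "dog"]
predictions = ["cat", "cat", "dog", "dog"]
confidences = [0.99, 0.97, 0.55, 0.80]

print(flag_suspect_labels(labels, predictions, confidences))  # [1]
```

Lowering `min_confidence` widens the review queue; a flagged example can just as easily be a model error as a labeling error.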
