OpenAI created a benchmark for this: https://openai.com/index/paperbench/ | Hacker News


slewis 6 months ago | on: Deep learning gets the glory, deep fact checking g...

OpenAI created a benchmark for this: https://openai.com/index/paperbench/

suddenlybananas 6 months ago

Still has data contamination though.

Szpadel 6 months ago

Still, LLMs cannot beat it, so it's good enough for a start.
