haffi112, 8 months ago, on: A Research Preview of Codex

(watching live) I'm wondering how it performs on the METR benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...).