
int_19h, 5 months ago, on: GPT-4.1 in the API

I actually agree with that, but it's generally better than other scores. Also, the quote is like a year old at this point. In practice you have to evaluate the models yourself for any non-trivial task.