simonw 4 months ago \| on: Jagged AGI: o3, Gemini 2.5, and everything after

Right: we effectively all need our own evals for the tasks that matter to us... but writing those evals continues to be one of the least well documented areas of how to effectively use LLMs.