Submissions from metr.org

		METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
		1 point by mustaphah 7 days ago | past | discuss
		Measuring AI Ability to Complete Long Tasks (metr.org)
		2 points by Gedxx 36 days ago | past
		Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
		3 points by tmoertel 40 days ago | past
		The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
		3 points by jvdvegt 50 days ago | past | 1 comment
		Measuring AI Ability to Complete Long Tasks – METR (metr.org)
		2 points by diginova 84 days ago | past
		Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
		3 points by nreece 3 months ago | past | 1 comment
		Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
		1 point by sonabinu 3 months ago | past
		Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
		2 points by davikr 3 months ago | past
		Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
		18 points by ColinEberhardt 3 months ago | past | 2 comments
		Measuring the impact of AI on experienced open-source developer productivity (metr.org)
		775 points by dheerajvs 3 months ago | past | 486 comments
		Recent Frontier Models Are Reward Hacking (metr.org)
		2 points by surprisetalk 4 months ago | past
		AI's Version of Moore's Law (metr.org)
		2 points by aazo11 6 months ago | past | 1 comment
		Measuring AI Ability to Complete Long Tasks (metr.org)
		2 points by pabo 7 months ago | past
		Measuring AI Ability to Complete Long Tasks – METR (metr.org)
		7 points by gk1 7 months ago | past | 1 comment
		Measuring AI Ability to Complete Long Tasks (metr.org)
		4 points by stared 7 months ago | past
		Measuring Automated Kernel Engineering (metr.org)
		1 point by gsky 8 months ago | past
		Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
		1 point by tedsanders 11 months ago | past
		When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
		4 points by cpainter on Aug 6, 2024 | past
		METR: Model Evaluation and Threat Research (metr.org)
		2 points by Olshansky on July 8, 2024 | past
		Bounty: Diverse hard tasks for LLM agents (metr.org)
		3 points by RoboTeddy on Jan 20, 2024 | past