Hacker News
METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
1 point by mustaphah 7 days ago | past | discuss
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by Gedxx 36 days ago | past
Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
3 points by tmoertel 40 days ago | past
The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
3 points by jvdvegt 50 days ago | past | 1 comment
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
2 points by diginova 84 days ago | past
Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
3 points by nreece 3 months ago | past | 1 comment
Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
1 point by sonabinu 3 months ago | past
Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
2 points by davikr 3 months ago | past
Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
18 points by ColinEberhardt 3 months ago | past | 2 comments
Measuring the impact of AI on experienced open-source developer productivity (metr.org)
775 points by dheerajvs 3 months ago | past | 486 comments
Recent Frontier Models Are Reward Hacking (metr.org)
2 points by surprisetalk 4 months ago | past
AI's Version of Moore's Law (metr.org)
2 points by aazo11 6 months ago | past | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by pabo 7 months ago | past
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
7 points by gk1 7 months ago | past | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
4 points by stared 7 months ago | past
Measuring Automated Kernel Engineering (metr.org)
1 point by gsky 8 months ago | past
Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
1 point by tedsanders 11 months ago | past
When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
4 points by cpainter on Aug 6, 2024 | past
METR: Model Evaluation and Threat Research (metr.org)
2 points by Olshansky on July 8, 2024 | past
Bounty: Diverse hard tasks for LLM agents (metr.org)
3 points by RoboTeddy on Jan 20, 2024 | past
