Hacker News
Submissions from metr.org
METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
1 point by mustaphah 7 days ago
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by Gedxx 36 days ago
Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
3 points by tmoertel 40 days ago
The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
3 points by jvdvegt 50 days ago | 1 comment
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
2 points by diginova 84 days ago
Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
3 points by nreece 3 months ago | 1 comment
Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
1 point by sonabinu 3 months ago
Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
2 points by davikr 3 months ago
Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
18 points by ColinEberhardt 3 months ago | 2 comments
Measuring the impact of AI on experienced open-source developer productivity (metr.org)
775 points by dheerajvs 3 months ago | 486 comments
Recent Frontier Models Are Reward Hacking (metr.org)
2 points by surprisetalk 4 months ago
AI's Version of Moore's Law (metr.org)
2 points by aazo11 6 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by pabo 7 months ago
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
7 points by gk1 7 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
4 points by stared 7 months ago
Measuring Automated Kernel Engineering (metr.org)
1 point by gsky 8 months ago
Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
1 point by tedsanders 11 months ago
When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
4 points by cpainter on Aug 6, 2024
METR: Model Evaluation and Threat Research (metr.org)
2 points by Olshansky on July 8, 2024
Bounty: Diverse hard tasks for LLM agents (metr.org)
3 points by RoboTeddy on Jan 20, 2024