Hacker News
Show HN: Randomly switching between LMs at every step boosts SWE-bench score (swebench.com)
5 points by lieret 28 days ago | past | 1 comment
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)
2 points by lieret 48 days ago | past
New leader on swe-bench multimodal (swebench.com)
3 points by katrin777 84 days ago | past
SWE-bench just published an updated list of top AI Agents (swebench.com)
4 points by laxyz 84 days ago | past
SWE-bench (swebench.com)
1 point by katrin777 3 months ago | past
Refact.ai is the new open-source SOTA on SWE-bench Verified and Lite (swebench.com)
3 points by bystrakowa 3 months ago | past
New #1 SOTA on Swe-bench is using Claude 3.7 and O1 (swebench.com)
3 points by knes 5 months ago | past
Gru.ai Got 35.67% on SWEbench (swebench.com)
2 points by BabelCLoud on Aug 15, 2024 | past
Amazon Q Developer Agent is now SOTA on SWE-bench (swebench.com)
4 points by brendanfalk on May 14, 2024 | past
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
1 point by goranmoomin on March 13, 2024 | past
Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
1 point by throw2321 on Nov 8, 2023 | past
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
2 points by cjsaltlake on Oct 13, 2023 | past
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
3 points by EvgeniyZh on Oct 10, 2023 | past
