Hacker News
Show HN: Randomly switching between LMs at every step boosts SWE-bench score (swebench.com)
5 points by lieret 28 days ago | 1 comment
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)
2 points by lieret 48 days ago
New leader on swe-bench multimodal (swebench.com)
3 points by katrin777 84 days ago
SWE-bench just published an updated list of top AI Agents (swebench.com)
4 points by laxyz 84 days ago
SWE-bench (swebench.com)
1 point by katrin777 3 months ago
Refact.ai is the new open-source SOTA on SWE-bench Verified and Lite (swebench.com)
3 points by bystrakowa 3 months ago
New #1 SOTA on Swe-bench is using Claude 3.7 and O1 (swebench.com)
3 points by knes 5 months ago
Gru.ai Got 35.67% on SWEbench (swebench.com)
2 points by BabelCLoud on Aug 15, 2024
Amazon Q Developer Agent is now SOTA on SWE-bench (swebench.com)
4 points by brendanfalk on May 14, 2024
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
1 point by goranmoomin on March 13, 2024
Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
1 point by throw2321 on Nov 8, 2023
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
2 points by cjsaltlake on Oct 13, 2023
SWE-Bench Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
3 points by EvgeniyZh on Oct 10, 2023