Hacker News

I'd argue that the last year's gains came mainly from inner-harness improvements rather than model improvements. I could be wrong; that's just how it feels to me.



We can measure this by looking at the same harness applied to different models, e.g. the very plain Terminus: https://www.tbench.ai/leaderboard/terminal-bench/2.0?agents=...

Models have improved dramatically even with the harness held constant.


I mean that the way the model tackles a task at its core is produced differently, like an inner harness, whether through the system prompt or something deeper. For example, instead of answering instantly, it goes through predefined steps: deciding which strategy to use, splitting the task, using thinking tokens, using tools, etc.
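To make the "inner harness" idea concrete, here is a rough sketch of the difference between answering directly and walking through fixed steps (strategy, split, think, tool use). Everything in it is hypothetical and only for illustration: `stub_model`, `calc`, the `STRATEGY:`/`SPLIT:`/`THINK:` prompt prefixes, and the function names are my own stand-ins, not how any real model or product is known to work internally.

```python
import operator
from typing import Callable

# Hypothetical stand-in for a model call: prompt text in, completion text out.
Model = Callable[[str], str]


def stub_model(prompt: str) -> str:
    """Toy deterministic 'model' so the sketch runs without any API."""
    kind, _, _rest = prompt.partition(": ")
    if kind == "STRATEGY":
        return "divide-and-conquer"
    if kind == "SPLIT":
        return "2+3; 4*5"  # subtasks separated by ';'
    if kind == "THINK":
        return "calc"  # the 'thought' names the tool to use
    return f"direct answer to: {prompt}"


def calc(expr: str) -> str:
    """Tiny calculator tool: handles 'a+b' and 'a*b' on integers only."""
    for symbol, fn in (("+", operator.add), ("*", operator.mul)):
        if symbol in expr:
            left, right = expr.split(symbol)
            return str(fn(int(left), int(right)))
    raise ValueError(f"unsupported expression: {expr}")


def answer_directly(model: Model, task: str) -> str:
    """No harness: one call, answer immediately."""
    return model(task)


def answer_with_inner_harness(
    model: Model,
    task: str,
    tools: dict[str, Callable[[str], str]],
    trace: list[str],
) -> str:
    """Fixed pipeline: pick a strategy, split the task, think, then use tools."""
    strategy = model(f"STRATEGY: {task}")
    trace.append(f"strategy={strategy}")

    subtasks = [s.strip() for s in model(f"SPLIT: {task}").split(";") if s.strip()]
    trace.append(f"split={len(subtasks)}")

    results = []
    for sub in subtasks:
        tool_name = model(f"THINK: {sub}")  # stand-in for "thinking tokens"
        trace.append(f"think:{sub}")
        tool = tools.get(tool_name)
        if tool is not None:
            results.append(tool(sub))
            trace.append(f"tool:{tool_name}")
        else:
            results.append(model(sub))  # fall back to a plain answer
    return "; ".join(results)
```

The point of the sketch is only that both functions call the same `model`; what changes is the scaffolding around it. Whether that scaffolding lives in an external agent, in the system prompt, or is trained into the model is exactly the thing that is hard to tell apart from the outside.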


