Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This feels like more of a system prompt issue than a model issue?


imo, model regression might actually stem from more aggressive use of toolformer-style prompting, or even RLHF tuning optimizing for obedience over initiative. i bet if you ran comparable tasks across 3-5, 3-7, and 4-0 with consistent prompts and tool access disabled, the underlying model capabilities might be closer than it seems.


Anecdotal of course, but I feel a very distinct difference between 3.5 and 3.7 when swapping between them in Cursor’s Agent mode (so the system prompt stays consistent).




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: