This model seems to be really good at this. It's decently smart for an LM of its size, but more importantly, it can reliably catch its own bullshit and course-correct. And it keeps hammering at the problem until it actually has a working solution, even if that takes many tries. It's like a not-particularly-bright but very persistent intern. Which, honestly, is probably what we want these models to be.