This model seems to be really good at this. It's decently smart for an LM this size, but more importantly, it can reliably catch its own bullshit and course-correct. And it keeps hammering at the problem until it actually has a working solution, even if that takes many tries. It's like a not particularly bright but very persistent intern. Which, honestly, is probably what we want these models to be.