We should remember 3.5 was trained in an era when ChatGPT would routinely refuse to code at all and architected in an era when system prompts were not necessarily very effective. I bet this will improve, especially now that Claude has its own coding and arch cli tool.