What was your prompt to get it to run the test suite and heal tests at every step? I didn’t see that mentioned in your write up. Also, any specific reason you went with Codex over Claude Code?
For me (original author of JustHTML), it was enough the put the instructions on how to run tests in the AGENTS.md. It knows enough about coding to run tests by itself.