o1 mini seems to get it on the first try (I didn't vet the code, but I tested it...

mewpmewp2 · 2024-10-05T00:09:11 1728086951

In addition to that, after they created the first program with mistakes, the author should have shown them the invalid output and given them a chance to fix it. For humans, too, solving this on the first try without running the code frequently doesn't work.

fragmede · 2024-10-05T00:25:17 1728087917

"seems to" isn't good enough, especially since it's entirely possible to generate code that doesn't give the right answer. 4o is able to write some bad code, run it, recognize that it's bad, and then fix it, if you tell it to.

https://chatgpt.com/share/670086ed-67bc-8009-b96c-39e539791f...

Chinjut · 2024-10-05T12:34:18 1728131658

Did you actually run the "fixed" code here? Its output is an empty list, just like the pre-"fixed" code.

Chinjut · 2024-10-05T15:53:38 1728143618

Hm, actually, it's confusing, because clicking the [>_] links where it mentions running code shows different code from what it just mentioned.

isaacfrond · 2024-10-05T08:30:41 1728117041

Despite the name ‘mini’, it is actually more optimized for code, so that makes sense.