Something I noticed a long time ago is that going from 90% correct to 95% correct is not a 5% difference, it’s a 2x difference: the error rate halves, from 10% to 5%. As you approach 100%, shaving away the last few hundredths of a percent of error makes a qualitative difference.
“Computer” used to be a job, and human error rates were on the order of 1-2% per operation, no matter what level of training or experience the person had. Work had to be done in triplicate and cross-checked if it mattered.
Digital computers are down to error rates of roughly 10^-15 to 10^-22 and are hence treated as nearly infallible. We regularly write routines where a trillion steps have to execute flawlessly in sequence for things not to explode!
AIs can now output maybe 1K to 2K tokens in a sequence before they make a mistake. That’s 99.9% to 99.95% per-token accuracy! Better than a human already.
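To make that arithmetic concrete, here is a back-of-the-envelope sketch. It is my own toy model (independent per-step errors, rough rates), not anything rigorous:

```python
# Toy model: each step/token is independently wrong with probability p.
# Then the expected number of steps before the first error is ~1/p,
# and the chance of getting n steps in a row right is (1 - p)**n.

def expected_error_free_length(p):
    """Mean number of steps until the first error (geometric distribution)."""
    return 1 / p

def prob_clean_run(p, n):
    """Probability of n consecutive error-free steps."""
    return (1 - p) ** n

for label, p in [
    ("human 'computer' (~1% per step)", 1e-2),
    ("LLM (~0.05% per token)", 5e-4),
    ("digital computer (~1e-15 per op)", 1e-15),
]:
    print(f"{label:34s} expected clean run ~ {expected_error_free_length(p):,.0f} steps; "
          f"P(1,000 clean steps) = {prob_clean_run(p, 1000):.3g}")
```

On this toy model, a 0.05% per-token error rate gives exactly the 1K-2K clean runs I'm describing, while a 1% human error rate caps out around 100 steps.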
Don’t believe me?
Write me a 500-line program with pen and paper (not pencil!) and have it work the first time!
I’ve seen Gemini Pro 2.5 do this in a useful way.
As the error rates drop, the length of usefully correct sequences will get to 10K, then 100K, and maybe… who knows?
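Rough arithmetic on what that takes, using the same toy independence model as above: for a run of N tokens to come out clean with 50/50 odds, the per-token error rate needs to be about ln(2)/N.

```python
import math

# (1 - p)**N = 0.5  =>  p ≈ ln(2) / N for small p.
for n in (1_000, 10_000, 100_000):
    p = math.log(2) / n
    print(f"N = {n:>7,} tokens: need p ≈ {p:.1e}  (per-token accuracy ≈ {100 * (1 - p):.4f}%)")
```

So 100K-token clean runs need roughly another two nines of per-token accuracy beyond where we are now.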
There was a press release just today about Gemini Diffusion, which can alter already-generated tokens to correct mistakes.
Your expectations here are too low. People used to enter machine code on switches and punched paper tape, so yes, they made sure it worked the first time. Later, people held code reviews by marking up printouts of code, and software shipped in boxes that couldn't be changed until the next year.
Programmers who "iterate" buggy shit for 10 rounds until they get it right are a post-Google push-update phenomenon.
I don't think the length you're talking about is that much of an issue. As you say, depending on how you measure it, LLMs are already better than humans at remaining accurate over a long span of text.
The issue seems to be more in the intelligence department. You can't really leave them in an agent-like loop with compiler/shell output and expect them to make meaningful progress on their task past some small number of steps.
Improving their initial error-free token length is solving the wrong problem. I'd take a model with less initial accuracy than a human, as long as it were equally capable of iterating on its solution over time.
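To be concrete about what I mean by an agent-like loop, something like this sketch. It is purely illustrative; `llm.generate` is a hypothetical stand-in for whatever model API you're using, not a real library call:

```python
import subprocess

def agent_loop(llm, task, max_rounds=10):
    """Illustrative agent loop: run the model's code, feed errors back, repeat.
    `llm.generate` is a hypothetical stand-in for a real model API."""
    code = llm.generate(task)
    for _ in range(max_rounds):
        result = subprocess.run(["python", "-c", code],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # ran cleanly; call it done
        # hand the compiler/shell output back and ask for a fix
        code = llm.generate(f"{task}\n\nThe last attempt failed with:\n{result.stderr}")
    return None  # stalled after max_rounds; this is where models stop making progress
```

The initial `llm.generate` call is where long error-free output helps; the loop is where the real bottleneck shows up.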
Error rates will drop.
Useful output length will go up.