I'm not saying there's been no improvement at all. I personally wouldn't categorize it as staggering, but we can agree to disagree on that.

I find the improvements uneven: every time I try a new model, I can find use cases where it's an improvement over previous versions, but I can also find use cases where it feels like a serious regression.

Our differences in how we categorize the amount of improvement over the past 2 years may come down to how much the newer models are improving vs. regressing for our individual use cases.

When used as coding helpers/time accelerators, I find newer models better at one-shot tasks, where you let the LLM loose to write or rewrite entire large systems, and worse at creating or maintaining small modules that fit into an existing larger system. My own use of LLMs is largely in the latter category.

To be fair, I find the current peak model for coding assistance to be Claude 3.5 Sonnet, which is far less than 2 years old. But I feel like the improvements to get to that model were pretty incremental relative to the vast amount of resources poured into it, and Claude 3.7 was a pretty big backslide for my own use case, which has recently heightened my skepticism.