I'm not saying there's been no improvement at all. I personally wouldn't categorize it as staggering, but we can agree to disagree on that.
I find the improvements to be uneven in the sense that every time I try a new model I can find use cases where its an improvement over previous versions but I can also find use cases where it feels like a serious regression.
Our differences in how we categorize the amount of improvement over the past 2 years may be related to how much the newer models are improving vs regressing for our individual use cases.
When used as coding helpers/time accelerators, I find newer models to be better at one-shot tasks where you let the LLM loose to write or rewrite entire large systems and I find them worse at creating or maintaining small modules to fit into an existing larger system. My own use of LLMs is largely in the latter category.
To be fair I find the current peak model for coding assistant to be Claude 3.5 Sonnet which is much newer than 2 years old, but I feel like the improvements to get to that model were pretty incremental relative to the vast amount of resources poured into it and then I feel like Claude 3.7 was a pretty big back-slide for my own use case which has recently heightened my own skepticism.
Hilarious. Over two years we went from LLMs being slow and not very capable of solving problems to models that are incredibly fast, cheap and able to solve problems in different domains.
I find the improvements to be uneven in the sense that every time I try a new model I can find use cases where its an improvement over previous versions but I can also find use cases where it feels like a serious regression.
Our differences in how we categorize the amount of improvement over the past 2 years may be related to how much the newer models are improving vs regressing for our individual use cases.
When used as coding helpers/time accelerators, I find newer models to be better at one-shot tasks where you let the LLM loose to write or rewrite entire large systems and I find them worse at creating or maintaining small modules to fit into an existing larger system. My own use of LLMs is largely in the latter category.
To be fair I find the current peak model for coding assistant to be Claude 3.5 Sonnet which is much newer than 2 years old, but I feel like the improvements to get to that model were pretty incremental relative to the vast amount of resources poured into it and then I feel like Claude 3.7 was a pretty big back-slide for my own use case which has recently heightened my own skepticism.