A software developer's time is too precious to waste on sub-optimal models.
Open-weights models have their place (in training custom agents and custom services), but if you are a knowledge worker, using a model even 5% worse than SOTA is extremely dumb.
100% disagree with this take. The flexibility of controlling the prompt leads to QwenCoder2.5-32b outperforming o1 and Claude Sonnet 3.5 for nearly everything I use it for (the same is true for Gemma-27b and llama3.3-70b, though in this context I'm almost always using the former).
A specialist model that's specifically prompted to do the correct thing will outperform a SOTA generic model with a one-size-fits-all system prompt. This is why small autocomplete models can very obviously outperform larger models at that specific task.
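For concreteness, here's a minimal sketch of what that prompt control looks like, assuming ollama is serving its default HTTP API on localhost:11434; the model tag and the task-specific system prompt are just illustrations, not a recommendation:

    import json
    import urllib.request

    # Hypothetical task-specific system prompt. The point is that you
    # control every token of it, unlike a hosted product's fixed prompt.
    SYSTEM = (
        "You are a code reviewer for a Python 3.12 codebase. "
        "Respond only with a unified diff, no explanations."
    )

    def ask(prompt: str, model: str = "qwen2.5-coder:32b") -> str:
        payload = {
            "model": model,
            "stream": False,
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": prompt},
            ],
        }
        req = urllib.request.Request(
            "http://localhost:11434/api/chat",  # ollama's default chat endpoint
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["message"]["content"]

    print(ask("Review this function: def add(a, b): return a + b"))

Swap the SYSTEM string per task and you effectively get a fleet of specialist models out of one set of local weights.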
I am speaking 100% from experience and ignoring all benchmarks in forming this view btw, so maybe it's just my specific situation.
Also, in general I don't find the difference between SOTA models and local models to be that significant in the real world even when used in the exact same way.
yes, the VSCode extension is a one-click install, and so is ollama, which is a separate project that provides local inference.
You'll then have to download a model, which ollama makes very easy. Which one to choose will depend on your hardware, but the biggest QwenCoder2.5 you can fit is a very solid starting place.
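For reference, getting a model down and testing it is two commands (the 32b tag is an assumption that you have roughly 24 GB of VRAM or RAM to spare; pick a smaller tag like 7b otherwise):

    ollama pull qwen2.5-coder:32b   # one-time download of the weights
    ollama run qwen2.5-coder:32b "write fizzbuzz in python"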
It's not ready for your grandma, but it's easy enough that I'd trust a junior dev to get it done.