A bit unfair to call local models Matchbox cars compared to F1. There are plenty of uses for local LLMs that don't require the largest models; it doesn't have to be all-or-nothing. For example, as a general browser assistant that summarizes articles, explains context, etc., the gemma-3-4B model does very well and is lightning fast on my old 3060 Ti. A minimal sketch of that kind of helper is below.
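For anyone curious how little plumbing this takes, here is a minimal sketch, assuming Ollama is running locally with the gemma3:4b tag already pulled; the prompt and helper name are just illustrative, not my exact setup.

```python
# Minimal sketch of a local "summarize this article" helper.
# Assumes an Ollama server running locally with the gemma3:4b model pulled;
# the system prompt and function name are illustrative only.
import ollama

def summarize(article_text: str) -> str:
    response = ollama.chat(
        model="gemma3:4b",
        messages=[
            {"role": "system", "content": "Summarize the article in 3 short bullet points."},
            {"role": "user", "content": article_text},
        ],
    )
    # The chat response carries the generated text under message/content.
    return response["message"]["content"]

if __name__ == "__main__":
    print(summarize("Paste or pipe the article text here."))
```

On a small model like this, responses come back in a second or two even on older consumer GPUs, which is the whole point for quick in-browser summaries.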
You just confirmed the point exactly. Running gemma-3-4B on a 3060 Ti as your personal local assistant is toying around.
Try to build the same thing as a startup or a company and you will most likely be out of business in 3 to 6 months, because the big players will have something faster and better in no time. The 80% price drop on o3 probably made running a 3060 Ti more expensive than the API once you check your energy bill.
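Rough math, where every number is an assumption you should swap for your own (GPU power draw, electricity rate, local throughput, and the API price are all placeholders, not actual o3 pricing):

```python
# Back-of-envelope cost comparison: local electricity vs. a hosted API
# after an 80% price cut. All figures below are assumptions for illustration.
gpu_watts = 200             # assumed full-load draw of a 3060 Ti
kwh_price = 0.30            # assumed electricity price, $/kWh
local_tokens_per_sec = 40   # assumed throughput for a ~4B model

# Electricity cost per million generated tokens on the local card.
local_cost_per_mtok = (gpu_watts / 1000) * kwh_price / (local_tokens_per_sec * 3600) * 1_000_000

# Hypothetical hosted price: an old $8.00 per million tokens with an 80% cut.
api_cost_per_mtok = 8.00 * (1 - 0.80)

print(f"local electricity: ${local_cost_per_mtok:.2f} per million tokens")
print(f"hosted API:        ${api_cost_per_mtok:.2f} per million tokens")
```

Which side comes out cheaper depends entirely on the numbers you plug in (your electricity rate, how hard the card runs, and which API tier you compare against), which is exactly why it's worth checking the bill rather than assuming either way.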
Not looking to do that though! You can call it toying around if you want, but I think you're really limiting your perspective by dismissing smaller models.