"I've finetuned llama/mistral models that greatly outperform GPT4 with just a prompt"
If you write up your experiments with that in detail, I guarantee you'll get a lot of interest. The community is crying out for good, well-documented, replicable examples of this kind of thing.
I'm so behind in this area. I had finetuned a model that was SOTA and worth publishing about back in October, but I procrastinated. I'm scared to check whether somebody else has already published on this topic.
Do you always assume other people are incompetent? That's not very nice of you.
I mostly work on AI, so I know whether I'm overfitting or not. It performs provably better in its domain (a niche programming language). GPT4 can barely write a hello world for it.
I'm not creating a "better GPT4" general chatbot. I'm finetuning for a specific task.
You have to know when to RAG, finetune, or RAG+finetune.
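That RAG-vs-finetune call can be sketched as a rough decision heuristic. Everything below (the function name, the two yes/no questions, the thresholds) is my own framing for illustration, not anything stated in the thread:

```python
# Rough heuristic for choosing between RAG, finetuning, or both.
# The two questions and the function name are my own assumptions,
# not from the thread -- treat this as a sketch, not a recipe.

def choose_strategy(needs_fresh_or_private_knowledge: bool,
                    needs_new_skill_or_format: bool) -> str:
    """Pick an adaptation strategy for an LLM task.

    needs_fresh_or_private_knowledge: answers depend on documents the
        base model never saw (or that change often) -> retrieval helps.
    needs_new_skill_or_format: the model must learn a behavior, style,
        or niche language it can't pick up from a prompt -> finetuning helps.
    """
    if needs_fresh_or_private_knowledge and needs_new_skill_or_format:
        return "RAG + finetune"
    if needs_fresh_or_private_knowledge:
        return "RAG"
    if needs_new_skill_or_format:
        return "finetune"
    return "prompting is probably enough"

# A niche programming language GPT4 can barely write: the model needs
# a new skill, not new documents, so finetuning alone is the fit.
print(choose_strategy(False, True))  # finetune
```

The point of the sketch is just that the two axes are independent: missing knowledge points at retrieval, missing capability points at finetuning, and a task can need both.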