I wonder why Mistral et al don't prepare GGUF versions of these for launch day?
If I were them I'd want to be the default source of the versions of my models that people use, rather than farming that out to whichever third party races to publish the GGUF (and other formats) first.
Some of the major vendors _do_ create GGUFs for their models, but those files often ship with the wrong parameter settings, require changes to the inference code, or lack the correct prompt template. We (i.e. Ollama) have our own conversion scripts and we try to work with the model vendors to get everything working ahead of time, but unfortunately Mistral doesn't usually give us a heads-up before they release.
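As a rough illustration of the kind of check involved, here's a minimal sketch using the `gguf` Python package that ships with llama.cpp to dump a GGUF file's metadata keys, which is where a missing or incorrect prompt template would show up (the file path is a placeholder, and the exact field layout is an assumption):

```python
# Sketch: list the metadata keys in a GGUF file to verify that
# things like the chat template were actually baked in.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")  # hypothetical path to a downloaded GGUF
for field in reader.fields.values():
    # Keys such as "tokenizer.chat_template" or "general.architecture"
    # are what you'd inspect for a wrong or absent prompt template.
    print(field.name)
```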
llama.cpp is still under active development, and it sometimes introduces breaking changes or new quantization methods; keeping up with those changes becomes a lot of work as you publish more models over time. It's easier to just publish a standard float32 safetensors checkpoint that works with PyTorch and let the community deal with other runtimes and file formats.
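For context, this is roughly the conversion work that gets pushed onto the community. A minimal sketch using llama.cpp's own tools, assuming a checkout of the llama.cpp repo (the model paths and output filenames here are placeholders):

```python
# Sketch of the safetensors -> GGUF -> quantized GGUF pipeline.
import subprocess

# 1. Convert the vendor's safetensors checkpoint to a float16 GGUF
#    using llama.cpp's conversion script.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", "path/to/hf-model",
     "--outfile", "model-f16.gguf", "--outtype", "f16"],
    check=True,
)

# 2. Quantize it, e.g. to Q4_K_M. Each new quantization method added
#    to llama.cpp means re-running this step across every model you
#    publish, which is part of the maintenance burden described above.
subprocess.run(
    ["./llama-quantize", "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```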
If it's a new architecture, there's also additional work needed to add support in llama.cpp, which means more dev time, more testing, and potentially losing the element of surprise around a model release if that development work has to be done in the open.