At this point the benchmarks barely matter at all. It's entirely possible to train for a high benchmark score and reduce the overall quality of the model in the process.
IMO, use the model that makes the most sense when you ask it stuff. Personally, I'd go for the one with the least censorship (which, IMO, isn't anything from Alibaba's Qwen line).