Capacity doesn't equate to quality, but I could easily see an 8B finetune with exl2 at low context working for short, simple customer interactions, akin to oversubscribing a 1Gbit uplink for 100 customers at 50Mbps.
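To put numbers on the uplink analogy, here is a quick sketch of the arithmetic. The function names are made up for illustration, and it assumes every customer is sold the full 50Mbps on a single shared 1Gbit uplink.

```python
def oversubscription_ratio(customers: int, plan_mbps: float, uplink_mbps: float) -> float:
    """Total bandwidth sold divided by the uplink's real capacity."""
    return customers * plan_mbps / uplink_mbps


def max_concurrent_at_full_rate(plan_mbps: float, uplink_mbps: float) -> int:
    """How many customers can pull their full plan rate at the same moment."""
    return int(uplink_mbps // plan_mbps)


# Figures from the analogy: 100 customers, 50 Mbps each, 1 Gbit (1000 Mbps) uplink.
ratio = oversubscription_ratio(customers=100, plan_mbps=50, uplink_mbps=1000)
concurrent = max_concurrent_at_full_rate(plan_mbps=50, uplink_mbps=1000)

print(f"oversubscription: {ratio:.0f}:1")
print(f"customers served at full rate simultaneously: {concurrent}")
```

The setup only holds up if most customers are idle most of the time, which is the same bet as serving many short, simple chats from one small model.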
This is wildly misleading as the benchmarks make use of batching. It will entirely fall apart in real workloads where each prompt is different. If you're doing batch processing with a fixed prompt, the results will be more applicable.
It depends. For batching to be viable, the prompts have to share some similarity of context or intent, which is quite often the case in specific applications (as opposed to, say, general chat).
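One rough way to check whether a workload has that kind of shared context is to measure how much of each prompt is a common leading template. This is only a sketch under assumptions: it reads "similarity of context" as a shared prefix (just one possible interpretation), it splits on whitespace as a stand-in for a real tokenizer, and `shared_prefix_fraction` and the sample prompts are hypothetical.

```python
from typing import Sequence


def shared_prefix_fraction(prompts: Sequence[str]) -> float:
    """Length of the common leading token run, relative to the average prompt length."""
    tokenized = [p.split() for p in prompts]  # whitespace split: an assumption, not a real tokenizer
    if not tokenized:
        return 0.0
    prefix_len = 0
    # zip(*...) walks the prompts position by position and stops at the shortest one.
    for column in zip(*tokenized):
        if len(set(column)) != 1:
            break
        prefix_len += 1
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    return prefix_len / avg_len if avg_len else 0.0


# Hypothetical application prompts that share a fixed instruction template.
support_prompts = [
    "You are a support agent for a shop. Customer asks: where is my order?",
    "You are a support agent for a shop. Customer asks: how do I get a refund?",
]
# Hypothetical general-chat prompts with nothing in common.
chat_prompts = [
    "Write a poem about autumn.",
    "Explain how transformers work.",
]

support_share = shared_prefix_fraction(support_prompts)
chat_share = shared_prefix_fraction(chat_prompts)
print(f"support prompts: {support_share:.2f}, general chat: {chat_share:.2f}")
```

A high fraction suggests the application-style case described above; a value near zero looks more like general chat.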
Saying that like it's mediocre... Maybe I'll have to benchmark my old 1050, see what it can do!