
With Qwen3-30B-A3B (Q8) I'm getting 10-20 t/s in KoboldAI, i.e., llama.cpp. That's faster than I can read, so it's good enough for hobby use. I expect this model to be significantly faster, but llama.cpp-based software probably doesn't support it yet.
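If you want to measure t/s yourself, here's a minimal sketch using llama-cpp-python, assuming you have a Q8_0 GGUF of the model downloaded locally (the path and prompt below are placeholders):

    import time
    from llama_cpp import Llama

    # Load a local Q8_0 GGUF (hypothetical path);
    # n_gpu_layers=-1 offloads all layers to the GPU if one is available.
    llm = Llama(
        model_path="./Qwen3-30B-A3B-Q8_0.gguf",
        n_ctx=4096,
        n_gpu_layers=-1,
        verbose=False,
    )

    start = time.time()
    out = llm("Explain mixture-of-experts models in one paragraph.",
              max_tokens=256)
    elapsed = time.time() - start

    # Completion responses report the generated token count under "usage".
    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.1f} t/s")

Note this only measures generation speed, not prompt processing, which is usually the slower part on CPU-heavy setups.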

