
I'm not really timing it, as I just use these models via Open WebUI, nvim, and a few things I've made like a Discord bot, with everything going through Ollama.

But for comparison, it is generating tokens about 1.5 times as fast as gemma 3 27B QAT or mistral-small 2506 q4. Prompt processing of the context, however, seems to run at about a quarter of the speed of those models.

To put "excellent" in more concrete terms: once the context is processed, I can't really notice any difference in speed between gpt-oss-120b and Claude Opus 4 via the API.
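
If you do want actual numbers instead of a feel, Ollama's /api/generate response reports token counts and durations, so a short script can print tokens/sec for both prompt processing and generation. A minimal sketch, assuming Ollama on its default port and a placeholder model tag you'd swap for one you've pulled:

    # Rough tokens/sec measurement against a local Ollama server.
    import json
    import urllib.request

    payload = {
        "model": "gpt-oss:120b",  # placeholder tag, use whatever you have pulled
        "prompt": "Explain KV caching in two sentences.",
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)

    # Durations in the response are reported in nanoseconds.
    prompt_tps = result["prompt_eval_count"] / (result["prompt_eval_duration"] / 1e9)
    gen_tps = result["eval_count"] / (result["eval_duration"] / 1e9)
    print(f"prompt processing: {prompt_tps:.1f} tok/s")
    print(f"generation:        {gen_tps:.1f} tok/s")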



I've found threads online suggesting that running gpt-oss-20b on Ollama is slow for some reason. I'm running the 20b model via LM Studio on a 2021 M1 and I'm consistently getting around 50-60 T/s.


Pro tip: disable Open WebUI's title generation feature, or point it at a different model running on another system.

After every chat turn, Open WebUI sends the whole conversation to llama.cpp again, wrapped in a prompt to generate the title, and this wipes out the KV cache, forcing the entire context to be reprocessed on your next message.

This will get rid of the long prompt processing times if you're having long back-and-forth chats with it.
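
A rough sketch of how you could observe this against a llama.cpp server (llama-server on its default port 8080, single slot). The timings field names and cache_prompt behaviour vary by version, so treat those details as assumptions to check against your build:

    # Shows prefix/KV cache reuse, and how a differently-prefixed
    # title-generation request forces the chat context to be reprocessed.
    import json
    import urllib.request

    def completion(prompt):
        payload = {"prompt": prompt, "n_predict": 16, "cache_prompt": True}
        req = urllib.request.Request(
            "http://localhost:8080/completion",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    chat = "User: ...\nAssistant: ...\n" * 400  # stand-in for a long chat history

    a = completion(chat + "User: next question\n")                    # cold: full prompt processed
    b = completion(chat + "User: another question\n")                 # shared prefix: mostly cached
    c = completion("Generate a short title for this chat:\n" + chat)  # different prefix: cache miss
    d = completion(chat + "User: follow-up question\n")               # slow again: title request replaced the slot's cache

    for name, r in [("cold", a), ("same prefix", b), ("title prompt", c), ("after title", d)]:
        t = r.get("timings", {})
        print(f"{name:12s} prompt_n={t.get('prompt_n')} prompt_ms={t.get('prompt_ms')}")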



