Hacker News

I can't imagine this setup will get more than 1 token per second.

I would love to see Deepseek running on premise with a decent TPS.



It says 4.25 TPS in the first para.


Honest mistake. Some people think HN is just a series of short tweets and haven't realized the titles are links yet!


It's the modern way. Why read when you can just imagine facts straight out of your own brain?


I agree but also found your comment funny in the context of LLMs. People love getting facts straight out of their models.


4.25 is enough tps for a lot of use cases.


That's still pretty slow, considering there's that "thinking" phase.


True, but 4.25 is the number we all want to know.
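To put that throughput in perspective, here's a back-of-the-envelope sketch. The 4.25 tok/s figure comes from the article; the token counts below are purely illustrative assumptions, not measurements:

```python
# Rough decode-time arithmetic at a fixed tokens-per-second rate.
# 4.25 tok/s is the article's figure; token counts are made up for illustration.
def generation_time(tokens: int, tps: float = 4.25) -> float:
    """Seconds to decode `tokens` at `tps` tokens per second."""
    return tokens / tps

# A reasoning model might emit ~1000 "thinking" tokens before the answer:
thinking = generation_time(1000)  # ~235 s, nearly four minutes of waiting
answer = generation_time(300)     # ~71 s for a 300-token reply
print(f"thinking: {thinking:.0f}s, answer: {answer:.0f}s")
```

So for batch or offline jobs 4.25 tok/s may be fine, but an interactive chat with a long thinking phase gets painful quickly.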


You can get 1 t/s on a Raspberry Pi.

https://youtu.be/o1sN1lB76EA?si=i8ecEBjLdV0zewFQ


This has nothing to do with the full 671B model; the Ollama models in that video are distilled Qwen2.5 variants.


I appreciate both of these comments, thank you both.



