Hacker News

I can't imagine this setup will get more than 1 token per second.

I would love to see Deepseek running on premise with a decent TPS.



It says 4.25 TPS in the first para.


Honest mistake. Some people think HN is just a series of short tweets and haven't realized the titles are links yet!


It's the modern way. Why read when you can just imagine facts straight out of your own brain?


I agree but also found your comment funny in the context of LLMs. People love getting facts straight out of their models.


4.25 is enough tps for a lot of use cases.


That's still pretty slow, considering there's that "thinking" phase.


True, but 4.25 is the number we all want to know.
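To put that throughput in perspective, here's a back-of-the-envelope sketch. The 4.25 tok/s figure comes from the article; the token counts below are purely illustrative assumptions, not measurements:

```python
# Rough decode-time arithmetic at a fixed tokens-per-second rate.
# 4.25 tok/s is the article's figure; token counts are made up for illustration.
def generation_time(tokens: int, tps: float = 4.25) -> float:
    """Seconds to decode `tokens` at `tps` tokens per second."""
    return tokens / tps

# A reasoning model might emit ~1000 "thinking" tokens before the answer:
thinking = generation_time(1000)  # ~235 s, nearly four minutes of waiting
answer = generation_time(300)     # ~71 s for a 300-token reply
print(f"thinking: {thinking:.0f}s, answer: {answer:.0f}s")
```

So for batch or offline jobs 4.25 tok/s may be fine, but an interactive chat with a long thinking phase gets painful quickly.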


You can get 1 t/s on a Raspberry Pi.

https://youtu.be/o1sN1lB76EA?si=i8ecEBjLdV0zewFQ


This has nothing to do with the full 671B model; the Ollama models in that video are distilled Qwen2.5 variants.


I appreciate both of these comments, thank you both.



