
Which AMD GPU gives you 50 tok/s on a 30B model? My 3090 does 30 tok/s with a 4-bit quant.



I don't mean at the same time.

For a simple question on an RX 6800, I am observing ~50 tok/s on 8B models, and DeepSeek 16B gives ~40 tok/s. A 32B model doesn't fit in memory.
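A rough back-of-envelope for why the 32B case fails, assuming the RX 6800's 16 GB of VRAM and counting only the quantized weights (no KV cache, activations, or runtime overhead). The function name `weight_gib` and the bits-per-weight values are illustrative assumptions, not measurements from my setup:

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


VRAM_GIB = 16  # assumption: RX 6800 with 16 GB of VRAM

for size in (8, 16, 32):
    # 4.0 is an idealized 4-bit quant; 4.5 is an assumed figure for
    # formats that store extra scale data alongside the weights.
    for bits in (4.0, 4.5):
        need = weight_gib(size, bits)
        verdict = "fits" if need < VRAM_GIB else "does not fit"
        print(f"{size}B @ {bits} bits/weight: ~{need:.1f} GiB of weights, {verdict}")
```

Even where the idealized 4-bit figure squeaks under 16 GiB, whatever is left over still has to hold the KV cache and everything else the runtime allocates, so in practice it is too tight.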



