
The number of people running llama3 70b on Nvidia gaming GPUs is absolutely tiny. You need at least two of the highest-end 24 GB VRAM cards, and even then you're reliant on 4-bit quantization: the weights alone come to roughly 40 GB, leaving almost nothing for your context window.
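
Back-of-the-envelope math bears this out. A rough sketch: the 4.5 bits/weight figure is an assumption covering quantization scales and overhead, and the KV-cache numbers use Llama 3 70B's published config (80 layers, 8 KV heads under GQA, head dim 128) at fp16. Exact usage varies by runtime.

    # Rough VRAM estimate for Llama 3 70B at 4-bit quantization.
    # Approximations only; actual usage depends on the runtime
    # (llama.cpp, exllama, etc.) and quant format.

    params = 70e9                # parameter count
    bits_per_weight = 4.5        # assumed: ~4-bit quant + scale overhead
    weights_gb = params * bits_per_weight / 8 / 1e9   # ~39.4 GB

    # KV cache per token: K and V, 80 layers, 8 KV heads,
    # head dim 128, 2 bytes (fp16) each.
    kv_bytes_per_token = 2 * 80 * 8 * 128 * 2
    ctx_tokens = 8192
    kv_gb = kv_bytes_per_token * ctx_tokens / 1e9     # ~2.7 GB

    total_vram_gb = 2 * 24       # two 24 GB cards
    print(f"weights : ~{weights_gb:.1f} GB")
    print(f"kv cache: ~{kv_gb:.1f} GB at {ctx_tokens} tokens")
    print(f"headroom: ~{total_vram_gb - weights_gb - kv_gb:.1f} GB "
          "for activations and CUDA overhead")

That leaves on the order of 6 GB of headroom across both cards before the runtime's own overhead, which is why long contexts get tight fast.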


The cognitive dissonance here.

70B models aren't better than 7B models outside roleplay. The reasoning is just as bad either way. No one even cares about 70B models.



