Even presuming this is an accurate summary, the conclusion doesn't follow: most local LLM inference users are constantly trading quality for speed, since speed drops dramatically once RAM is full. So if you think in terms of speed at a desired quality level, this could be very useful.
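
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an illustrative assumption rather than anything from the comment: a hypothetical 7B-parameter model, approximate bytes-per-weight for a few common quantization levels, and a 16 GB RAM budget. The idea is to pick the highest-quality quantization whose weights still fit in RAM, because once weights spill to disk, throughput collapses.

    # Back-of-the-envelope sketch of the RAM/quality trade-off.
    # All numbers are illustrative assumptions, not measurements.

    PARAMS = 7e9            # assumed model size: 7B parameters
    RAM_BUDGET_GB = 16      # assumed RAM budget for weights
    GB = 1024 ** 3

    # Approximate bytes per weight at each quantization level
    # (assumption; real quantized formats also carry per-block
    # scales and other overhead).
    QUANTS = {
        "fp16": 2.0,
        "q8":   1.0,
        "q5":   0.69,
        "q4":   0.56,
    }

    def footprint_gb(params: float, bytes_per_weight: float) -> float:
        """Rough weight-only memory footprint in GiB."""
        return params * bytes_per_weight / GB

    # Highest-quality quant whose weights stay resident in RAM is
    # usually the fastest end to end; anything larger pages to disk.
    for name, bpw in QUANTS.items():
        gb = footprint_gb(PARAMS, bpw)
        status = "fits" if gb <= RAM_BUDGET_GB else "spills to disk"
        print(f"{name}: ~{gb:.1f} GiB ({status})")

Under these assumptions, fp16 weights (~13 GiB) barely fit with no room for KV cache or the OS, while q8 and below leave headroom, which is the "speed at desired quality" calculus the comment describes.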


