
> That would be enough to support a single user. If you want to host a service that provides this to 10k users in parallel your cost per user scales linearly with the GPU costs you posted.

No. Batching lets you serve many user requests in parallel against the same copy of the weights, with little additional VRAM per user, so the cost per user falls as the batch grows instead of scaling linearly.
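To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the only per-user VRAM cost is the request's KV cache and that the weights are loaded once and shared by the whole batch. The model shape (8B parameters, 32 layers, 8 KV heads, head dimension 128, fp16, 2048-token context) is a made-up example, not a measurement of any real deployment, and the function names are mine.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies (assumed layout: K and V per layer)."""
    # Factor of 2: one key vector and one value vector per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def vram_bytes(n_params, batch_size, context_len,
               n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Total VRAM estimate: shared weights plus one KV cache per user in the batch."""
    weights = n_params * bytes_per_elem  # paid once, regardless of batch size
    kv_per_user = context_len * kv_cache_bytes_per_token(
        n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return weights + batch_size * kv_per_user


# Hypothetical model shape, chosen only for illustration.
MODEL = dict(n_params=8_000_000_000, context_len=2048,
             n_layers=32, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for batch in (1, 8, 32):
        total = vram_bytes(batch_size=batch, **MODEL)
        print(f"batch={batch:>2}  total={total / 1e9:6.2f} GB  "
              f"per user={total / batch / 1e9:6.2f} GB")
```

Under these assumptions the weights dominate: going from 1 to 32 users adds 31 KV caches to the total, while the per-user share of VRAM shrinks by roughly 20x. Long contexts or models without grouped KV heads make the per-user cache larger, so the exact ratio depends heavily on the setup.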


