Hacker News

No, no, nothing like that.

Every layer of an LLM runs separately and sequentially, and there isn't much data transfer between layers. If you wanted to, you could put each layer on a separate GPU with no real penalty. A single request will only run on one GPU at a time, so it won't go faster than a single GPU with a big RAM upgrade, but it won't go slower either.
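A minimal sketch of that idea (hypothetical names, with NumPy matrices standing in for per-GPU layers): the forward pass visits each "device" sequentially, and the only data moved between devices is the activation vector, which is tiny next to the layer's weight matrix.

```python
import numpy as np

# Hypothetical sketch: a 4-layer model split one-layer-per-"device".
# Only one device is busy at any moment; the inter-device transfer
# is just the activation vector.
rng = np.random.default_rng(0)
hidden = 8
layers = [rng.standard_normal((hidden, hidden)) * 0.1 for _ in range(4)]
device_of = {i: f"gpu{i}" for i in range(len(layers))}  # layer -> device

def forward(x):
    for i, w in enumerate(layers):
        # In a real pipeline, x would be copied to device_of[i] here;
        # the weights w never move.
        x = np.tanh(x @ w)
    return x

out = forward(np.ones(hidden))
print(out.shape)  # (8,)
```

Because the pass is sequential, a single request gets no speedup from the extra GPUs; the win is purely fitting a model whose weights exceed one GPU's memory.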



Interesting, thank you for the feedback; it's definitely worth looking into!



