Hacker News

Pretty sure llama.cpp can already do that





I forgot to clarify: I meant dealing with the network bottleneck.

Just my two cents from experience: any sufficiently advanced LLM training or inference pipeline eventually discovers that the real bottleneck is the network!
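A quick back-of-envelope sketch of why that happens. All numbers here are illustrative assumptions (model size, gradient precision, link speed), not measurements from any real cluster:

```python
# Back-of-envelope: wire time to all-reduce one step's gradients.
# Every figure below is an assumed, illustrative value.

PARAMS = 70e9            # assumed model size: 70B parameters
BYTES_PER_GRAD = 2       # fp16/bf16 gradients
LINK_GBPS = 100          # assumed interconnect: 100 Gb/s per node

grad_bytes = PARAMS * BYTES_PER_GRAD           # 140 GB of gradients
# A ring all-reduce pushes roughly 2x the payload across each link.
wire_bytes = 2 * grad_bytes
seconds = wire_bytes * 8 / (LINK_GBPS * 1e9)   # bits / link rate

print(f"~{seconds:.1f} s on the wire per step")
```

Under those assumptions you get on the order of tens of seconds of pure network time per optimizer step, which is why fast interconnects (or gradient compression / sharding tricks) end up mattering more than raw FLOPS.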


