
Yeah, performance at small buffer sizes is a big challenge. Generally I recommend 512 or higher, which I know is not great, but right now it's the most practical option. The issue is that the computation is all done on the GPU, and there's a round-trip latency that has to be amortized over each buffer. One day I'd like to convince Apple to work on the kernel scheduling latency...
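
To make the amortization point concrete, here is a rough back-of-the-envelope sketch. The 48 kHz sample rate and the 2 ms fixed GPU round trip are placeholder assumptions for illustration, not measured numbers, and `buffer_budget` is a hypothetical helper name. It treats the buffer size as a frame count and shows how much of each buffer's time budget a fixed round trip would consume.

```python
def buffer_budget(buffer_frames, sample_rate_hz=48_000, round_trip_ms=2.0):
    """Return (buffer duration in ms, fraction of that duration spent on the round trip).

    sample_rate_hz and round_trip_ms are assumed values, not measurements.
    """
    buffer_ms = 1000.0 * buffer_frames / sample_rate_hz
    overhead = round_trip_ms / buffer_ms
    return buffer_ms, overhead


if __name__ == "__main__":
    for n in (64, 128, 256, 512, 1024):
        ms, frac = buffer_budget(n)
        print(f"{n:5d} frames: {ms:6.2f} ms per buffer, {frac:6.1%} spent on round trip")
```

Under those assumed numbers, a 64-frame buffer would spend more than its whole time budget on the round trip alone, while at 512 frames the same fixed cost takes up under a fifth of the budget, leaving the rest for actual computation.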

