Short answer: it has been a big pain in the butt. The GPU hardware is mostly rea...

Short answer: it has been a big pain in the butt. The GPU hardware is mostly really great, but the drivers/APIs were not designed for such a low-latency use case. There's (for audio) a large overhead latency in kernel execution scheduling. I've had to do a lot of fun optimization in terms of just reducing the runtime of the kernel itself, and a lot of less-fun evil dark magic optimization to e.g. trick macOS into raising the GPU clock speed.

Long answer: I've written a fair bit about this on my devlog. You might check out these tags:

https://anukari.com/blog/devlog/tags/gpu https://anukari.com/blog/devlog/tags/optimization