I've looked a tiny bit into this sort of thing (using Metal compute shaders), and the problem is that, even with all of Metal's optimizations around CPU/GPU synchronization, the overhead seems to be too high for real-time audio DSP in practice.
The thing I tried experimentally implementing is basically a 128x128 matrix mixer with a little bit of extra processing. On a three-year-old MacBook Pro, the GPU barely had to lift a finger to crunch the data, but the round-trip latency was still high enough that it would struggle to keep up with anything less than a buffer size of 512 or so at 48kHz (which is on the high end for live mic processing). It would be fantastic for offline processing with larger buffers, though.
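To put numbers on that: the hard deadline per buffer is just buffer_size / sample_rate, and the whole GPU round trip (encode, dispatch, read back) has to fit inside it. A quick sketch of the budget at a few common buffer sizes (the function name is mine, just for illustration):

```python
# Per-buffer time budget for real-time audio: the GPU round trip
# (command encoding + dispatch + readback) must complete within
# buffer_size / sample_rate seconds, every buffer, without fail.

def buffer_deadline_ms(buffer_size: int, sample_rate: int = 48_000) -> float:
    """Milliseconds available to process one buffer before the next is due."""
    return buffer_size / sample_rate * 1000.0

for n in (64, 128, 256, 512):
    print(f"{n:4d} samples @ 48kHz -> {buffer_deadline_ms(n):.2f} ms budget")
# 512 samples gives ~10.67 ms; 128 samples gives only ~2.67 ms,
# which is easily eaten by GPU dispatch/readback overhead alone.
```

So even when the kernel itself finishes in microseconds, the fixed per-dispatch overhead dominates at small buffers, which matches what I saw.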
I haven't tried CUDA or OpenCL, so I don't know if the situation is the same there—but of course they have the problem of vendor support as well.