Yeah, we made both of those assumptions. But they're apparently less true of the latest generation of compute engines. We probably won't bother with CUDA, but Metal is certainly a viable platform to try. It's one of those things where nobody really knows how it'll do until they try it.
Also, we do know a few people who want to use SOUL to write audio code that does need high parallelism. It's not a super-common use case for audio, but it does exist.
I hope you'll consider writing up your results in a blog post or something! As I just mentioned elsewhere in this thread, I have experimentally found latency with Metal compute to be too high to be feasible, but I would dearly love to be proven wrong, if there's a trick I'm missing.