I thought most generic computation workloads are ill-suited for GPUs. A normal w...

pjmlp · on Oct 7, 2022

Exposing GPU programming to anyone besides C, C++ and Fortran developers would already help, even if that would take a speed bump, as proven by the few attempts targeting PTX.

I wasn't talking about Web apps.

galangalalgol · on Oct 7, 2022

Pytorch isn't just for ML, it can do normal signal processing or physics too. The julia libraries for cuda and roc and oneapi also are general enough for those uses and approachable. Both can fall back to cpu without much modification to the rest of your code.

If you aren't doing signal processing, physics or something that would benefit from simd, then the gp is correct, a gpu won't do much for you.

PaulHoule · on Oct 7, 2022

That said people are always discovering algorithms that get better performance than you'd expect from new hardware.

For instance a frightening amount of CPU is spent in financial messaging systems on validating UTF-8, parsing XML and JSON, converting numbers written in decimal digits to binary and things like that. You'd think these are "embarrassingly serial" problems but with clever coding and advanced SIMD instructions such as AVX-512 they can be accelerated for throughput, latency, and economy.

The benefits of the GPU are great enough that you might do more "work" but get the job done faster because it can be done in parallel.

For instance the algorithms used by the old A.I. ("expert systems") parallelize better than you might think (though not as well as the Japanese hoped they would in the 1980s) despite being super-branchy. Currently fashionable neural networks (called "connectionist" back in the day) require only predicated branching (which side of the ReLU are you on?) but spend a lot of calculations on parts of the network which might not be meaningful for the current inference. It depends on the details, but you might be better doing many more operations if you can do them in parallel.

Given that GPUs are out there and that so many people are working on them I think the range of what you can do with them is going to increase, though I think few people will be writing application logic on them directly, but they will increasingly use libraries and frameworks. For instance, see

https://arxiv.org/pdf/1709.02520.pdf

pjmlp · on Oct 7, 2022

Pytorch belongs to "... libraries written by the GPU druids..." on my comment.

And still requires specific skills to use, and is constrained to Python, C++ and Java based languages.

GPUs need to be exposed like SIMD, something that the language runtime takes care of, even if not perfect, better than not using them at all.

galangalalgol · on Oct 7, 2022

IME simd very rarely gets used by the compiler or runtime unless you make some slight changes in your data structures or flow, that require specific knowledge of the simd hardware. Asking a compiler to target unknown GPU architecture seems more likely to slow execution than speed it up. Even when writing my own cuda kernels I sometimes realize that something I am doing won't work well for a particular card and it is actually making me slower than the cpu. I'm sure we'll get there, but cards will have to converge a bit.

adwn · on Oct 7, 2022

The point stands, the vast majority of workloads are unsuited for GPUs, either because they are full of divergent branches, or because the data transfer and synchronization overheads would cancel any performance gains.