Modern GPUs are extremely programmable, but neural networks barely use that flexibility: NN inference is essentially a huge amount of matrix multiplication.
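To make that concrete, here's a minimal NumPy sketch (shapes are made up for illustration) of a fully connected layer; convolutions similarly reduce to matmuls under the hood:

    import numpy as np

    # A fully connected layer is one matrix multiply plus a bias add.
    # Illustrative shapes: batch of 32, 512 -> 1024 features.
    x = np.random.randn(32, 512).astype(np.float32)   # input activations
    W = np.random.randn(512, 1024).astype(np.float32) # learned weights
    b = np.zeros(1024, dtype=np.float32)              # learned bias

    y = x @ W + b           # the matmul dominates the compute cost
    y = np.maximum(y, 0.0)  # the ReLU is negligible by comparison

Almost all the arithmetic is in that one `@`, which is why fixed-function matmul hardware covers most of the workload.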
Especially for mobile applications (where most Arm customers are), you pay an energy cost for all that pipeline flexibility you aren't using. A dedicated chip saves a lot of power.