Are WebGPU "libraries" at a point where instead of being recommended Python + PyTorch for neural network training (then exporting the model to ONNX and turning around and loading it and using it)? If not, how far? I've had trouble understanding why the PyTorch "GPU" (CUDA, MPS, etc.) code can't convert to WASM + WebGPU. Something has to be missing/it has to be a huge effort, but in theory you're supposed to be able to do GPU calculation with WebGPU, right?
Seems like the ecosystem is very early/non-existent.
My take is that we probably won't see much training done on WebGPU, because training is done upfront and it makes sense to standardize on a single GPU vendor and use an interface that can squeeze all the juice out of those GPUs (CUDA). But for inference and run-time computations, it could be very interesting to take a model trained with CUDA/PyTorch and export it (maybe with Apache TVM or tensorflow.js) to something WebGPU-based that can run on end-user devices.
> But for inference and run-time computations, it could be very interesting to take a model trained with CUDA/PyTorch and export it (maybe with Apache TVM or tensorflow.js) to something WebGPU-based that can run on end-user devices.
In its current state, can you train in PyTorch, export to ONNX, load the ONNX model in JavaScript/WASM, and then use it for WebGPU inference?
I'm not trying to sound obsessed with/married to ONNX, I just thought it was "the standard". Curious to learn alternatives and what people are doing now, but I fear even talking about what might be done here is discussing the "bleeding edge".
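For concreteness, the browser half of the pipeline I have in mind would look roughly like this (untested sketch assuming onnxruntime-web; the model file, input name, and shape are placeholders, and the model is assumed to have been exported with torch.onnx.export beforehand):

    import * as ort from 'onnxruntime-web';

    // Today this would run on the WASM execution provider; the open question is
    // whether/when a WebGPU provider can be swapped in.
    const session = await ort.InferenceSession.create('model.onnx', {
      executionProviders: ['wasm'],
    });

    // Input name and shape depend on how the model was exported.
    const input = new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]);
    const outputs = await session.run({ input: input });
    console.log(outputs);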
You can also go directly from PyTorch to WebGPU with Apache TVM. (ONNX is also supported, but my understanding is that it's better to go direct). This is an example using an LLM trained with PyTorch (I think) and run in the browser: https://mlc.ai/web-llm/
I can't seem to figure out whether the PR for the WebGPU backend for onnxruntime is supposed to land in a 1.14 release, a 1.15 release, has already landed, or isn't yet scheduled to land: https://github.com/microsoft/onnxruntime/pull/14579
> Official releases of ONNX Runtime are managed by the core ONNX Runtime team. A new release is published approximately every quarter, and the upcoming roadmap can be found here.
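For what it's worth, if/when that backend ships, my assumption is that opting in would just be a different execution provider entry (hedged guess; the exact provider name and options could differ):

    // Hypothetical once the WebGPU backend lands: prefer WebGPU, fall back to WASM.
    const session = await ort.InferenceSession.create('model.onnx', {
      executionProviders: ['webgpu', 'wasm'],
    });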
It can be converted, and can be fairly fast, see Tensorflow.js or https://jott.live/markdown/m1_webgpu_perf. However, any “web” standard will always be a subset of what the hardware can truly provide, because it has to support ALL GPU vendors. For example, leveraging Nvidia Tensor Cores or Apple’s neural accelerator is not possible. For ML training and inference this means that any WebGPU implementation is at least ~3x slower (likely much more) than an optimized CUDA implementation.