Are WebGPU "libraries" at a point where instead of being recommended Python + PyTorch for neural network training (then exporting the model to ONNX and turning around and loading it and using it)? If not, how far? I've had trouble understanding why the PyTorch "GPU" (CUDA, MPS, etc.) code can't convert to WASM + WebGPU. Something has to be missing/it has to be a huge effort, but in theory you're supposed to be able to do GPU calculation with WebGPU, right?
Seems like the ecosystem is very early/non-existent.
My take is that we probably won't see much training done on WebGPU, because training is done upfront and it makes sense to standardize on a single GPU vendor and use an interface that can squeeze all the juice out of those GPUs (CUDA). But for inference and run-time computations, it could be very interesting to take a model trained with CUDA/PyTorch and export it (maybe with Apache TVM or tensorflow.js) to something WebGPU-based that can run on end-user devices.
> But for inference and run-time computations, it could be very interesting to take a model trained with CUDA/PyTorch and export it (maybe with Apache TVM or tensorflow.js) to something WebGPU-based that can run on end-user devices.
In its current state, can you train in PyTorch, export to ONNX, load the ONNX model in JavaScript/WASM, and then use it for WebGPU inference?
I'm not trying to sound obsessed with/married to ONNX, I just thought it was "the standard". Curious to learn alternatives and what people are doing now, but I fear even talking about what might be done here is discussing the "bleeding edge".
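For concreteness, the browser half of the pipeline I have in mind would look roughly like this (untested sketch assuming onnxruntime-web; the model file, input name, and shape are placeholders, and the model is assumed to have been exported with torch.onnx.export beforehand):

    import * as ort from 'onnxruntime-web';

    // Today this would run on the WASM execution provider; the open question is
    // whether/when a WebGPU provider can be swapped in.
    const session = await ort.InferenceSession.create('model.onnx', {
      executionProviders: ['wasm'],
    });

    // Input name and shape depend on how the model was exported.
    const input = new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]);
    const outputs = await session.run({ input: input });
    console.log(outputs);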
You can also go directly from PyTorch to WebGPU with Apache TVM. (ONNX is also supported, but my understanding is that it's better to go direct). This is an example using an LLM trained with PyTorch (I think) and run in the browser: https://mlc.ai/web-llm/
I can't seem to figure out whether the PR for the WebGPU backend for onnxruntime is supposed to land in a 1.14 release, a 1.15 release, has already landed, or isn't yet scheduled to land: https://github.com/microsoft/onnxruntime/pull/14579
> Official releases of ONNX Runtime are managed by the core ONNX Runtime team. A new release is published approximately every quarter, and the upcoming roadmap can be found here.
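For what it's worth, if/when that backend ships, my assumption is that opting in would just be a different execution provider entry (hedged guess; the exact provider name and options could differ):

    // Hypothetical once the WebGPU backend lands: prefer WebGPU, fall back to WASM.
    const session = await ort.InferenceSession.create('model.onnx', {
      executionProviders: ['webgpu', 'wasm'],
    });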
It can be converted, and can be fairly fast, see Tensorflow.js or https://jott.live/markdown/m1_webgpu_perf. However, any “web” standard will always be a subset of what the hardware can truly provide, because it has to support ALL GPU vendors. For example, leveraging Nvidia Tensor Cores or Apple’s neural accelerator is not possible. For ML training and inference this means that any WebGPU implementation is at least ~3x slower (likely much more) than an optimized CUDA implementation.