Well, it's not impossible. It's just software after all. You can mmap a remote device file, but you need OS support to do the magical paging for you, probably some sort of page ownership tracking protocol like in HMM [1], but outside a coherence domain.
I was once working on CXL [2] and memory ownership tracking in the Linux kernel and wanted to play with Nvidia GPUs, but I hit a wall when I realised that a lot of the functionality runs on the GSP, a firmware blob with very little to no documentation. I ended up disliking Nvidia's system software stack in general and gave up the project. The UVM subsystem in the open kernel driver is a bit of an exception, but a lot of the control path is still handled from closed-source CUDA libraries in userspace.
tldr; it's very hard to do systems hacking with Nvidia GPUs.
Yeah, the Nvidia stuff isn't really made to be hacked on.
I'd check out the AMD side, since you can at least have a fully open source GPU stack to play with, and they make a modicum of effort to document their GPUs.
This is the first time I've heard about /dev/nvidia-uvm. Is there any documentation on how the Nvidia API works? Especially, how strong is the multi-tenancy story? Can two users share one GPU and expect reasonable security?
Last time I checked, the GPU did offer some kind of memory isolation, but only on their datacenter cards, not consumer ones.
There's not a lot of docs on how it works. It used to be entirely in the closed source driver, now it's mainly a thin bridge to the closed source firmware blob.
But yes, for more than a decade now, even on consumer cards, separate user processes get separate hardware-enforced contexts. This is as true for consumer cards as it is for datacenter cards. It's core to how something like WebGL works without exposing everything else running on your desktop to the public Internet. There have been bugs, but per-process hardware isolation with a GPU-local MMU has been table stakes for a modern GPU for nearly twenty years.
What datacenter GPUs expose in addition to that is multiple virtual GPUs, sort of like SR-IOV, where a single GPU can be exposed to multiple guest kernels running in virtual machines.
At its core, this device file is responsible for managing a GPU-local address space and sharing memory securely with that address space, so there's somewhere to write command buffers and data that the GPU can see. It doesn't really make sense without a heavy memory-mapping component.
A Plan 9-like model that treats the device as just a standard file would massively cut into GPU performance.
I agree with you that making RDMA a more accessible commodity technology is very important for "the future of compute". Properly configuring something like RoCEv2 or InfiniBand is expensive and difficult. These technologies need to be made more robust to run on commodity networks.