Well, it's not impossible. It's just software after all. You can mmap a remote device file, but you need OS support to do the magical paging for you, probably some sort of page ownership tracking protocol like in HMM [1], but outside a coherence domain.
I was once working on CXL [2] and memory ownership tracking in the Linux kernel and wanted to play with Nvidia GPUs, but I hit a wall when I realised that a lot of the functionality runs on the GSP, a firmware blob with very little to no documentation. I ended up disliking Nvidia's system software stack in general and gave up the project. The UVM subsystem in the open kernel driver is a bit of an exception, but a lot of the control path is still handled from closed-source CUDA libraries in userspace.
tldr; it's very hard to do systems hacking with Nvidia GPUs.
Yeah, the Nvidia stuff isn't really made to be hacked on.
I'd check out the AMD side, since you can at least have a fully open source GPU stack to play with, and they make a modicum of effort to document their GPUs.
This is the first time I've heard about /dev/nvidia-uvm. Is there any documentation on how the Nvidia API works? Especially, how strong is the multi-tenancy story? Can two users share one GPU and expect reasonable security?
Last time I checked, the GPU did offer some kind of memory isolation, but only on their datacenter cards, not consumer ones.
There's not a lot of docs on how it works. It used to be entirely in the closed source driver, now it's mainly a thin bridge to the closed source firmware blob.
But yes, for more than a decade now, even on consumer cards, separate user processes get separate hardware-enforced contexts. This is as true for consumer cards as it is for datacenter cards. It's core to how something like WebGL works without exposing everything else running on your desktop to the public Internet. There have been bugs, but per-process hardware isolation with a GPU-local MMU has been table stakes for a modern GPU for nearly twenty years.
What datacenter GPUs expose in addition to that is multiple virtual GPUs, sort of like SR-IOV, where a single GPU can be exposed to multiple guest kernels running in virtual machines.
At its core, this device file is responsible for managing a GPU-local address space and sharing memory securely with that address space, so there's somewhere to write command buffers and data that the GPU can see. It doesn't really make sense without a heavy memory-mapping component.
A Plan 9-like model that treats the device as just a standard file would massively cut into GPU performance.
I agree with you that making RDMA a more accessible commodity technology is very important for "the future of compute". Properly configuring something like RoCEv2 or InfiniBand is expensive and difficult. These technologies need to be made more robust to run on commodity networks.