Unified memory exists, but it's not a magic bullet. If the GPU accesses a page that isn't resident in device memory, the access faults and the driver copies the page over from main RAM, effectively a memcpy issued on your behalf. The programming model is nicer, but it doesn't change the fact that you still need to constantly swap data between device memory and main RAM. That's not as bad as loading it from an SSD or HDD, but it's still quite slow.
Integrated GPUs that use a portion of system memory are an exception to this and do not require memcpys when using unified memory. However, I'm not aware of any powerful iGPUs from Nvidia these days.
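For illustration, here is a minimal sketch of what that looks like with the CUDA runtime's managed allocations. The kernel name `scale` and the array size are made up for the example, and the exact migration behavior (per-page on fault versus the whole allocation at launch) depends on the GPU generation and OS, so treat the comments as the general idea rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element in place, one thread per index.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // arbitrary size, chosen only for illustration
    float* data = nullptr;

    // One pointer that is valid on both the host and the device.
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The host writes first, so the pages start out in system RAM.
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // No explicit cudaMemcpy here. On a discrete GPU the driver still has to
    // move these pages into device memory before the kernel can use them.
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    // Touching the data on the host again brings the pages back.
    printf("data[0] = %f\n", data[0]);

    cudaFree(data);
    return 0;
}
```

The code is shorter than the explicit `cudaMalloc` plus `cudaMemcpy` version, but on a discrete card the same bytes still cross the bus in both directions.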
Sure. Makes sense. So I guess for discrete GPUs the unified memory stuff provides a universal address space but merely abstracts the copying/streaming of the data.
There does seem to be a zero-copy concept as well, and I've certainly used direct memory access over PCIe before on other proprietary devices.
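As far as I can tell, the zero-copy path in CUDA is mapped pinned host memory: the buffer stays in system RAM and the kernel reads and writes it directly over the bus. A rough sketch, assuming a device that supports mapped host memory; the kernel name `increment` and the buffer size are invented for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: add 1 to every element, one thread per index.
__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 16;  // arbitrary size, chosen only for illustration
    int* host = nullptr;    // host-side view of the pinned buffer
    int* dev = nullptr;     // device-side alias of the same memory

    // Page-locked host allocation that is also mapped into the GPU's address space.
    cudaError_t err = cudaHostAlloc((void**)&host, n * sizeof(int), cudaHostAllocMapped);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask the runtime for the pointer the kernel should use for this buffer.
    err = cudaHostGetDevicePointer((void**)&dev, host, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostGetDevicePointer: %s\n", cudaGetErrorString(err));
        cudaFreeHost(host);
        return 1;
    }

    for (int i = 0; i < n; ++i) host[i] = i;

    // The buffer is never copied into device memory; every access made by the
    // kernel goes to system RAM over the bus.
    increment<<<(n + 255) / 256, 256>>>(dev, n);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFreeHost(host);
        return 1;
    }

    // The host sees the result through its own pointer, with no copy back.
    printf("host[1] = %d\n", host[1]);

    cudaFreeHost(host);
    return 0;
}
```

This avoids the migration step entirely, but every access pays bus latency, so it tends to suit data that is touched once or streamed rather than data that is reused heavily.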