That's not as easy as it sounds. In a desktop GUI, customers expect seamless integration between such sandboxes. Making the clipboard and drag and drop work two-way between such VMs can be a lot of work. Even for simple text, that may involve encoding conversions. That, in turn, means that the host must be able to infer what encoding the guest expects. Styled text and graphics are way harder.
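To make the encoding point concrete, here is a minimal Python sketch of what the host side might do with plain text coming from a guest's clipboard. Everything in it is an assumption for illustration: the function name `transfer_clipboard_text`, the idea that the guest may or may not declare its encoding, and the fallback guess list are hypothetical, not how any particular VM host does it.

```python
from typing import Optional


def transfer_clipboard_text(payload: bytes,
                            guest_encoding: Optional[str] = None,
                            host_encoding: str = "utf-8") -> bytes:
    """Re-encode plain clipboard text from a guest for the host (hypothetical helper)."""
    # Try the encoding the guest declared first, then some guesses.
    # The guess list is an assumption; a real host would need better hints.
    candidates = ([guest_encoding] if guest_encoding else []) + ["utf-8", "cp1252"]
    for encoding in candidates:
        try:
            text = payload.decode(encoding)
            break
        except (UnicodeDecodeError, LookupError):
            continue
    else:
        # latin-1 maps every byte to a character, so this never fails,
        # though the result may be wrong for the actual source encoding.
        text = payload.decode("latin-1")
    return text.encode(host_encoding, errors="replace")


# Guest declares cp1252; host wants UTF-8.
converted = transfer_clipboard_text("héllo".encode("cp1252"), "cp1252")
```

Even this toy version has to guess when the guest says nothing, which is exactly the inference problem for simple text; styled text and graphics have no equivalent shortcut.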
The VMs must also react to changes in the parent OS, such as a change in the keyboard layout. Two apps running in different VMs may both want some control over hardware, and so on.
The approach works well in Windows because Microsoft spends lots of time on it and because the compatibility changes aren't that great.
Making the clipboard and drag-and-drop work between any two X11 programs running on the same server is a hell of a lot of work, and many times simply impossible. So why are your goals for sandboxing so impossibly high?
Because I want my system to work, unlike the example you give. And impossibly? It worked fine in Apple's 68k-PPC and PPC-x86 transitions, across the various shims that Microsoft has for all kinds of old software, and may even work fine across various VM hosts running on Mac OS X (disclaimer: I have little experience with those).
Of course, it also works well in systems where there is no need for the VMs to interact (other than via the network). VM (http://en.wikipedia.org/wiki/VM_(operating_system)) is a nice example there.