As slow as growth has become in the graphics card and general computing industry, getting to the point where we can do 120Hz at 8K per eye is a lot further than 10 years off. It will require a significant change in how (where?) we're generating graphics, not just better hardware.
You're absolutely right... But by rendering the high-resolution and low-resolution views separately (each at 1k, with the low-resolution one upscaled 4x), we could do 60Hz today. It only requires 2 FHD-sized renders per eye (still less than rendering a full UHD view for both eyes), so 60Hz is very reasonable... possibly 90-120Hz. It also cuts the bandwidth by 8x (2/16) without compression.
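Rough math behind that 8x, with my assumptions spelled out ("1k" taken as a 1024x1024 render and the full eye buffer as 4k x 4k - a sketch, not exact panel numbers):

    # Back-of-envelope for the two-pass scheme above (Python).
    # Assumptions (mine): "1k" = 1024x1024 render, eye buffer = 4096x4096.
    tile = 1024 * 1024             # one 1k x 1k render
    full_eye = 4096 * 4096         # full-res eye view = 16 such tiles

    rendered = 2 * tile            # fovea (native) + wide view (4x upscaled)
    print(rendered / full_eye)     # 2/16 = 0.125 -> the 8x bandwidth cut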
It won't, no worries. Eye tracking in an HMD, whether AR or VR, is hugely complicated from an engineering point of view. There is a reason an HMD eye-tracking kit costs several thousand USD.
Moreover, from the rendering point of view, foveated rendering is a fairly complex thing to integrate into a 3D engine too. It is definitely not "free".
No panacea, but another important piece of the puzzle. The software part may be complex - but it is just software, and once done, we all benefit from 10x battery life. Eye-tracking hardware is complex too, but there is lots of ongoing R&D, the outcome of which will be sensor chips that can be added to HMDs.
Assuming that it's feasible to do ~1080p per eye at 120Hz today, 8K per eye is only 16 times more pixels. And considering that rendering more pixels is embarrassingly parallel, I don't see it as a problem to do that within 10 years.
8k x 8k per eye at 120Hz is 64x more pixels (vs. a 1k x 1k baseline) at a 1/3 increase in frequency, ~= 85x more processing power. Making the (maybe faulty) assumptions that processing power doubles every 2 years and that the current setup is processor-limited, this sort of processing power is ~13 years away.
Same computation but with 4k x 4k per eye predicts ~9 years of progress needed.
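Here's the extrapolation as a small script, so the assumptions are explicit (1k x 1k per eye at 90Hz as today's baseline, 2-year doubling, cost linear in pixels x refresh rate):

    import math

    # Baseline assumption: 1k x 1k per eye at 90 Hz is comfortable today.
    base_pixels, base_hz = 1024 * 1024, 90

    def years_until(side, hz, doubling_years=2):
        # Required processing power relative to the baseline, assuming
        # cost scales linearly with pixel count x refresh rate.
        scale = (side * side / base_pixels) * (hz / base_hz)
        return doubling_years * math.log2(scale)

    print(years_until(8192, 120))  # ~12.8 years (64x pixels * 4/3 Hz ~= 85x)
    print(years_until(4096, 120))  # ~8.8 years  (16x pixels * 4/3 Hz ~= 21x)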
I guess I misspoke: 1920x1080 is a "2K" screen split in two, i.e. 960x1080 per eye. An 8K screen split in two would be ~4000x4000 per eye, which is 16x as many pixels as I said, plus the 33% increase in frame rate I didn't include - which matches your second estimate. Although given how embarrassingly parallel it is, I don't think it's as far off as it seems. Especially considering that previous-generation graphics cards already handle current-day VR fine, we're 1-2 years into the ~9 years we have to wait, and with that many pixels anti-aliasing can probably be turned off completely. You could probably build something today that could do it; it would just be very expensive, and I don't think 8K panels at cell-phone size exist yet.
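Quick check on the split-panel numbers (assuming "2K" = 1920x1080 and "8K" = 7680x4320, each split vertically between the eyes):

    # Per-eye pixel counts when one panel is split between two eyes.
    per_eye_2k = (1920 // 2) * 1080   # 960x1080  ~= 1.0 MP per eye
    per_eye_8k = (7680 // 2) * 4320   # 3840x4320 ~= 16.6 MP per eye

    print(per_eye_8k / per_eye_2k)    # 16.0 -> the 16x figure above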
Also, eye tracking + foveated rendering will severely reduce the load. Once that works reliably, you just need the cheap, super-high-PPI, low-latency screens (which might almost exist today, though at high cost due to the lack of a mass market).
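For anyone curious what the foveated composite actually looks like, here's a minimal CPU sketch (hypothetical names, in numpy; a real pipeline would do this on the GPU with the hardware's bilinear sampler or variable-rate shading rather than on the CPU):

    import numpy as np

    def composite_foveated(wide_lowres, fovea_hires, gaze_xy, out_shape):
        """Paste a high-res fovea patch over an upscaled low-res wide view."""
        H, W = out_shape
        # Nearest-neighbour upscale of the low-res wide view to the eye buffer.
        ys = (np.arange(H) * wide_lowres.shape[0]) // H
        xs = (np.arange(W) * wide_lowres.shape[1]) // W
        frame = wide_lowres[ys[:, None], xs[None, :]]

        # Overwrite the region around the gaze point with the sharp patch.
        fh, fw = fovea_hires.shape[:2]
        y0 = int(np.clip(gaze_xy[1] - fh // 2, 0, H - fh))
        x0 = int(np.clip(gaze_xy[0] - fw // 2, 0, W - fw))
        frame[y0:y0 + fh, x0:x0 + fw] = fovea_hires
        return frame

    # Per-eye cost: two 1k x 1k renders composited into a 4k x 4k buffer,
    # i.e. the 2/16 budget discussed earlier in the thread.
    eye = composite_foveated(
        wide_lowres=np.zeros((1024, 1024, 3), np.uint8),
        fovea_hires=np.full((1024, 1024, 3), 255, np.uint8),
        gaze_xy=(2048, 2048),
        out_shape=(4096, 4096),
    )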