I suspect that, as with self-driving, the last 10%, 1%, 0.1% will be both functionally essential and exponentially difficult.
Video calls work great (well, once we've sorted out the eye contact issue - now there's a real problem that really needs solving[1]). Even with all the ML in the world, avatars will be just a pale reflection of the real thing.
[1] You need a screen that is also a composite camera array, so that software can track the eyes on the incoming video feed and place the camera for the outgoing feed at that (moving) location. Sort of like a phased array for light. Thus when you look at someone's eyes, they see you looking directly down the camera.
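The camera-selection half of that idea is simple to sketch: given the tracked eye position in the incoming feed (in normalized screen coordinates), pick the nearest element of the camera grid to use for the outgoing feed. A minimal sketch, where the grid dimensions, the coordinate convention, and the `nearest_camera` function are all hypothetical:

```python
# Sketch: map a tracked eye position (normalized 0..1 screen coords)
# to the closest element of a cols x rows camera grid behind the screen.
# The grid geometry and coordinate convention here are assumptions,
# not a real device API.

def nearest_camera(eye_x: float, eye_y: float, cols: int, rows: int) -> tuple[int, int]:
    """Return the (col, row) of the grid element nearest the eye position."""
    col = min(cols - 1, max(0, round(eye_x * (cols - 1))))
    row = min(rows - 1, max(0, round(eye_y * (rows - 1))))
    return col, row

# Eyes at the centre of the screen select the middle camera of a 9x5 array.
print(nearest_camera(0.5, 0.5, cols=9, rows=5))  # → (4, 2)
```

The hard part, of course, is not this lookup but building a display that is also a dense camera array and re-selecting (or interpolating between) elements smoothly as the remote person's eyes move.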