There are a lot of smart and motivated people putting effort into voice interfaces. It's the past (we've been speaking for over a hundred thousand years) and it is the future. It will take some time, but I'm pretty confident that, for many tasks, computers interacting with our auditory cortex will replace small slabs of glass that we look at and touch.
I think that's highly unlikely. We've had radio for 100 years. The written word is still thriving and TV gets twice as much time from people as radio does. Audio's fine for some things, but it's so very limited.
And as an aside, we haven't really had voice-only interfaces for 100k years. Really, they've only existed since the telephone. What existed before that was humans, whose in-person interactions almost always involve far more than voice. People have different estimates of the amount of information conveyed in a conversation through expression, gesture, posture, glance, and the like, but it's never a small amount.
You did say "replace small slabs of glass that we look at and touch", so I think "voice-only" was a reasonable interpretation.
As I said, I'm all for trying new stuff. We should look at the extent to which computers can usefully leverage that channel. But I don't think we should presume that it will be particularly useful.