Voice interfaces are fantastic in specific realms. They're ideal for controlling complicated things when you don't have the bandwidth to control them with your hands. They're also great for text messaging and dictation.
Outside specific realms, they are far less useful.
I would argue that usability falls off a cliff once we consider the rest of the world, with its many languages, accents, creoles, and dialects.
Every interface is going to have limits and constraints.
Visual interfaces are great for people who have excellent vision and precise motor control. A friend of mine had an unusual stroke that affected his cognitive function and left him unable to read. He could see fine, speak reasonably well, and his other cognitive abilities were intact... he just couldn't read. He ended up using voice controls and speech for everything on his computer.
If we let every individual limitation restrict which interfaces we use, we'd quickly end up with none.
Ideally, we have two or more ways of doing anything, to accommodate the limits of individuals and of the software/hardware we interact with. Having both voice controls and visual controls, as the Tesla does, is ideal.
Your argument is flawed. The variation in speech patterns across English-speaking countries is far greater than the variation in people's ability to navigate visual interfaces within the same population.