>We can translate spoken words to text strings pretty reliably. AFAIK, we can do this locally without much processing power, but I may be mistaken.
Cloud services backed by big data sets tend to be better, although I admit I haven't tried a local copy of Dragon NaturallySpeaking in a long time.
In any case, at least for fairly mainstream American or English accents, the voice recognition isn't really the problem any longer. Sophisticated NLP and responses are. We're a long way from virtual assistants that can handle anything complex.