This is interesting, though a lot of the claims are highly debatable. One really stood out to me:
> the tasks you do are more complex. That’s where voice input shines.
I don't know how far into the future this is supposed to be implemented, but I still have yet to find a form of voice input that's remotely accurate enough for anything "more complex". Current voice input seems to rely heavily on the scope for commands being limited, and even then it breaks down often with names and such.
Solving that is simply a matter of developing a way for the computer to pick up inaudible or mouthed commands. Not a trivial task, but it does seem doable with current technology.
Okay, so basically control the user input with muscle movements. Hmm, I wonder if some other group of muscles isn't more suitable for that. How about trying, say, our fingers for starters?
edit: I've read that NASA developed a microphone that doesn't need sound, though, so it is doable.
Language is much more expressive than your fingers for certain tasks (and vice-versa). This design advocates for the use of _both_, not one or the other.
I think it's very unlikely that any significant number of people can type faster than they can speak.
Depending on the complexity of the task, you might very well have to press "hundreds of buttons" in order to achieve the same result as a single subvocalized voice command. Not to mention that speech can be far more intuitive than keyboard shortcuts or nested sub-menus for certain tasks.
Again though, this isn't about replacing keyboard input, it's about supplementing it.
Yeah, but I remember spending 20 minutes driving on the highway trying to get Google Assistant to play the next episode of my podcast (not the most recent! The NEXT one. Gaah).
It would take 4 taps to get done, and it is impossible to do via voice.
Combining an engine to interpret voice commands with textual keyboard input seems like the best approach here. You get all the useful fuzziness of voice with none of the transcription errors (or revolting workplace noise pollution).
I'm fairly well sold on some sort of "omnibar" interface concept where you just tell the computer what you want with a keyboard. Alfred, Spotlight, Google search, Wolfram Alpha, iCal's smart add (you can just type "2pm next Friday"), the Action palette in IntelliJ, VS Code and Sublime's command palette, and so on. That, just... everywhere. And if you're alone, sure, use voice instead. Just don't deprecate the keyboard: still the most reliable way to get accurate text into a computer.
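As a toy illustration of the sort of fuzzy matching those palettes do (the command list here is invented, not taken from any of those tools):

```python
# Toy command-palette matcher: a query matches a command if its characters
# appear in order (the classic fuzzy-subsequence trick). Commands are made up.

COMMANDS = ["Open File", "Toggle Sidebar", "Git: Commit", "Format Document"]

def is_subsequence(query: str, candidate: str) -> bool:
    it = iter(candidate.lower())
    # `ch in it` advances the iterator until it finds ch, so order is enforced.
    return all(ch in it for ch in query.lower())

def palette(query: str) -> list[str]:
    return [c for c in COMMANDS if is_subsequence(query, c)]

print(palette("gcom"))  # ['Git: Commit']
```

Point being, a query like "gcom" gets you to the command without remembering its exact name, only its shape.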
Exactly, voice works amazingly as a "3rd interface" complementing keyboards and mice, which aren't completely accurate either, fwiw. Voice gives us more speed, reduces UI clutter, and can free up our hands.
Oops I was a bit unclear. I meant, include the engine part that matches English statements to commands and parameters but hook it up to a keyboard-driven text box instead of a microphone. So it's:
written text->'AI'->command
vs.
speech->'AI'->command
(Edit: and sure, voice as an adjunct per my first comment.)
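A minimal sketch of that first path, assuming a hypothetical command registry (a real engine would use an actual language model; regexes just make the idea runnable):

```python
import re

# Hypothetical registry mapping English-ish patterns to command ids.
# None of these commands come from a real product.
COMMANDS = {
    r"play (?:the )?next episode(?: of (?P<show>.+))?": "podcast.next_episode",
    r"open (?P<file>\S+)": "editor.open",
}

def interpret(text: str):
    for pattern, command in COMMANDS.items():
        m = re.fullmatch(pattern, text.strip(), re.IGNORECASE)
        if m:
            args = {k: v for k, v in m.groupdict().items() if v is not None}
            return command, args
    return None, {}

# The same interpreter works whether `text` came from a keyboard-driven
# text box or from a speech-to-text engine; only the input path differs.
print(interpret("play the next episode of my podcast"))
# -> ('podcast.next_episode', {'show': 'my podcast'})
```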
Yes, please don't make voice input mandatory. At one place I worked, the guy in the next cube used Dragon Speech for navigation and typing (he's not disabled; he just thought it was hip). It was beyond annoying to hear all the command input.
> Current voice input seems to rely heavily on the scope for commands being limited, and even then it breaks down often with names and such.
Voice commands are more like a CLI (short, limited commands) than a GUI, and the CLI has been proven to be much more composable and scriptable than GUIs. AFAIK it hasn't been done, but adding stdio and pipes to a voice interface could make it shine for complex workflows where a GUI fails.
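Purely hypothetical, but a "voice shell" with pipes might look something like this toy, where the spoken word "pipe" joins stages the way `|` does (the commands and grammar are invented):

```python
# Toy "voice shell": each spoken stage transforms a stream, joined by the
# spoken word "pipe" -- like `ls | grep txt | wc -l`. Everything is made up.

def run(utterance: str) -> list[str]:
    stream: list[str] = []
    for stage in utterance.split(" pipe "):
        cmd, *args = stage.split()
        if cmd == "list":
            stream = ["notes.txt", "report.pdf", "draft.txt"]
        elif cmd == "filter":
            stream = [item for item in stream if args[0] in item]
        elif cmd == "count":
            stream = [str(len(stream))]
    return stream

print(run("list pipe filter txt pipe count"))  # ['2']
```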
Eye tracking could be useful for providing the necessary context for limiting the scope of commands.
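For instance, a tiny sketch (the UI regions and command vocabularies here are made up):

```python
# Hypothetical: narrow the command vocabulary by gaze target, so the
# recognizer picks among a handful of plausible commands, not thousands.

CONTEXT_COMMANDS = {
    "timeline": {"cut", "trim", "split"},
    "toolbar": {"brush", "eraser", "zoom"},
}

def resolve(gaze_target: str, candidates: list[str]) -> str | None:
    # `candidates` is the recognizer's ranked guess list for one utterance.
    valid = CONTEXT_COMMANDS.get(gaze_target, set())
    for word in candidates:
        if word in valid:
            return word
    return None

# Acoustically "but" and "cut" are close; gaze at the timeline settles it.
print(resolve("timeline", ["but", "cut"]))  # 'cut'
```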
I also believe that minimizing latency and eliminating the need for hot words will make a big difference in the usefulness of voice commands for more common tasks: https://twitter.com/Google/status/1125815241026166784
Yeah, voice-based interfaces have huge potential, but maybe it's easier if we adapt ourselves to match what they're expecting. Like a shorthand for voice. We already learn seemingly random words/numbers/special characters in the form of programming syntax. We could do that for voice too. Or maybe a 'grunt'-based interface? A grunt is pretty universal, right? :P
Like, what do they imagine voice really doing for CAD programs or Excel, ERP, video editing... At least I guess in Photoshop you could switch tools with voice without moving the brush, but that's not that compelling as an example of voice-for-complex-tasks.
It depends on the task and the UI. In some tools like CAD I need to switch tools a lot: add an element to a sketch, constrain it, repeat. Switch back into 3D, pick a new construction plane, etc. Most tools have pretty distinct names and could be called out faster than doing the mouse acrobatics.
We aren't used to talking to things instead of humans. But I think that voice input would be the next logical step beyond the search bars for program features that have been showing up in the last few years. These search bars are halfway to freeform textual commands. Entering such commands by voice is then mostly a matter of bashing together existing technology to make something new.