
This is interesting, though a lot of the claims are highly debatable. One really stood out to me:

> the tasks you do are more complex. That’s where voice input shines.

I don't know how far into the future this is supposed to be implemented, but I have yet to find a form of voice input that's remotely accurate enough for anything "more complex". Current voice input seems to rely heavily on limiting the scope of commands, and even then it often breaks down on names and the like.



I can only imagine the joy of an open office floor plan where employees also depend on a voice based interface for their computers.


To say nothing of accessibility.


This ad ran last night during the Finals and really hit home how audio can improve accessibility:

https://www.youtube.com/watch?v=aqoXFCCTfm4&feature=youtu.be


If a talking interface becomes a popular option in addition to a visual interface, that would actually improve accessibility.


Solving that is simply a matter of developing a way for the computer to pick up inaudible or mouthed commands. Not a trivial task, but it does seem doable with current technology.


Okay, so basically control the user input with muscle movements. Hm, I wonder if some other group of muscles might be more suitable for that. How about trying, say, our fingers for starters?

edit: I've read that NASA developed a microphone that doesn't need audible sound, though, so it is doable.


Just hurry up with the neural interface please :). Plug that baby into the port on the back of the neck, close your eyes, and get things done!


Myo tried this a few years ago; looks like they're shutting down now.

https://support.getmyo.com/hc/en-us


Language is much more expressive than your fingers for certain tasks (and vice-versa). This design advocates for the use of _both_, not one or the other.


Language is more expressive, but I can press hundreds of buttons and type commands faster than I can yell expressive commands at my computer.


I think it's very unlikely that any significant number of people can type faster than they can speak.

Depending on the complexity of the task, you might very well have to press "hundreds of buttons" to achieve the same result as a single subvocalized voice command. Not to mention that speech can be far more intuitive than keyboard shortcuts or nested sub-menus for certain tasks.

Again though, this isn't about replacing keyboard input, it's about supplementing it.


Yeah, but I remember spending 20 minutes driving on the highway trying to get Google Assistant to play the next episode of my podcast (Not the most recent! The NEXT ONE. Gaah)

It would take 4 taps to do, yet it's impossible via voice.


Similarly, I still have no idea how to get Siri to add a stop on a route instead of replacing the current destination.

Even worse, if you even use the word "stop" in a sentence it doesn't understand, it cancels the current driving directions.


Chorded typists can write faster than speech; that's how court sessions get transcribed, for example.

Also, math is an example where notation lets you express so much more than you can with spoken language.

The way I see it, a shell interface is a different language in its own right, and it's so much more expressive.

Or, for example, imagine playing an FPS with voice commands.


As seen in the movie Her.


Combining an engine to interpret voice commands with textual keyboard input seems like the best approach here. You get all the useful fuzziness of voice with none of the transcription errors (or revolting workplace noise pollution).

I'm fairly well sold on some sort of "omnibar" interface concept where you just tell the computer what you want with a keyboard. Alfred, Spotlight, Google search, Wolfram Alpha, iCal's smart add (you can just type "2pm next Friday"), the Action palette in IntelliJ, VS Code and Sublime's command palette, and so on. That, just... everywhere. And if you're alone, sure, use voice instead. Just don't deprecate the keyboard: still the most reliable way to get accurate text into a computer.
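As a sketch of what that matching could look like (the command registry here is hypothetical, and a real omnibar would also rank by recency and parse parameters), simple fuzzy matching over a flat registry already goes a long way:

    import difflib

    # Hypothetical command registry: name -> action. A real omnibar would
    # also index synonyms, recency, and parameters ("2pm next Friday").
    COMMANDS = {
        "open file": lambda: print("opening file picker..."),
        "toggle dark mode": lambda: print("toggling theme..."),
        "new calendar event": lambda: print("creating event..."),
    }

    def run_omnibar(query: str) -> None:
        """Fuzzy-match free text against the registry and run the best hit."""
        hits = difflib.get_close_matches(query.lower(), list(COMMANDS), n=1, cutoff=0.4)
        if hits:
            COMMANDS[hits[0]]()
        else:
            print(f"no command matches {query!r}")

    run_omnibar("drak mode")  # the typo still resolves to "toggle dark mode"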


Exactly, voice works amazingly well as a "3rd interface" complementing keyboards and mice (which aren't completely accurate either, fwiw). Voice gives us more speed, reduces UI clutter, and can free up our hands.

I'm working on a tool that explores this area more: https://www.lipsurf.com


Oops, I was a bit unclear. I meant: keep the engine that matches English statements to commands and parameters, but hook it up to a keyboard-driven text box instead of a microphone. So it's:

written text->'AI'->command vs. speech->'AI'->command

(Edit: and sure, voice as an adjunct per my first comment.)
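A minimal sketch of that split, with a toy interpret() function standing in for the 'AI' stage (all names and the command format here are invented for illustration):

    def interpret(utterance: str) -> tuple[str, dict]:
        """Toy stand-in for the 'AI' stage: free text -> (command, params)."""
        if utterance.startswith("play next "):
            return "play_next", {"what": utterance.removeprefix("play next ")}
        return "unknown", {"raw": utterance}

    def from_keyboard(text: str):
        return interpret(text)        # written text -> 'AI' -> command

    def from_speech(transcript: str):
        return interpret(transcript)  # speech (after STT) -> 'AI' -> command

    print(from_keyboard("play next podcast episode"))
    # -> ('play_next', {'what': 'podcast episode'})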


Yes, please don't make voice input mandatory. At one place I worked, the guy in the next cube used Dragon speech recognition for navigation and typing (he's not disabled; he just thought it was hip). It was beyond annoying to hear all the command input.


Take a look at Apple's new Voice Control feature coming to iOS and macOS. You can see it on YouTube or on Apple's home page.


> Current voice input seems to rely heavily on limiting the scope of commands, and even then it often breaks down on names and the like.

Voice commands are more like a CLI (short, limited commands) than a GUI, and the CLI has been proven to be much more composable and scriptable than GUIs. AFAIK it hasn't been done, but adding stdio and pipes to a voice interface could make it shine for complex workflows where a GUI fails.
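As a toy sketch of that idea, spoken connector words could compile down to a shell pipeline; the phrase-to-command vocabulary below is entirely made up:

    import subprocess

    # Made-up mapping from spoken phrases to shell stages.
    VOCAB = {
        "list files": "ls",
        "keep pngs": "grep '\\.png$'",
        "count them": "wc -l",
    }

    def voice_to_pipeline(utterance: str) -> str:
        """Split a spoken command on 'then' and join the stages with pipes,
        mirroring stdio composition in a shell."""
        return " | ".join(VOCAB[p.strip()] for p in utterance.split(" then "))

    cmd = voice_to_pipeline("list files then keep pngs then count them")
    print(cmd)  # ls | grep '\.png$' | wc -l
    subprocess.run(cmd, shell=True)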


Eye tracking could be useful for providing the necessary context for limiting the scope of commands.

I also believe that minimizing latency and eliminating the need for hot words will make a big difference in the usefulness of voice commands for more common tasks: https://twitter.com/Google/status/1125815241026166784


Yeah, voice-based interfaces have huge potential, but maybe it's easier if we adapt ourselves to match what they're expecting. Like a shorthand for voice. We already learn seemingly random words/numbers/special characters as programming syntax; we could do that for voice too. Or maybe a 'grunt'-based interface? A grunt is pretty universal, right? :P


What do they imagine voice really doing for CAD programs, Excel, ERP, video editing...? I guess in Photoshop you could switch tools by voice without moving the brush, but that's not very compelling as an example of voice-for-complex-tasks.


It depends on the task and the UI. In some tools like CAD I need to switch tools a lot: add an element to the sketch, constrain it, repeat. Switch back into 3D, pick a new construction plane, etc... most tools have pretty distinct names and could be called out faster than the mouse acrobatics.

We aren't used to talking to things instead of humans. But I think that voice input would be the next logical step beyond the search bars for program features that have been showing up in the last few years. These search bars are halfway to freeform textual commands. Entering such commands by voice is then mostly bashing together existing technology to make something new.


<select item> "dimension as 20 millimeters and make reference" <select next item> "rotate 90 degrees and scale to radius 100 millimeters"

Sounds to me like it would make CAD far less dependent on context switches between mouse/puck/pen and keyboard than it is today.
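As a rough sketch of what parsing those commands could look like, assuming an invented grammar and operation names:

    import re

    # Invented grammar: "<verb> [as] <value> <unit>" clauses joined by "and".
    CLAUSE = re.compile(
        r"(dimension|rotate|scale to radius)\s+(?:as\s+)?([\d.]+)\s*(millimeters|degrees)?"
    )

    def parse_cad_command(utterance: str) -> list[dict]:
        """Turn one spoken phrase into operations on the current selection."""
        ops = []
        for clause in utterance.split(" and "):
            m = CLAUSE.search(clause)
            if m:
                verb, value, unit = m.groups()
                ops.append({"op": verb, "value": float(value), "unit": unit})
            elif clause.strip() == "make reference":
                ops.append({"op": "make_reference"})
        return ops

    print(parse_cad_command("dimension as 20 millimeters and make reference"))
    # [{'op': 'dimension', 'value': 20.0, 'unit': 'millimeters'},
    #  {'op': 'make_reference'}]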



