
> The system also sounds more natural thanks to the incorporation of speech disfluencies (e.g. “hmm”s and “uh”s). These are added when combining widely differing sound units in the concatenative TTS or adding synthetic waits, which allows the system to signal in a natural way that it is still processing. (This is what people often do when they are gathering their thoughts.) In user studies, we found that conversations using these disfluencies sound more familiar and natural.

This part stuck out to me during the Google I/O demo, as an intentional deficiency is an interesting design decision.



> as an intentional deficiency is an interesting design decision.

Well, in semantics/pragmatics these discourse particles are often not deficiencies at all. They are signals with a practical semantic purpose. "Hmm"s and "uh"s can signal attentiveness, turn-taking (turn holding, turn yielding, etc.), agreement - just to name a few.

For any machine system to be able to pass as human, it will have to be able to control these nuances or people will pick up on something being wrong, though they might not be able to articulate precisely what.


I really enjoyed the machine's "uhs" and "uhms" in the demo speech. However, I felt the "uh-huh"s sounded forced. It's funny how these subtleties are very important in human conversation.


I think probably because "uh-huh" can have many different meanings based on inflection!

As a "non-word", it relies heavily on how it is conveyed.

Imagine someone asks you a question, I bet you can answer using just the word "uh-huh" but conveying these different emotions:

rude, perky, bored, upset, annoyed, dubious, excited

and probably a dozen more.

Even using the "perky" or "happy" one in a situation where it isn't warranted might sound rude or unthoughtful!


It's not a new thing. A famous tax preparation program introduced a "compute" screen that took a few seconds, to make people more comfortable with the results even though the computation itself was essentially instantaneous.


It's really just an audio version of a loading bar or spinner - users get really uncomfortable if the UI becomes unresponsive for even a few hundred milliseconds, but they'll wait for several seconds if it looks like something is happening.

See also:

https://en.wikipedia.org/wiki/Comfort_noise


People have learned that the spinner doesn't indicate real progress, though. The progress bar still has some life in it, except that those are often fake, not measuring actual progress.


OS-level cursor spinners like the Mac pinwheel have lost credibility, because they don't reliably indicate whether the system is temporarily unresponsive or needs to be restarted. Modern multitasking OSes have a wide range of situations in which they can become mostly unresponsive without actually crashing.

Spinners on the application or UI element level are more credible, but generally worse than a progress bar. They're still very useful as a comfort indicator for short delays.

Progress bars have very low credibility on Windows, because users have learned that they're basically useless as an indicator of wait time. A progress bar might get stuck at 7%, then suddenly rush to 100%; conversely, it might get stuck at 95% but never finish. The bar offers no real indication of the actual level of progress; in most cases, this could be greatly improved with a bit of educated guesswork.

A completely fictitious progress bar can be extremely credible, because it's totally predictable - if you need to create a 10 second delay, then it's easy to make the bar progress linearly from 0% to 100% in that time. Users learn very quickly that your progress bar tells the truth about how long they'll be waiting, even though it's lying about the reason for the wait.
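
A minimal sketch of such a purely time-driven bar (the names and the 10-second figure are illustrative, not anyone's actual implementation):

    // Hypothetical sketch: a progress bar driven only by elapsed time.
    // It fills linearly over totalMs, regardless of what the backend is doing.
    function fakeProgressBar(totalMs: number, onUpdate: (pct: number) => void): void {
      const start = Date.now();
      const timer = setInterval(() => {
        const pct = Math.min(100, ((Date.now() - start) / totalMs) * 100);
        onUpdate(Math.round(pct)); // e.g. set the bar's width to pct%
        if (pct >= 100) clearInterval(timer);
      }, 100);
    }

    // Fill the bar over exactly 10 seconds.
    fakeProgressBar(10_000, pct => console.log(`${pct}%`));

The bar always tells the truth about the wait time, because the wait time is whatever the bar says it is.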


> Progress bars have very low credibility on Windows, because users have learned that they're basically useless as an indicator of wait time. A progress bar might get stuck at 7%, then suddenly rush to 100%; conversely, it might get stuck at 95% but never finish. The bar offers no real indication of the actual level of progress

I disagree with this; I find the progress bars more credible with erratic timing. (And ideally, a display of the task currently at hand, like "Copying tiny file. Copying tiny file. Copying giant file............")

A progress bar that smoothly fills from 0 to 100 looks like an animation that somebody thought would make you happy to watch. A progress bar that lags at 7% and then rushes the rest of the way looks like the software has some internal metric for task completion, and is reporting according to that metric. This implies that when the number changes, progress has happened, which isn't the case for a progress bar that isn't affected by workload.

The software can't use "how much time has elapsed?" as a progress metric, because it doesn't know how much time things will take, and because the passage of time does not actually cause -- or reflect -- any progress. That progress bar would be a spinner, not a progress bar.


> Spinners on the application or UI element level are more credible, but generally worse than a progress bar. They're still very useful as a comfort indicator for short delays.

Strongly disagree. A spinner on the web UI element that lasts longer than ~1 second indicates for me that the site's JavaScript broke again, and it's time to reload or wait for the devs to notice and fix it.


He's not talking about the cursor.

He's talking about a circular loading animation. Like the one that replaces the submit button when you're making a post on Twitter/Facebook.


I'm talking exactly about that spinner. It's a lie. You quickly learn it has no relation whatsoever to what's happening in the background. And indeed it doesn't, because it's an animated GIF, completely detached from any logic or networking code!

(Compare the CLI spinner/fan - that "/ - \ |" animation used to indicate progress. There you know that each tick of the spinner means work has been done, because it has to be animated from code, and it's much simpler to just update it from the code that does the work.)
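
A rough sketch of that kind of work-driven spinner (the helper names are made up): the frame only advances when the working code calls tick(), so every visible movement corresponds to a unit of work that actually completed.

    // Hypothetical sketch: the "/ - \ |" spinner advances only when the
    // code doing the work calls tick(), so each frame means real progress.
    const frames = ["|", "/", "-", "\\"];
    let frame = 0;

    function tick(): void {
      process.stdout.write(`\r${frames[frame]} working...`);
      frame = (frame + 1) % frames.length;
    }

    function processItems(items: string[]): void {
      for (const item of items) {
        doWork(item); // stand-in for whatever the real per-item work is
        tick();       // the spinner only moves after a unit of work completes
      }
      process.stdout.write("\rdone.        \n");
    }

    function doWork(item: string): void {
      // placeholder for the actual work
    }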


That's not true at all. In the websites I and many others build, that loading spinner is linked directly to network code.

The spinner appears when a request is made. It disappears when the request is resolved.
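
Roughly this pattern, in other words (a sketch with assumed helper names and endpoint); the finally block is what keeps a failed request from leaving the spinner looping forever:

    // Hypothetical sketch: spinner shown when the request starts,
    // hidden when it settles - whether it succeeds or fails.
    function showSpinner(): void { /* e.g. unhide the spinner element */ }
    function hideSpinner(): void { /* e.g. hide it again */ }

    async function submitPost(body: unknown): Promise<void> {
      showSpinner();
      try {
        const res = await fetch("/api/posts", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(body),
        });
        if (!res.ok) throw new Error(`Request failed: ${res.status}`);
      } finally {
        hideSpinner(); // without this, an error leaves the spinner spinning forever
      }
    }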


I was talking about animation. Show/hide on request made/resolved gives only binary information about starting and finishing something. But the spinning animation itself does not represent any operations being executed. It may very well be that the request failed and a bug in JS made it not remove the spinner. You end up with a forever-looping animation of "work", even though no work is being done. This makes the spinner an untrustworthy element.


Still better than nothing? Sure, maybe sometimes exceptions aren't handled properly, but at least you know that it was trying to do something, rather than having users click a submit button 10x because there was no UI feedback whatsoever.


The most annoying part of progress bars is the fact that programs so often use multiple bars. What's the point of watching a bar slowly reach 100%, only for it to be replaced with another progress bar that starts from 0 again?


My apps add a second "outer" progress bar for that use-case.
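
Presumably something along these lines (a sketch; the callback names are illustrative): the outer bar tracks the whole batch, while the inner bar restarts for each task.

    // Hypothetical sketch of a two-level progress display:
    // the outer bar covers the whole batch, the inner bar restarts per task.
    async function runBatch(
      tasks: Array<(onProgress: (pct: number) => void) => Promise<void>>,
      outer: (pct: number) => void,
      inner: (pct: number) => void,
    ): Promise<void> {
      for (let i = 0; i < tasks.length; i++) {
        inner(0);                          // inner bar resets for each task
        await tasks[i](pct => inner(pct)); // each task reports its own 0-100
        outer(Math.round(((i + 1) / tasks.length) * 100)); // outer advances once per task
      }
    }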


The "please wait while we verify your passcode" on our corporate phone conference system drives me nuts. In the time that it took to speak that sentence, the passcode could have been verified millions of times.


That may be yet another use case for delays: it makes brute-forcing (or even plain guessing of a few common codes) a lot slower.
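
A sketch of that idea (the three-second figure and the names are illustrative): a fixed delay per attempt makes every guess expensive, even though the check itself is instant.

    // Hypothetical sketch: a fixed delay before verifying, so every guess
    // costs the caller a few seconds no matter how fast the check itself is.
    const EXPECTED_CODE = "123456"; // illustrative only

    async function verifyPasscode(code: string): Promise<boolean> {
      await new Promise(resolve => setTimeout(resolve, 3000)); // artificial wait
      return code === EXPECTED_CODE;
    }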


In true market economy fashion, the comfort noise is also a perfect advertising opportunity.

For instance, I frequently deal with ATMs that display "please wait" screens between every operation. Those screens usually last between 1 and 3 seconds, and it's obviously because the operations take that long, and totally not because they also display a half-screen or full-screen ad...


I've heard the HP-12C calculator also slows down its screen refresh on purpose: when it first came out it was blazing fast, and people couldn't believe the math was right.


This is a pretty common pattern. Lots of websites also have "establishing secure connection" interstitials for the same reason.


Yep, and the 10-second "deal" compilations for travel packages really happen in a fraction of a second. They just purposefully delay the results to make it seem like they are doing a lot of processing in finding all the possible deals and showing you the best ones.


It can be a more friendly way of rate-limiting expensive DB queries. An interstitial that says "too many queries, try again in 10 seconds" is far more annoying than a loading bar.


Yup. We have a similar thing at my company. Every time we try to test out of the loading animation, conversion and retention go down. It’s an amazing thing to see.


That's the opposite of what parent and other commenters are saying. Users prefer the loading animations, according to the growth hackers.


I think they were agreeing. "Test out of" seems to be another way of saying "we tried getting rid of the spinners but people didn't like it"?


That famous tax preparation software added several screens to "review" the data.


Most of the flight search companies do the same ("Finding the best/cheapest flights for you"). It's almost instantaneous, but they introduce this artificial wait.


That seems unlikely. Flight search really does take a long time, because they need to make API calls to external services for most customer requests, and they need to refresh prices roughly hourly and so cannot rely on cached data. Also, even the best flight search websites are frustratingly slow. If that delay were created intentionally, then they've already lost me as a customer as a result.


I can't seem to find that post right now, but a person (on Quora or Reddit, I think) who worked on the development team of a flight search company mentioned this.


I don't know if I'd call it a "deficiency" - if we interpret "disfluency" in a literal sense as "not flowing" without negative connotation, then the interruptions (hmm, uh, okay) are actually communicating useful information to the other party. I might even say that omitting those interruptions (and replacing them with, say, dead silence) might be poor communication.


The "um" isn't a deficiency, but the slow response is. If the response is artificially delayed to give the appearance of slow thinking, and an "um" added to fill the artificially long silence, that's an artificial deficiency.


I interpreted it differently. It isn't to give the appearance of slow thinking. It is to wait for the other person to be ready to accept the answer.

When talking to real humans, I've encountered people who don't do this, and I find it makes communication difficult and frustrating.

I'm not 100% sure why I need this pause, but I know I need it. Maybe I'm considering whether my question made sense or needs corrections/additions, which means I can't focus on the answer yet. Or maybe it takes time to switch the brain from "speaking mode" to "listening mode".

At any rate, when people answer without pausing, I have to ask them to repeat the first few words they said because I didn't catch them. And the reason I didn't catch them wasn't mumbling or background noise or anything. Well-formed sounds made it to my ear just fine, but my brain wasn't ready to accept them for a fraction of a second.


It's not a deficiency if understanding is increased. If that fake pause increases the listener's understanding of the sentence (it might), then the 'slow response' is not a deficiency but an improvement.

Edit: should the robot talk at 2x normal speaking speed in order to more quickly convey the necessary information? Slowing the speech down artificially so a human could easily understand it sounds like a deficiency to me. (By your definition).


Ums and other filler words are not as bad as they are made out to be among the public speaking crowd: https://www.eab.com/daily-briefing/2016/07/29/um-filler-word...


Reminds me of comfort noise in the telephone system.

Even though the system encodes silences noise-free (to improve compression), it deliberately inserts noise because otherwise people think the line is dead.


Similar to how, when designing a virtual face, it looks more natural if it has some slight asymmetries and "defects"; and when designing a synthetic drumbeat, if it's "perfect" it sounds totally robotic.

Imperfection is natural and comfortable. Perfect corners and edges are artificial and weird to the point of being distracting.


People are more willing to get on board with bots acting like people than the other way around.

The speech disfluencies used by Duplex in the salon and restaurant interactions are perfect examples of why natural speech sounds natural. It's the cadence as well as the timing.


Google Maps has the most pleasant and human-sounding voice approach I’ve encountered in any such system.

All other GPS guidance voices sound incredibly crude and mechanical in comparison.


The Brazilian Portuguese voice is not that great.


In my city I can recognize that Google has two different 'voices' or voice libraries. They sound slightly different. I'm curious how that works and why it's not all done with one.


I've noticed this as well. My working hypothesis is that one is for high(er) bandwidth and the other for low bandwidth situations.


My understanding is that the "low-fi", more robotic one uses an offline TTS engine for when there is no connectivity. When connectivity is good, it will switch to the better, cloud-based one.


Similar to lens flares in (first person) video games.


If you are building a system that mimics human speech, you need to teach it to be imperfect and to use common parlance. Otherwise you will fall into the uncanny valley. If you listen to the conversation again, there are several points where they lose immersion. For example, no one would say "12 pm"; they would say "noon" instead. Google has clearly done some impressive work here, and I'm now a bit more confident/scared that they will be able to successfully fool me in the next few years.


Your argument about calling it "noon" instead of "PM" is just illogical. I'd always use PM instead of noon -- whenever I'm trying to be specific about something (appointments and such). I understand the argument you're trying to make, but that example isn't enough to support it.


How is "PM" more specific than "noon"?

Absent the context of this conversation, it's not immediately obvious to me whether 12 PM is midday or midnight, whereas "noon" is unambiguous.


I think historically "noon" expresses less precision, although I suspect that's less true now that everyone always knows what time it is and has GPS in their pocket to help calculate arrival times.

20 years ago, had I said to someone "I'll be there at 12pm" it would have had a stronger implication of precision than "I'll be there at noon." I don't think it's true today.


12pm is also commonly interpreted as midnight so "12 noon" or "12 midnight" is generally preferred when scheduling meetings or deadlines in order to avoid confusion.


Understood. So the difference is you interpret "noon" as an ill-defined probability distribution centred roughly around midday, rather than a concrete point in time, whereas you interpret "12PM" as a concrete point in time. Fair enough.


Go back further and "noon" is "whenever the sun is at its zenith" so we've definitely made strides.


Isn't that still the case, assuming central position within timezone and no summer time?




