To save CPU/GPU/TPU there should be a high-frequency sound, as in people can’t hear, so the computers talking to each other and switch to a faster way to communicate.
If this is included you also have way to detect if you are talking to a bot/duplex.
Yes, but it could be very subtle and low bandwidth at first, and once both sides were convinced the other was a machine switch to a full speed screeching 56k modem [1].
Or just communicate "hey actually connect to this HTTP/XMPP/whatever address on the internet and we'll continue this from there"
1. Probably a bit slower, I've heard modern VoIP lines don't work well with traditional modems?
Sure, but they should consider the future. That number will only get smaller, especially if Duplex or other services say "we'll handle all your phone and online bookings for you for $SMALL_FEE, and still forward other inquiries to your phone as before".
So just put it in the hearable spectrum. Phones already make all kinds of sounds that no one under the age of 35 has any clue what they mean or why they are needed, and frankly they aren't.
Yes, and lots of sounds that the human ear can hear but are not used to decode speech. Also the audio is frequently recoded as calls pass from infrastructure to infrastructure.
However while this is useful to bootstrap a new technology rollout, 10 years on its just technical debt.
The amount of tech debt in the system behind credit cards is crazy, because originally charges where phoned in to the card issuer manually, and everything from then on - magstripe, chip & PIN, online only transactions, etc, has all been built on top, and the leaky abstractions show through in daily difficulties with the card system for end users, like lack of real-time balance (in some cases), lack of transaction metadata, etc.
On the other hand, the credit card system's backcompat does mean that you can still accept credit cards when the power's out. You just write down the number (or use an imprint machine) and let the customer go. And the semantics of credit mean that you can still make that charge even if an online transaction would have resulted in a decline—offline transactions are never declined, they just cause overdrafts.
I wonder if that resilience is worth the immense amount of infrastructure and engineering that is spent on maintenance of the technical debt. Does that maintenance drive up processing fees? I suspect it does, but not in any amount sufficient to explain the size of those fees.
True! Although I think that's more of a byproduct, rather than something designed into the system at the moment, and I suspect we could do better with designing it. For example, I doubt many shops have those imprint machines any more.
I also think the tech debt is holding us back a long way. For example, why can't I see itemised receipts in my card statement? Paper receipts are on their way out, email receipts aren't linked to anything or structured data, but being able to see that I've spent $120 on shipping with Amazon in the last 12 months, so a Prime subscription would make sense, would be a great sort of financial tool to have. That isn't possible in the card network at the moment.
They imprint "machines" are still issued - but all* of them are just tossed away into storage, and never ever used (Training users on those? Pointless).
Instead of a high frequency tone, just watermark the background noise or the speech pattern. You could watermark the background static, the voice samples, or even the speech patterns. All you really need is something like 30 bits of data to identify a call as a Duplex call with very high probability, and I’m certain you can find a way to imprint that many bits into the frequency spectrum of your background noise.
I like this. So basically the old school modem sound, but in frequency that can't be heard. It would only take a fraction of a second to send out the feeler, and would not be noticed if a live human picked up. Could even detect a human and send the call over to a live representative without anyone noticing.
It doesn't have to be out of frequency (since that's probably filtered anyway), could be just a really quick burst handshake identifier which could encode an IP address to communicate over instead of a crummy phone line.
Duplex: <beep beep> (I'm available to chat)
Other bot: <boop boop> (Oh hai! Wanna get intimate?)
Duplex: <blaaaaaaaart> (Come find me on duplex://64.233.160.0)
As an end user, picking up the phone to hear a beep is not pleasant. I'm likely as not to immediately hang up, as I've come to associate beeps at the start of calls with scammers.
What about if the caller makes no such sounds and the recipient makes the offer to handshake?
Anyways, this aspect is more amusing to just think about than anything else. That said, I really hope companies who produce these next-gen AI robo-callers actually have the courtesy of identifying themselves as such. I want to know if I am talking to a human or Duplex. Yes, I may hang up, but I feel uncomfortable being fooled into thinking I am talking to a human when I am not.
There's no reason why it can't be encoded as elevator music - you already hear it all the time, they might even throw in a looping "Thank you for calling. Your call is important to us" to keep you from freaking out.
Phone lines are optimized for frequencies humans can hear, though I'm guessing you could get enough bandwidth out of the edges to convince the other side you're a machine without bothering a human too much.
Yes, it would be nice if in parallel Google came up with open machine-friendly protocols for each of the use-cases Duplex supported, with a clear migration path away (e.x. businesses started publishing the endpoints and protocols they supported alongside their phone number so you could skip the call completely)