
To save CPU/GPU/TPU there should be a high-frequency sound, one that people can't hear, so that computers talking to each other can recognise it and switch to a faster way to communicate. If this is included you also have a way to detect whether you are talking to a bot/Duplex.


Don't most carriers heavily "compress" the sound, removing all sounds/frequencies that a human can't hear, etc.? https://www.youtube.com/watch?v=w2A8q3XIhu0


Yes, but it could be very subtle and low bandwidth at first, and once both sides were convinced the other was a machine, they could switch to a full-speed screeching 56k modem [1].

Or just communicate "hey actually connect to this HTTP/XMPP/whatever address on the internet and we'll continue this from there"

1. Probably a bit slower, I've heard modern VoIP lines don't work well with traditional modems?


Damn, that sounds even more dystopian. Can you imagine it?

"Hello, how can I help you? - Hi, beep, I'd like to reserve a table? - Ok, beep, beep, one second - Mhm-mm beep, beep, screech, 011000101010...."



Also a great story along very similar lines:

http://archive.today/txrAd

(Archive link because that blog now requires authorization to view for some reason.)


This made my day


This is basically what 56k modems do, isn't it?



The whole context of Google introducing this functionality was for the 60% of businesses that don't yet have an online presence at all.


Sure, but they should consider the future. That number will only get smaller, especially if Duplex or other services say "we'll handle all your phone and online bookings for you for $SMALL_FEE, and still forward other inquiries to your phone as before".


[smallprint]...and we'll abruptly shut it down in 18 months.[/smallprint]


So just put it in the audible spectrum. Phones already make all kinds of sounds that no one under the age of 35 has any clue what they mean or why they are needed, and frankly they aren't.


Perfect use case for the endangered fax answering sound.


beep boop.


Yes, and they also strip lots of sounds that the human ear can hear but that aren't needed to decode speech. Also, the audio is frequently re-encoded as calls pass from infrastructure to infrastructure.

Good times!


You’re totally right.

However, while this is useful to bootstrap a new technology rollout, 10 years on it's just technical debt.

The amount of tech debt in the system behind credit cards is crazy, because originally charges were phoned in to the card issuer manually, and everything from then on (magstripe, chip & PIN, online-only transactions, etc.) has been built on top. The leaky abstractions show through in daily difficulties with the card system for end users, like lack of real-time balance (in some cases), lack of transaction metadata, etc.


On the other hand, the credit card system's backcompat does mean that you can still accept credit cards when the power's out. You just write down the number (or use an imprint machine) and let the customer go. And the semantics of credit mean that you can still make that charge even if an online transaction would have resulted in a decline—offline transactions are never declined, they just cause overdrafts.


I wonder if that resilience is worth the immense amount of infrastructure and engineering that is spent on maintenance of the technical debt. Does that maintenance drive up processing fees? I suspect it does, but not in any amount sufficient to explain the size of those fees.


Yeah, but around here, most places just say the system is down and require cash.


True! Although I think that's more of a byproduct, rather than something designed into the system at the moment, and I suspect we could do better with designing it. For example, I doubt many shops have those imprint machines any more.

I also think the tech debt is holding us back a long way. For example, why can't I see itemised receipts in my card statement? Paper receipts are on their way out, email receipts aren't linked to anything or structured data, but being able to see that I've spent $120 on shipping with Amazon in the last 12 months, so a Prime subscription would make sense, would be a great sort of financial tool to have. That isn't possible in the card network at the moment.


The imprint "machines" are still issued, but all* of them are just tossed away into storage and never ever used (Training users on those? Pointless).


Instead of a high-frequency tone, just watermark the call itself. You could watermark the background static, the voice samples, or even the speech patterns. All you really need is something like 30 bits of data to identify a call as a Duplex call with very high probability, and I'm certain you can find a way to imprint that many bits into the frequency spectrum of your background noise.
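To put a rough number on "very high probability": here is a back-of-the-envelope sketch, assuming (my assumption, not something the comment states) that whatever bits a detector pulls out of an ordinary, unmarked call are independent and uniformly random. The function name and the `allowed_bit_errors` tolerance are made up for illustration.

```python
from fractions import Fraction
from math import comb


def false_match_probability(bits: int, allowed_bit_errors: int = 0) -> Fraction:
    """Chance that `bits` random bits land within `allowed_bit_errors`
    flips of one fixed watermark (assumes independent, uniform bits)."""
    # Count every bit pattern that differs from the tag in at most
    # `allowed_bit_errors` positions, then divide by all 2**bits patterns.
    matching_patterns = sum(comb(bits, k) for k in range(allowed_bit_errors + 1))
    return Fraction(matching_patterns, 2 ** bits)


if __name__ == "__main__":
    exact = false_match_probability(30)
    print(exact)         # 1/1073741824
    print(float(exact))  # a bit under one in a billion
    # Tolerating a few flipped bits (for a lossy codec) raises the odds:
    print(float(false_match_probability(30, allowed_bit_errors=3)))
```

Under that assumption an exact 30-bit match has a false-positive rate of 1 in 2^30. Real audio is not uniform noise and carrier codecs will flip bits, so a practical detector would have to tolerate some errors, which makes accidental matches somewhat more likely.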


I like this. So basically the old-school modem sound, but in a frequency that can't be heard. It would only take a fraction of a second to send out the feeler, and it would not be noticed if a live human picked up. It could even detect a human and send the call over to a live representative without anyone noticing.


It doesn't have to be outside the audible range (since that's probably filtered anyway). It could be just a really quick burst handshake identifier, which could encode an IP address to communicate over instead of a crummy phone line.

Duplex: <beep beep> (I'm available to chat)

Other bot: <boop boop> (Oh hai! Wanna get intimate?)

Duplex: <blaaaaaaaart> (Come find me on duplex://64.233.160.0)

<insert hack attacks and other nonsense here>
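For the curious, a toy sketch of how the "come find me at this address" burst could carry an IPv4 address. Everything here is hypothetical: the four tone frequencies, the 2-bits-per-tone scheme, and the function names are invented for illustration, and there is no framing, checksum, or authentication, so it does nothing about the hack attacks above.

```python
import ipaddress

# Hypothetical 4-tone alphabet (Hz), chosen only to sit inside the
# voice band; each tone stands for one 2-bit symbol (00, 01, 10, 11).
TONES_HZ = [1200, 1600, 2000, 2400]


def address_to_tones(addr: str) -> list[int]:
    """Turn an IPv4 address into 16 tones, most significant bits first."""
    value = int(ipaddress.IPv4Address(addr))
    # 32 bits / 2 bits per symbol = 16 symbols.
    symbols = [(value >> shift) & 0b11 for shift in range(30, -1, -2)]
    return [TONES_HZ[symbol] for symbol in symbols]


def tones_to_address(tones: list[int]) -> str:
    """Inverse of address_to_tones for a cleanly received tone sequence."""
    value = 0
    for tone in tones:
        value = (value << 2) | TONES_HZ.index(tone)
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    burst = address_to_tones("64.233.160.0")
    print(len(burst))               # 16
    print(tones_to_address(burst))  # 64.233.160.0
```

At, say, 10 ms per tone that would be a 160 ms burst, which fits the "really quick" idea. Actually generating and detecting the tones over a compressed phone line is the hard part and is not covered here.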


<hacker voice="1">I'm in.</hacker>


As an end user, picking up the phone to hear a beep is not pleasant. I'm likely as not to immediately hang up, as I've come to associate beeps at the start of calls with scammers.


What about if the caller makes no such sounds and the recipient makes the offer to handshake?

Anyways, this aspect is more amusing to just think about than anything else. That said, I really hope companies who produce these next-gen AI robo-callers actually have the courtesy of identifying themselves as such. I want to know if I am talking to a human or Duplex. Yes, I may hang up, but I feel uncomfortable being fooled into thinking I am talking to a human when I am not.


There's no reason why it can't be encoded as elevator music - you already hear it all the time, they might even throw in a looping "Thank you for calling. Your call is important to us" to keep you from freaking out.


Hence the idea of doing it in a frequency humans can't hear.


Hence the:

> (since that's probably filtered anyway)

Phone lines are optimized for frequencies humans can hear, though I'm guessing you could get enough bandwidth out of the edges to convince the other side you're a machine without bothering a human too much.


No, let's make it fun. "How can I help you?" "Get off the phone you damn bot!" "Why I oughta...<modem sounds>"


Did we just reinvent dialup?


Human-readable modem handshake.


I want to hear a Duplex voice giving verbal AT commands.


+++ATH0


Yes, it would be nice if in parallel Google came up with open machine-friendly protocols for each of the use cases Duplex supports, with a clear migration path away (e.g. businesses could start publishing the endpoints and protocols they support alongside their phone number so you could skip the call completely).


Or a way to book via a website or app...


"Ba weep granna weep ninny bong"

It is the universal greeting for cybernetic organisms, after all.


Maybe just cut to the chase and do an API for reservations?


Why not just use a pattern of the ums and ahs that it already seems to add to its output? Easier for it to detect and harder for a human to recognise.



