I would be SUPER interested to see this same methodology applied to an iPhone or an Android phone resting in a room. I've already had the feeling that discussions I had later had an impact on my ads, but I never knew whether it was random or not.
I have mentioned this a number of times here on HN, but each time I'm told by other users that it just appears that way because the topic is fresh in my mind... I don't think that's the case.
Imo the truth is much scarier: advertisers know us better than we know ourselves. There's also the idea that we are so alike that advertisers can generalize to great success.
This is especially blatant when you don't fit into any category advertisers understand. Most of them think I'm either a hip urban youth or a well-off exurban housewife. I don't think they know what to do with someone who has associations, friendships, and interests that cross countless geographic and socioeconomic boundaries.
I think the truth is actually that we are MUCH less unique than we would like to believe, and that some combination of age + gender + location probably predicts like 80% of our interests.
The logic is that you talk about dozens of things around your phone per day, yet you almost never see an ad targeted at them; you only notice it on the rare occasion it does happen. So either Apple is deciding to target you with ads only very rarely, or it's a rare coincidence.
Given the power demands of such an always-on analysis engine, that would show in battery runtime. At least if it's happening while the device is not plugged in.
On iOS, the word "Siri" can be detected at any time (if that setting is enabled).
Maybe (just a hypothesis) there could be a set of other keywords also listened for, which, once detected, would start another, more complex routine. That way, you could limit the battery impact and still be able to listen to users for advertising purposes.
On my Motorola from 8 years ago, the voice command was "OK, Google" or any custom short phrase you trained for it. But I think it would be really hard to get a useful but concise list of keywords... It would certainly be possible to listen for, say, "new Subaru" and show an ad if that one thing was detected, but if the point is to select which one of a million ads to show, you'd need far more keywords, which gets computationally expensive quickly. It would probably be more battery efficient to aggressively compress anything that sounds like speech and then send it up to The Cloud for analysis... scary stuff, and part of the reason I don't have any home assistants!
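To put rough numbers on "computationally expensive quickly" (all figures below are my assumptions, not measurements), here's a toy Python estimate comparing a single-phrase wake-word model with one big enough to spot a huge list of ad-worthy keywords:

    # Toy estimate of always-on audio compute per second (assumed figures).
    FRAMES_PER_SECOND = 100            # assuming ~10 ms audio frames

    wake_word_params = 50_000          # assumed size of a tiny single-phrase detector
    large_vocab_params = 20_000_000    # assumed size needed for ~1M ad keywords

    # rough multiply-accumulate count per second of audio (2 ops per parameter per frame)
    wake_ops = wake_word_params * 2 * FRAMES_PER_SECOND
    vocab_ops = large_vocab_params * 2 * FRAMES_PER_SECOND

    print(f"wake word: ~{wake_ops / 1e6:.0f} M ops/sec")
    print(f"keyword spotting: ~{vocab_ops / 1e9:.1f} G ops/sec ({vocab_ops // wake_ops}x more)")

The absolute numbers will vary a lot with the hardware and model, but the gap between "listen for one phrase" and "listen for everything advertisers care about" is the point.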
A lot of people anecdotally believe this "they're listening", but I don't: if it happens offline, it would consume a lot of power on the device, and if it happens in the cloud, the bandwidth and computing requirements would be gigantic, with ROI probably too poor to be worth it. But maybe I'm overestimating those requirements.
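For what it's worth, here's the back-of-envelope I have in mind for the cloud version (every number is an assumption, not a measurement):

    # Back-of-envelope: how much audio an "always uploading speech" scheme would move.
    BITRATE_KBPS = 12           # assumed low-bitrate speech codec
    SPEECH_HOURS_PER_DAY = 3    # assumed hours of nearby speech per device
    DEVICES = 500_000_000       # assumed number of participating devices

    bytes_per_device_day = BITRATE_KBPS * 1000 / 8 * SPEECH_HOURS_PER_DAY * 3600
    total_tb_per_day = bytes_per_device_day * DEVICES / 1e12

    print(f"~{bytes_per_device_day / 1e6:.0f} MB per device per day")
    print(f"~{total_tb_per_day:.0f} TB per day across the fleet")

That's on the order of petabytes of audio per day before you've run a single second of speech recognition on it, which is why the ROI seems so poor to me.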
Then again, I opened Instagram at a carwash once, and a few days later I got ads about car treatment products. I walked by an e-bike store the other day and stopped for about 15 seconds to look at the bikes being displayed, and a few days later Instagram started showing me ads for the brand. I thought Instagram or another Zuck-app was tracking my location, but I just checked and none of the Zuck-apps on my phone have location permission enabled.
Please read the paper, or at least the abstract, before commenting. The paper is not about "at-rest" devices; it's about inferences made from the audio stream of activated commands. I.e., it's about the depth of processing done on "hey Alexa" commands, not the breadth of data being processed by the device.