Operationally, though, here are a few thoughts:
- Ensure that the people doing this review have no access to the systems outside of a secured environment: one in which outside audio and video recording devices are not allowed, presence is monitored, physical access is controlled, etc. Basically, not remote workers or a typical office environment. Most finserv call centers already do this, so it's not particularly crazy to think they can do the same.
- Mask the voices so that they are intelligible but not identifiable. Maintain a limited set of "high access" taggers who can hear the raw clips if there is an issue with the masking. (A rough masking sketch follows this list.)
- Limit the length of the clips (it sounds like they already do this).
- Run pre-filters for anything personally identifiable in the audio itself. The metadata might already be de-identified, but what if the clip consists of the person reading out a phone number, credit card number, or username? The "high access" team should build detectors for that, flag those portions of the audio (or whole clips), and route them to a limited-access team. (See the detector sketch after this list.)
- Make it clearer to customers that their audio, including accidental captures, can and will be sent to the company's servers. Be explicit about this, rather than burying it in the TOS behind terms like "audio data": "The device may accidentally interpret your bondage safe word as a trigger and send your private conversations to our real, live human tagging team for review."
- Provide a physical switch that can temporarily disable the audio recording capability.
- Pay money, as cigarette companies do, to help fund a public education campaign that informs the general public about these listening bugs and mass surveillance issues, so people are aware of industry practices and how they are affected.
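To make the masking bullet concrete, here's a minimal sketch, assuming a Python pipeline and using librosa's pitch shifter as a crude stand-in for real voice conversion. A plain pitch shift keeps speech intelligible but is weak anonymization on its own, and the file names, length cap, and shift amount are all made up for illustration:

```python
# Minimal sketch: cap clip length, then pitch-shift to obscure the speaker.
# A real deployment would want proper voice conversion, not just pitch shift.
import librosa
import soundfile as sf

MAX_CLIP_SECONDS = 10     # hypothetical hard cap on reviewable clip length
PITCH_SHIFT_STEPS = 4     # semitones up; enough to disguise casual listening

def mask_clip(in_path: str, out_path: str) -> None:
    y, sr = librosa.load(in_path, sr=None)        # load at native sample rate
    y = y[: int(MAX_CLIP_SECONDS * sr)]           # enforce the length limit
    masked = librosa.effects.pitch_shift(y, sr=sr, n_steps=PITCH_SHIFT_STEPS)
    sf.write(out_path, masked, sr)                # taggers only ever hear this

if __name__ == "__main__":
    mask_clip("raw_clip.wav", "masked_clip.wav")
```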
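And for the pre-filter bullet, a sketch of what a transcript-level detector might look like: scan the ASR transcript of a clip for spoken digit runs, use a Luhn check to spot likely card numbers, and flag the clip so it routes to the limited-access team. The transcript format and thresholds are assumptions, not anyone's actual pipeline:

```python
# Sketch: flag transcripts containing digit runs that look like phone or
# credit card numbers, so those clips route to the limited-access team.
import re

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum; true for plausible credit card numbers."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:                # double every 2nd digit from right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

DIGIT_RUN = re.compile(r"(?:\d[\s-]?){7,}")   # 7+ digits, separators allowed

def pii_flags(transcript: str) -> list[str]:
    """Return human-readable flags for suspect digit runs in a transcript."""
    flags = []
    for match in DIGIT_RUN.finditer(transcript):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            flags.append("possible card number: " + match.group().strip())
        else:                              # coarse: any long run is suspect
            flags.append("possible phone number: " + match.group().strip())
    return flags

print(pii_flags("sure, my number is 4111 1111 1111 1111, thanks"))
# -> ['possible card number: 4111 1111 1111 1111']
```

A digit-run filter is deliberately coarse: for a pre-filter, false positives (routing a harmless clip to the limited-access team) are much cheaper than false negatives.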
Edit:
I like what others are saying about explicit opt-in, as well as paid end users. For quality/safety control, though, I don't know that they can rely exclusively on paid end users; they probably need to sample some amount of real live data. For that, explicit opt-in makes sense.
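If they went that route, gating the QA sample on consent is simple. A toy sketch, where the record fields and the 1% rate are just placeholders:

```python
# Toy sketch: only clips from users who explicitly opted in to human review
# are ever eligible for the QA sample. Field names are hypothetical.
import random

REVIEW_SAMPLE_RATE = 0.01   # review ~1% of eligible clips

def sample_for_review(clips, rate=REVIEW_SAMPLE_RATE, seed=None):
    rng = random.Random(seed)
    eligible = [c for c in clips if c["opted_in_to_human_review"]]
    k = min(len(eligible), max(1, round(len(eligible) * rate))) if eligible else 0
    return rng.sample(eligible, k)

clips = [
    {"clip_id": "a1", "opted_in_to_human_review": True},
    {"clip_id": "b2", "opted_in_to_human_review": False},
    {"clip_id": "c3", "opted_in_to_human_review": True},
]
print(sample_for_review(clips, seed=7))   # only a1/c3 can ever appear
```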