Correct. There are several levels at which this applies:
Phone hardware (microphones, speakers) is only designed to capture and reproduce the frequencies that are 'useful' for human speech.
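The sampling rates used by telephony audio codecs, e.g. 8kHz or 16kHz, cut off well _before_ the human ear's limits: a signal sampled at those rates can only represent frequencies up to half the sampling rate, i.e. 4kHz or 8kHz, whereas the ear can hear up to roughly 20kHz. They aren't even trying to reproduce everything the ear can detect; just human speech to decent quality.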
Codecs are optimized to make human speech intelligible. The person listening to you on the phone isn't receiving a complete waveform for the recorded frequency range. The signal has been compressed to reduce the bandwidth required, and the goal isn't lossless compression; it's decent-quality speech after decompression.
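It's entirely possible to play tones alongside speech that we won't notice, but in the general case they can't be tones that the human ear can't detect, because those frequencies don't survive the hardware, sampling and compression described above.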
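To make the sampling-rate point concrete, here is a minimal Python sketch of the half-the-sampling-rate (Nyquist) limit. The function names and the 19kHz "near-ultrasonic" example tone are illustrative assumptions, not taken from any particular codec or phone stack, and real systems add filtering and lossy compression on top of this.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def survives_sampling(tone_hz, sample_rate_hz):
    """True if the tone lies below the Nyquist limit for this sampling rate.

    Hypothetical helper: it only checks the sampling limit and ignores
    microphone response, anti-aliasing filters and codec compression.
    """
    return tone_hz < nyquist_limit(sample_rate_hz)


if __name__ == "__main__":
    tone_hz = 19_000  # assumed example: a tone near the top of human hearing
    for rate_hz in (8_000, 16_000, 44_100):
        print(rate_hz, nyquist_limit(rate_hz), survives_sampling(tone_hz, rate_hz))
```

Under these assumptions the 19kHz tone is above the limit at both 8kHz and 16kHz sampling, and only fits at a music-style rate such as 44.1kHz.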