Sure, but that should be considered an accuracy problem. Telling a system to do its best to extract words from background sounds, and then getting words from it, is a different type of problem.
-------
I can't reply to the comment below, but the difference in signal-to-noise ratio is why it should be considered a different problem.
If I told a binary image classifier to classify a clear image of a cat as either a "cat" or a "dog", and it said "dog", then that would be an accuracy problem.
If I gave the same classifier an image of a black cat standing in a very dark room, where even a human would have trouble identifying it, and it said "dog", that's not an accuracy problem so much as a signal-to-noise ratio problem.
It seems like you're assuming that all of the issues you describe have the same root cause. I don't think that's a sound assumption...tehe.
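
To make the cat/dog distinction concrete, here's a toy sketch. Everything in it is my own made-up setup, not a real image classifier: the "image" is a single number, `contrast` stands in for how well lit the scene is, and `error_rate` is a hypothetical helper. The decision rule (threshold at zero) is the best possible one for this toy setup, so any mistakes it makes at low contrast come from the input, not from a bad model.

```python
import random

def error_rate(contrast, noise_std=1.0, trials=20000, seed=0):
    """Fraction of wrong answers from a threshold-at-zero classifier.

    Assumed toy model: label is -1 ("cat") or +1 ("dog"), and the
    observation is contrast * label plus Gaussian noise.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        label = rng.choice([-1, 1])
        observation = contrast * label + rng.gauss(0.0, noise_std)
        prediction = 1 if observation > 0 else -1
        if prediction != label:
            errors += 1
    return errors / trials

# Clear image: signal is much larger than the noise.
clear = error_rate(contrast=4.0)
# Black cat in a dark room: signal is buried in the noise.
dark = error_rate(contrast=0.1)

print(f"clear image error rate: {clear:.3f}")
print(f"dark room error rate:   {dark:.3f}")
```

The same classifier gets almost everything right in the first case and is close to a coin flip in the second. A mistake in the first case tells you something about the model; a mistake in the second mostly tells you something about the input.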