I've found that clicking the Headphone icon and transcribing the few words makes this a much less annoying experience (especially as a non-American [1]). It also seems more forgiving of errors, though the audio is pretty clear anyway.
I have resorted to that more than once at this point. I consider that part of the "painful", as I may not always have speakers at the ready, and at certain times of day I feel the need to take the time to plug in headphones so as not to bother neighbors, etc. I also suspect that, given how many bots have moved on to using the audio option themselves, the audio challenges are going to get nastier and worse "soon" too.
[1] https://news.ycombinator.com/item?id=25226805