Yeah, my understanding was that it was audio-fingerprinting TV ads, not transcribing anything, but I wouldn’t be surprised if they were trying to vacuum up other stuff. That said, I think it should be feasible to do basic low-accuracy transcription on-device, especially with all the neural engine hardware making inference more efficient.