This doesn't even need to be user-guided. Use videos that have audio: one AI generates a transcript from the audio/video, while another watches the video on mute and tries to read the lips. The AI with access to the audio then provides the feedback.
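Roughly what that teacher/student loop looks like, with toy stand-ins for the two models (a real pipeline would plug in an ASR network for the teacher and a visual speech-recognition network for the student; everything here except the control flow is a hypothetical placeholder):

```python
def transcribe_with_audio(clip):
    """Stand-in for a speech-to-text model that hears the audio track."""
    return clip["spoken"]

def lipread_muted(clip):
    """Stand-in for a lip-reading model that only sees the muted video."""
    return clip["guessed"]

def word_errors(reference, hypothesis):
    """Word-level edit distance: the feedback the audio-equipped AI sends back."""
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(hypothesis) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[-1][-1]

# Toy dataset: each clip records what was actually said and what the
# lip-reader guessed from the silent video.
clips = [
    {"spoken": "good evening everyone".split(),
     "guessed": "good morning everyone".split()},
]

for clip in clips:
    teacher = transcribe_with_audio(clip)   # pseudo-label from audio
    student = lipread_muted(clip)           # prediction from muted video
    loss = word_errors(teacher, student)    # training signal, no human in the loop
    print(loss)                             # prints 1 (one word substituted)
```

The point is just that the supervision signal comes for free from the audio track, so any captioned or spoken-word video becomes training data.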
I am thinking of the millions of hours of TV news. Presenters are almost always in the same position in the frame, and high-quality transcripts may already exist.
User-submitted videos (with audio for STT), user-crafted bounding boxes (we might not need these soon), and user-guided RLHF.
The submitted videos are likely to be diverse, challenging (otherwise the human would just do the task themselves), and representative of actual customer problems.