
Great way to build labeled training data.

User-submitted videos (with audio for STT), user-crafted bounding boxes (we might not need these soon), and user-guided RLHF.

The submitted videos are likely diverse, challenging (otherwise the human might just do it), and representative of solving actual customer problems.



It doesn't even need to be user-guided. Use videos that have audio. You could have one AI that generates a transcript from the audio and video, and another that watches the video on mute and tries to read the lips. Feedback would then come from the AI that had access to the audio.
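A rough sketch of what that feedback signal could look like, assuming both models just output plain-text transcripts. Word error rate is my choice of metric here, not something anyone in the thread specified, and `feedback` is a made-up name. The transcript from the audio model is treated as the reference and the lip reader is scored against it:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Empty reference: perfect only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def feedback(audio_transcript: str, lip_transcript: str) -> float:
    """Hypothetical reward in [0, 1]; 1.0 means the lip reader matched the audio model."""
    return max(0.0, 1.0 - word_error_rate(audio_transcript, lip_transcript))


if __name__ == "__main__":
    teacher = "the markets closed higher today"    # from the model with audio
    student = "the markets closed hire today"      # from the muted lip reader
    print(feedback(teacher, student))
```

The obvious caveat is that the audio model's transcript is not ground truth, so any mistakes it makes get passed on to the lip reader as if they were correct.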


I am thinking of the millions of hours of TV news. Presenters are almost always going to be in the same position in the frame, and the footage may already have high-quality transcripts.



