Did these podcasts have transcripts? You might be inadvertently evaluating it on... | Hacker News


ma2rten on Sept 21, 2022 | on: Whisper – open source speech recognition by OpenAI

Did these podcasts have transcripts? You might be inadvertently evaluating it on data it was trained on, which is basically cheating. Even if not, it might have been trained on similar podcasts. Judging how good these kinds of models are is really hard.

petercooper on Sept 22, 2022

No transcripts, no. And they're recent episodes, from the past couple of weeks, so probably not part of the training data either.

WiSaGaN on Sept 21, 2022

True. The test should only be done on material released after the model.
