Did these podcasts have transcripts? If so, you might be inadvertently evaluating the model on data that it was trained on, which is basically cheating. Even if not, it might have been trained on similar podcasts. Judging how good these kinds of models are is really hard.