
The perplexity numbers are for different tasks. MIM encodes a sentence into a latent variable and then reconstructs it, and achieves PTB perplexity 4.6 on that reconstruction task. GPT-2 generates the sentence from scratch, which naturally yields higher perplexity. I agree that PTB perplexity 4.6 on autoregressive language modeling would be a huge result.
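For concreteness, here's a minimal sketch of why the two numbers aren't comparable. Perplexity is just exp of the average per-token negative log-likelihood; the per-token probabilities below are made up for illustration, not actual MIM or GPT-2 outputs:

    import math

    def perplexity(token_log_probs):
        # Perplexity = exp of the mean negative log-likelihood per token.
        return math.exp(-sum(token_log_probs) / len(token_log_probs))

    # Autoregressive LM (GPT-2 style): each token is predicted from the
    # preceding tokens only, so the model never sees the rest of the
    # sentence it is scoring. Hypothetical per-token probabilities:
    ar_log_probs = [math.log(p) for p in [0.02, 0.05, 0.01, 0.04]]

    # Reconstruction (MIM style): the whole sentence is first encoded
    # into a latent variable, and the decoder scores tokens given that
    # latent, so per-token probabilities are much higher. Hypothetical:
    recon_log_probs = [math.log(p) for p in [0.3, 0.25, 0.2, 0.35]]

    print("autoregressive perplexity:", round(perplexity(ar_log_probs), 1))
    print("reconstruction perplexity:", round(perplexity(recon_log_probs), 1))

With these toy numbers the autoregressive perplexity comes out around 40 while the reconstruction perplexity is under 4, even though the "reconstruction" model isn't doing unconditional language modeling at all.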


