Hacker News

Look into the modern NLP models: BERT and its many derivatives, RoBERTa, XLNet. Training any of these requires roughly a terabyte of data and generally takes days on multiple TPUs. You often can't even fine-tune one on a single GPU without some clever tricks.
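One of the "clever tricks" usually meant here is gradient accumulation: run several micro-batches that fit in memory, average their gradients, and take one optimizer step, which (for a mean loss and equal-size micro-batches) matches the gradient of the large batch you couldn't fit. A toy sketch in plain Python with a 1-D linear model and squared loss (all names are illustrative, no framework assumed):

```python
# Gradient accumulation sketch: the full-batch gradient of a mean loss
# equals the average of the gradients of equal-size micro-batches.

def grad_squared_loss(w, xs, ys):
    """Gradient of mean squared error for a 1-D linear model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Full batch: what we'd compute with enough GPU memory.
full_grad = grad_squared_loss(w, xs, ys)

# Accumulation: two micro-batches of size 2, gradients averaged.
micro_grads = [grad_squared_loss(w, xs[i:i + 2], ys[i:i + 2]) for i in (0, 2)]
accum_grad = sum(micro_grads) / len(micro_grads)

assert abs(full_grad - accum_grad) < 1e-12
print(full_grad, accum_grad)  # both -22.5
```

The same idea is what frameworks expose as an "accumulation steps" setting: only the optimizer step (and any batch-dependent statistics) changes; the per-example gradients are identical.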

