
In the post you mentioned that

>>"In those tasks training from scratch with this model architecture does not do as well as some other techniques we're researching, but it serves as a baseline."

Can you elaborate a little on that? Is the training the problem, or is the model just not good at longer texts?


