>>"In those tasks training from scratch with this model architecture does not do as well as some other techniques we're researching, but it serves as a baseline."
Can you elaborate a little on that? Is the training the problem or is the model just not good at longer texts?