Hacker News

Look into the modern NLP models: BERT and its many derivatives, RoBERTa, XLNet. Training any of these requires roughly a terabyte of data and generally takes days on multiple TPUs. You often can't even fine-tune one on a single GPU without some clever tricks.
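One of the "clever tricks" usually meant here is gradient accumulation: run several micro-batches that fit in memory, average their gradients, and take one optimizer step, which (for a mean loss and equal-size micro-batches) matches the gradient of the large batch you couldn't fit. A toy sketch in plain Python with a 1-D linear model and squared loss (all names are illustrative, no framework assumed):

```python
# Gradient accumulation sketch: the full-batch gradient of a mean loss
# equals the average of the gradients of equal-size micro-batches.

def grad_squared_loss(w, xs, ys):
    """Gradient of mean squared error for a 1-D linear model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Full batch: what we'd compute with enough GPU memory.
full_grad = grad_squared_loss(w, xs, ys)

# Accumulation: two micro-batches of size 2, gradients averaged.
micro_grads = [grad_squared_loss(w, xs[i:i + 2], ys[i:i + 2]) for i in (0, 2)]
accum_grad = sum(micro_grads) / len(micro_grads)

assert abs(full_grad - accum_grad) < 1e-12
print(full_grad, accum_grad)  # both -22.5
```

The same idea is what frameworks expose as an "accumulation steps" setting: only the optimizer step (and any batch-dependent statistics) changes; the per-example gradients are identical.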

