
That's only for pretraining a model. Very few groups pretrain models, since it is so expensive in terms of GPU time. Finetuning a model for a specific task typically only takes a few hours. E.g. I regularly train multitask syntax models (POS tagging, lemmatization, morphological tagging, dependency relations, topological fields), which takes just a few hours on a consumer-level RTX 2060 Super.
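A multitask setup like this usually shares one encoder and attaches a small classification head per task, with the training loss being a (possibly weighted) sum of the per-task losses. A minimal pure-Python sketch of that loss combination — the task names, logits, and weights here are illustrative, not from an actual model:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, gold_index):
    # Negative log-probability of the gold label.
    probs = softmax(logits)
    return -math.log(probs[gold_index])

def multitask_loss(task_outputs, weights=None):
    # task_outputs: {task_name: (logits, gold_label_index)} produced by the
    # per-task heads on top of a shared encoder. The total loss is a
    # weighted sum of the per-task cross-entropies.
    weights = weights or {t: 1.0 for t in task_outputs}
    return sum(weights[t] * cross_entropy(lg, g)
               for t, (lg, g) in task_outputs.items())

# Hypothetical logits for a single token from two of the heads mentioned above.
outputs = {
    "pos_tagging": ([2.0, 0.1, -1.0], 0),
    "dep_relations": ([0.5, 1.5], 1),
}
loss = multitask_loss(outputs)
```

In a real setup each head would be a linear layer over the encoder's token representations, but the way the losses combine is the same.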

Unfortunately, distillation of smaller models can take a fair bit of time. However, there is a lot of recent work to make distillation more efficient, e.g. by not just training on the label distributions of a teacher model, but by also learning to emulate the teacher's attention, hidden layer outputs, etc.
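The soft-label part of distillation is typically a KL divergence between the teacher's and student's output distributions, softened with a temperature, and emulating internal representations adds e.g. an MSE term on hidden layer outputs. A toy pure-Python sketch of those two loss terms — the temperature, weighting, and example vectors are all illustrative:

```python
import math

def softmax(logits, temperature=1.0):
    # Softmax with a temperature; higher T flattens the distribution,
    # exposing more of the teacher's "dark knowledge" about non-gold labels.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits,
                      teacher_hidden, student_hidden,
                      temperature=2.0, alpha=0.5):
    # Soft-label term: match the teacher's softened label distribution.
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # Hidden-state term: mean squared error against the teacher's
    # hidden layer output (after any projection to the student's size).
    mse = sum((t - s) ** 2
              for t, s in zip(teacher_hidden, student_hidden)) / len(teacher_hidden)
    return alpha * soft + (1 - alpha) * mse

# Illustrative numbers only.
loss = distillation_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.1],
                         [0.4, -0.2, 0.9], [0.3, -0.1, 0.8])
```

In practice the hidden-state term is summed over several matched layers (and the same idea applies to attention matrices), which is part of what makes these distillation recipes more sample-efficient.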



Is there one model that you use more frequently than others as a base for these disparate fine tuning tasks? Basically, are there any that are particularly flexible?


In general, BERT would be the most common one. RoBERTa is the same architecture but trained for longer on more data, which turns out to work better. T5 is a larger model, which works better on many tasks but is more expensive.


Thanks for the summary! I'm familiar with BERT, but less so the different variants, so that's quite helpful. I'll take a look at how RoBERTa works.


So far, of the models that run on GPUs with 8-16 GiB VRAM, XLM-RoBERTa has been the best for these specific tasks. It worked better than the multilingual BERT model and language-specific BERT models by quite a wide margin.


Great, thanks very much for the pointer, especially the VRAM context - I'm looking to fine-tune on 2080 Tis rather than V100/A100s, so that's really good to know.



