
It's because BLOOM is undertrained: you can prune a lot of its weights without hurting performance. Look at the Chinchilla paper [1]; their 70B model outperforms the 175B GPT-3.

[1] https://arxiv.org/abs/2203.15556
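For a sense of scale, the commonly quoted rule of thumb from the Chinchilla paper is roughly 20 training tokens per parameter for compute-optimal training (Chinchilla-70B saw ~1.4T tokens, versus ~300B for GPT-3 175B). A minimal sketch of that rule of thumb, with the coefficient treated as an approximation rather than the paper's fitted scaling law:

    # Rough sketch of the Chinchilla "~20 tokens per parameter" rule of thumb.
    # The exact coefficients in the paper come from fitted scaling laws, so
    # treat this as illustrative, not authoritative.

    def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
        """Approximate compute-optimal token budget for a model with n_params parameters."""
        return n_params * tokens_per_param

    for name, n_params in [("Chinchilla-70B", 70e9), ("GPT-3-175B", 175e9), ("BLOOM-176B", 176e9)]:
        print(f"{name}: ~{chinchilla_optimal_tokens(n_params) / 1e12:.1f}T tokens to be compute-optimal")

By this estimate a 175B-class model would want on the order of 3.5T training tokens, far more than the few hundred billion that GPT-3 or BLOOM were actually trained on.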



In general, most giant LLMs are extremely undertrained at this point. Consider that most of the gains of RoBERTa over BERT came from just continuing to train.
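"Just continuing to train" is straightforward to do with off-the-shelf tooling. A minimal sketch using the Hugging Face Trainer for continued masked-LM pretraining; the dataset choice and hyperparameters here are placeholders, not the RoBERTa authors' recipe:

    # Minimal sketch of continued masked-LM pretraining with Hugging Face Transformers.
    # Dataset, hyperparameters, and checkpoint are placeholders, not the RoBERTa recipe.
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)
    from datasets import load_dataset

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForMaskedLM.from_pretrained("roberta-base")

    # Any large text corpus works; wikitext-103 is just an easy-to-load stand-in.
    raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
    tokenized = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=["text"],
    )

    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
    args = TrainingArguments(
        output_dir="roberta-continued",
        per_device_train_batch_size=16,
        num_train_epochs=1,
        learning_rate=1e-5,
    )
    Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()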


Undertraining tends to show up as output that degenerates into gibberish or repeating loops. That happened a lot in the GPT-2 AI Dungeon days.
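A quick heuristic for spotting that kind of looping (my own sketch, not a standard metric) is to count how often the most frequent n-gram appears in a generated continuation:

    # Heuristic loop detector: how many times does the most common n-gram repeat?
    from collections import Counter

    def max_ngram_repetition(text: str, n: int = 4) -> int:
        tokens = text.split()
        if len(tokens) < n:
            return 0
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        return max(Counter(ngrams).values())

    # A looping model might emit something like this:
    sample = "the cave is dark and the cave is dark and the cave is dark and"
    print(max_ngram_repetition(sample))  # 3 repeats of a 4-gram -> suspicious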


So could we continue training RoBERTa to get it to, say, GPT-3 Ada level?



