>the time it takes it to learn what works/doesn't work widens.
From the raw scaling laws we already knew that a new base model might peter out in this run or the next, with some uncertainty about when--"the intersection point is sensitive to the precise power-law parameters":
https://gwern.net/doc/ai/nn/transformer/gpt/2020-kaplan-figu...
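To make concrete why the intersection point is so sensitive: if two loss curves follow power laws L_i(C) = a_i * C^(-b_i), they cross at C* = (a1/a2)^(1/(b1 - b2)), and because the fitted exponents are small and close together, tiny errors in them move the crossing by orders of magnitude. A rough sketch with made-up numbers (not the actual Kaplan et al. fits):

    # Hypothetical illustration: two power-law loss curves
    # L_i(C) = a_i * C**(-b_i) intersect at C* = (a1/a2)**(1/(b1 - b2)).
    # Since b1 - b2 is small, small perturbations of an exponent
    # shift the predicted crossover compute by a large factor.

    def intersection(a1, b1, a2, b2):
        return (a1 / a2) ** (1.0 / (b1 - b2))

    a1, a2 = 10.0, 12.0                # made-up prefactors
    for b2 in (0.070, 0.072, 0.074):   # small perturbations of one exponent
        print(b2, intersection(a1, b1=0.050, a2=a2, b2=b2))

With those made-up numbers, nudging one exponent by just 0.004 moves the predicted crossover by roughly 4-5x in compute, which is the kind of sensitivity the quote is pointing at.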
A later graph shows where GPT-3 got to:
https://gwern.net/doc/ai/nn/transformer/gpt/2020-brown-figur...
https://gwern.net/scaling-hypothesis