
>the time it takes it to learn what works/doesn't work widens.

From the raw scaling laws we already knew that a new base model might peter out in this run or the next, with some uncertainty about which, since "the intersection point is sensitive to the precise power-law parameters":

https://gwern.net/doc/ai/nn/transformer/gpt/2020-kaplan-figu...
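
To make that sensitivity concrete, here is a rough sketch. It assumes two generic power-law curves of the form a * x**-b; the coefficients and exponents are made up for illustration and are not the fitted values behind the figure. The function name `intersection` is likewise hypothetical.

```python
import math


def intersection(a1, b1, a2, b2):
    """Return x where a1 * x**-b1 == a2 * x**-b2 (requires b1 != b2).

    Setting the curves equal gives a1 / a2 = x**(b1 - b2),
    so x = (a1 / a2) ** (1 / (b1 - b2)).
    """
    if math.isclose(b1, b2):
        raise ValueError("equal exponents: the curves never cross (or always coincide)")
    return (a1 / a2) ** (1.0 / (b1 - b2))


# Hypothetical parameters, chosen only to show the effect.
a1, b1 = 10.0, 0.05   # steeper curve
a2, b2 = 5.0, 0.03    # shallower curve

base = intersection(a1, b1, a2, b2)
# Nudge one exponent by under 7% and recompute the crossing point.
shifted = intersection(a1, b1, a2, 0.032)

print(f"crossing point, original exponents: {base:.3e}")
print(f"crossing point, nudged exponent:    {shifted:.3e}")
print(f"ratio: {shifted / base:.1f}x")
```

Because the exponents differ by so little, 1 / (b1 - b2) is large, and a small error in either exponent moves the crossing point by more than an order of magnitude. That is why you can't say with confidence whether the wall is in this run or the next.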

A later graph shows how far GPT-3 got:

https://gwern.net/doc/ai/nn/transformer/gpt/2020-brown-figur...

https://gwern.net/scaling-hypothesis


