The whole objective here is personal learning; the advice would be wildly different if the question were how to practice ML professionally. The approach is directly analogous to advising a beginner programmer to get better at programming by actually writing computer programs.
> Most of the cutting edge papers are trained on several $100k worth of GPU time
It's beside the point, but I said nothing about a requirement that the methods you choose to implement and learn from be cutting edge. More to the point, unless we have different definitions of what "cutting edge" means, you're wrong that "most of the cutting edge papers" require high computational resources. If that were true, it would be nearly impossible for the field to make progress at the pace it does. There is a plethora of research on purely algorithmic approaches that does not require massive compute resources, and in fact this is the most productive portion of the literature to learn from, because there the focus is on theory and on progress in how to conceptualize/frame ML problems. Works which amount to "we took method X and massively scaled it up" are (in my opinion) less intellectually interesting to someone seeking to grow their knowledge of ML (though the results may be extremely impressive and impactful, and the work may be very interesting for those working directly on that project).
> How can you be sure that your implementation is correct, if you can't train it (hence you can't run proper inference with a good model)?
This is like asking how you can be sure you've correctly implemented a B-tree if you haven't used it to serve a distributed database to a million users. The answer is small, isolated tests.
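For the B-tree, that might look something like the sketch below. It's minimal and purely illustrative: BTree, its order parameter, and its insert/search methods are hypothetical stand-ins for whatever your implementation actually exposes.

    # Check the B-tree against a trivially correct reference model.
    # No database, no users; just the core contract of the structure.
    import random

    def test_btree_agrees_with_a_set():
        tree = BTree(order=4)          # hypothetical implementation
        reference = set()
        for _ in range(1000):
            key = random.randint(0, 100)
            tree.insert(key)
            reference.add(key)
        for key in range(101):
            # Present keys are found; absent keys are not.
            assert tree.search(key) == (key in reference)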
One of the best ways to really test your knowledge of an ML algorithm is to design and write unit tests asserting that it behaves correctly on trivial cases. You'll find bugs in your implementation, but you'll also be forced to think carefully about which core characteristics of the algorithm must be asserted in order to convince yourself that it's correct. It's a common beginner mistake in ML to just run/train your model and have that be the only test of its correctness. That's like deploying a web service with zero tests and letting "do I get X number of users" be the only test of your code's correctness. It sounds insane, but it's basically equivalent to what most beginners do in ML (my former self included).
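As a concrete example of the kind of trivial-case test I mean, here's a minimal numpy sketch (function names and tolerances are my own, not from any particular codebase) that checks a hand-written softmax/cross-entropy against finite differences and against basic distribution properties:

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def cross_entropy(logits, target):
        # target is an integer class index
        return -np.log(softmax(logits)[target])

    def grad_cross_entropy(logits, target):
        # Analytic gradient of CE w.r.t. logits: softmax(z) - one_hot(target)
        g = softmax(logits)
        g[target] -= 1.0
        return g

    def test_gradient_matches_finite_differences():
        rng = np.random.default_rng(0)
        logits = rng.normal(size=5)
        target = 2
        analytic = grad_cross_entropy(logits, target)
        eps = 1e-6
        for i in range(len(logits)):
            bump = np.zeros_like(logits)
            bump[i] = eps
            numeric = (cross_entropy(logits + bump, target)
                       - cross_entropy(logits - bump, target)) / (2 * eps)
            assert abs(analytic[i] - numeric) < 1e-5

    def test_softmax_is_a_distribution():
        p = softmax(np.array([1.0, 2.0, 3.0]))
        assert np.isclose(p.sum(), 1.0) and (p > 0).all()

Another cheap test in the same spirit: train on a single tiny batch and assert the loss goes to near zero. If your model can't overfit five examples, something is wrong, and you've learned that in seconds rather than after hours of training.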
Do you have a few of these cutting-edge algorithmic-advance papers in mind? Could you list them?
I guess I got too pessimistic because of things like "emergent features" [1] and "grokking" [2], which seem to happen only with a lot of compute, and also because the original (vanilla) transformer architecture remains (one of) the best despite many additional ideas and "advances", something that only becomes evident at large scale [3].
Because of the points above, it's really hard for me, as a non-expert, to assess which papers are true advancements and which were published mainly in pursuit of vanity metrics (e.g. publication counts) and actually represent overfit/cherry-picked results rather than robust progress.
I posted in another comment on this thread a list of papers which met these criteria for me at the time and which I learned a lot by implementing.
> it's really hard for me, as a non-expert, to assess which papers are true advancements
It's hard for me too, though I wouldn't consider myself an expert, just someone with a moderate amount of experience. Learning to discriminate important from less-important papers is another skill that takes effort to develop.