Hacker News

The specific post I'm thinking of is A Mechanistic Interpretability Analysis of Grokking - https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mec...

I originally thought the PAIR article was another presentation by the same authors, but on closer reading I think they independently discovered similar results. The PAIR article does cite "Progress measures for grokking via mechanistic interpretability", the arXiv paper by the authors of the Alignment Forum post.

(While researching this I found another paper on grokking that reported similar results a few months earlier; again, I suspect these are all parallel discoveries.)

You could say that all of these avenues of research are re-statements of well-known properties, e.g. deep double descent, but I think that's a stretch. Double descent feels related, but I don't think a 2018 AI researcher who knew about double descent would spontaneously predict "if you train your model past the point it starts overfitting, it will start generalizing again, provided you train it for long enough with weight decay".
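For concreteness, here is a toy sketch of the kind of setup those grokking papers study: a modular-addition dataset plus training with decoupled weight decay (AdamW-style, where decay is applied directly to the weights rather than folded into the gradient). This is an assumption-laden simplification — the papers train a small transformer for many thousands of steps, and this linear stand-in won't actually exhibit grokking; it only illustrates the task and the weight-decay mechanic.

```python
import numpy as np

# Toy version of the grokking task: (a + b) mod p for a small prime p.
# (Assumption: simplified to a linear softmax model; the papers use a
# small transformer, which is what actually groks.)
p = 7
pairs = [(a, b) for a in range(p) for b in range(p)]
y = np.array([(a + b) % p for a, b in pairs])

# One-hot encode the pair (a, b) as a length-2p vector.
X = np.zeros((len(pairs), 2 * p))
for i, (a, b) in enumerate(pairs):
    X[i, a] = 1.0
    X[i, p + b] = 1.0

W = np.zeros((2 * p, p))  # linear softmax weights

def step(W, X, y, lr=0.1, weight_decay=1e-2):
    """One softmax-regression step with decoupled weight decay:
    the weights are shrunk multiplicatively, separate from the
    cross-entropy gradient (the AdamW-style decay the papers rely on)."""
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0       # d(loss)/d(logits)
    grad = X.T @ probs / len(y)
    return (1.0 - lr * weight_decay) * W - lr * grad

for _ in range(200):
    W = step(W, X, y)
```

The interesting part in the real setting is the schedule: the transformer first memorizes the training pairs, and only much later (with weight decay pushing toward simpler circuits) snaps to a generalizing Fourier-based solution.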

But anyway, in retrospect, I agree that saying "the LessWrong community is where this line of analysis comes from" is false; it's more like they were among the people working on it and reaching similar conclusions.
