Came here to say the same thing, but a few off the top of my head
- attention is all you need
- an image is worth 16x16 words (vit)
- openai clip
- transformer XL
- memorizing transformers / retro
- language models are few shot learners (gpt-3)
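For reference, the core operation most of these papers build on is scaled dot-product attention from the Transformer paper; a minimal sketch (PyTorch assumed, the function name and toy shapes are mine):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); computes softmax(Q K^T / sqrt(d_k)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 16, 64)          # toy tensors
out = scaled_dot_product_attention(q, k, v)    # (2, 8, 16, 64)
```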
Most of the papers you list are about the model architecture: there is the original Transformer paper, and most of the others are variations on the Transformer.
I think to get into the field and get a good overview, you should also look a bit beyond the Transformer. E.g. RNNs/LSTMs are still a must-learn, even though Transformers might be better at many tasks. And then all those memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.
It also helps to know different architectures, such as plain language models (GPT) and attention-based encoder-decoder models (e.g. the original Transformer), but then also CTC, hybrid HMM-NN, and transducers (RNN-T).
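Of those, CTC is a good example of a criterion you won't see in the Transformer-centric papers above: it marginalizes over all alignments between a long input sequence and a shorter label sequence. A minimal sketch using PyTorch's nn.CTCLoss (the shapes here are made up):

```python
import torch
import torch.nn as nn

T, N, C, S = 50, 4, 20, 10   # input frames, batch, classes (incl. blank), target length
log_probs = torch.randn(T, N, C).log_softmax(2)           # e.g. output of an acoustic encoder
targets = torch.randint(1, C, (N, S), dtype=torch.long)   # label sequences, no alignment given
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

# CTC sums over all monotonic alignments of the S labels to the T frames.
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```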
Diffusion models are another recent, different kind of model.
But what really comes up short in this list are papers on the training aspect. Most of the papers you list do supervised training, using a cross-entropy loss. However, there are many other options:
You have CLIP in here, which is trained with a contrastive loss, specifically to combine the text and image modalities.
There is the whole field of unsupervised or self-supervised training methods. Language model training (next-token prediction) is one example, but there are others.
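To make that concrete, next-token prediction is just cross entropy where the targets come from the data itself, shifted by one position. A minimal sketch (the tiny embedding+linear stand-in model is mine; a real language model would be a Transformer or RNN):

```python
import torch
import torch.nn.functional as F

vocab_size = 1000
# Stand-in "language model": embedding + linear head, just to show the training objective.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)

tokens = torch.randint(0, vocab_size, (8, 129))    # a batch of token ids from raw text
inputs, targets = tokens[:, :-1], tokens[:, 1:]    # targets = inputs shifted by one; no labels needed
logits = model(inputs)                             # (batch, seq_len, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```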
And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.
Lol true, and I'm currently working on a project leveraging a CLIP model, which is why my answer is largely skewed towards vision transformers. By no means a complete list :)
I keep getting CXOs asking for an ELI5 (or ELI45 for that matter) of how Transformers, LLMs, and Diffusion Models work. Any suggestions for a non-technical audience (paid items are fine, we can purchase).
He mentions a few of the bigger papers in multilayer perceptrons (aka deep networks), such as attention is all you need, which I think is a good place to dive in before coming back to visit some fundamentals.