
Would be interesting to get a list of those 40 papers mentioned


Came here to say the same thing, but here are a few off the top of my head:

  - attention is all you need
  - an image is worth 16x16 words (vit)
  - openai clip
  - transformer XL
  - memorizing transformers / retro
  - language models are few shot learners (gpt-3)
A few newer papers

  - recurrent block wise transformers
  - mobilevit (conv + transformer)
  - star (self-taught reasoner)


Most of the papers you list are about the model architecture: there is the original Transformer paper, and most of the others are variations of the Transformer.

I think that to get into the field and get a good overview, you should also look a bit beyond the Transformer. RNNs/LSTMs, for example, are still a must-learn, even though Transformers might be better at many tasks. And the memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.

It also helps to know the different architectures: plain language models (GPT), attention-based encoder-decoders (e.g. the original Transformer), but then also CTC, hybrid HMM-NN, and transducers (RNN-T).
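
To make the first two concrete, here is a toy sketch (my own illustration; all shapes and names are made up) of the same scaled dot-product attention core used causally in a decoder-only language model vs. as cross-attention in an encoder-decoder:

  import torch
  import torch.nn.functional as F

  def attention(q, k, v, causal=False):
      # scaled dot-product attention as in "Attention Is All You Need"
      scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
      if causal:  # a language model may only attend to itself and the past
          t = scores.shape[-1]
          mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
          scores = scores.masked_fill(mask, float("-inf"))
      return F.softmax(scores, dim=-1) @ v

  x = torch.randn(2, 5, 64)                  # decoder states [batch, seq, dim]
  lm_out = attention(x, x, x, causal=True)   # GPT-style causal self-attention

  enc = torch.randn(2, 9, 64)                # encoder states (e.g. source sentence)
  dec_out = attention(x, enc, enc)           # encoder-decoder cross-attention

CTC, hybrid HMM-NN, and RNN-T then differ mostly in how the output sequence is aligned to the input frames, not in this core.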

Diffusion models are another recent and quite different kind of model.

But what really comes up short in this list are papers on the training aspect. Most of the papers you list use supervised training with a cross-entropy loss. However, there are many other approaches:

You have CLIP in there, which specifically combines the text and image modalities.
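
Roughly, the CLIP objective looks like this (a simplified sketch of the paper's idea; random tensors stand in for the image and text encoders, and the temperature is learnable in the actual model, 0.07 being roughly its initial value):

  import torch
  import torch.nn.functional as F

  # stand-ins for the outputs of an image encoder and a text encoder
  img = F.normalize(torch.randn(8, 512), dim=-1)
  txt = F.normalize(torch.randn(8, 512), dim=-1)

  logits = img @ txt.t() / 0.07     # cosine similarities divided by temperature
  labels = torch.arange(8)          # the i-th image matches the i-th text
  loss = (F.cross_entropy(logits, labels)
          + F.cross_entropy(logits.t(), labels)) / 2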

There is the whole field of unsupervised or self-supervised training methods. Language model training (next-token prediction) is one example, but there are others.
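
Next-token prediction itself is very simple once you have a model; a toy version (illustrative only, with a tiny stand-in model instead of a real Transformer):

  import torch
  import torch.nn.functional as F

  tokens = torch.tensor([[5, 17, 3, 42, 8]])        # one toy sequence of token ids
  inputs, targets = tokens[:, :-1], tokens[:, 1:]   # the target at t is the token at t+1

  vocab, dim = 100, 32
  model = torch.nn.Sequential(                      # stand-in for a real Transformer
      torch.nn.Embedding(vocab, dim),
      torch.nn.Linear(dim, vocab),
  )
  logits = model(inputs)                            # [batch, seq, vocab]
  loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))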

And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.


We should have an Ask HN where the people in the know can agree on 40 papers that the rest of us idiots can go out and consume.



This idiot would love it explained to me as well.


Lol true, and I'm currently working on a project leveraging a CLIP model, which is why my answer is largely skewed towards vision transformers. By no means a complete list :)


I keep getting CXOs asking for an ELI5 (or ELI45 for that matter) of how Transformers, LLMs, and Diffusion Models work. Any suggestions for a non-technical audience? (Paid items are fine, we can purchase.)


This is quite a gentle introduction to Diffusion models, from the YouTube channel Computerphile.

https://youtu.be/1CIpzeNxIhU


I got a lot out of Karpathy's video lectures on YouTube, for example: https://www.youtube.com/watch?v=kCc8FmEb1nY

He mentions a few of the bigger papers on multilayer perceptrons (aka deep networks), such as Attention Is All You Need. I think it's a good place to dive in before coming back to visit some fundamentals.


Maybe ask him on Twitter for it?


They asked on Twitter and he didn’t reply. We need someone with a blue check mark to ask. https://twitter.com/ifree0/status/1620855608839897094


Try asking @ilyasut directly


I would also really like to see that list of 40 papers.


Please upvote the parent comment :). I guess there are a lot of people wondering which papers he read.



