
Would be interesting to get a list of those 40 papers mentioned


Came here to say the same thing, but here are a few off the top of my head:

  - attention is all you need
  - an image is worth 16x16 words (vit)
  - openai clip
  - transformer XL
  - memorizing transformers / retro
  - language models are few shot learners (gpt-3)
A few newer papers

  - recurrent block wise transformers
  - mobilevit (conv + transformer)
  - star (self-taught reasoner)


Most of the papers you list are about the model architecture: there is the original Transformer paper, and most of the others are variations of the Transformer.

I think that to get into the field and get a good overview, you should also look a bit beyond the Transformer. RNNs/LSTMs, for example, are still a must-learn, even though Transformers might be better at many tasks. And the memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.

It also helps to know the different architectures: plain language models (GPT), attention-based encoder-decoders (e.g. the original Transformer), but then also CTC, hybrid HMM-NN, and transducers (RNN-T).
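
To make the first two concrete, here is a toy sketch (my own illustration; all shapes and names are made up) of the same scaled dot-product attention core used causally in a decoder-only language model vs. as cross-attention in an encoder-decoder:

  import torch
  import torch.nn.functional as F

  def attention(q, k, v, causal=False):
      # scaled dot-product attention as in "Attention Is All You Need"
      scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
      if causal:  # a language model may only attend to itself and the past
          t = scores.shape[-1]
          mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
          scores = scores.masked_fill(mask, float("-inf"))
      return F.softmax(scores, dim=-1) @ v

  x = torch.randn(2, 5, 64)                  # decoder states [batch, seq, dim]
  lm_out = attention(x, x, x, causal=True)   # GPT-style causal self-attention

  enc = torch.randn(2, 9, 64)                # encoder states (e.g. source sentence)
  dec_out = attention(x, enc, enc)           # encoder-decoder cross-attention

CTC, hybrid HMM-NN, and RNN-T then differ mostly in how the output sequence is aligned to the input frames, not in this core.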

Diffusion models are another recent and quite different kind of model.

But what really comes up short in this list are papers on the training aspect. Most of the papers you list use supervised training with a cross-entropy loss. However, there are many other approaches:

You have CLIP in there, which specifically combines the text and image modalities.
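
Roughly, the CLIP objective looks like this (a simplified sketch of the paper's idea; random tensors stand in for the image and text encoders, and the temperature is learnable in the actual model, 0.07 being roughly its initial value):

  import torch
  import torch.nn.functional as F

  # stand-ins for the outputs of an image encoder and a text encoder
  img = F.normalize(torch.randn(8, 512), dim=-1)
  txt = F.normalize(torch.randn(8, 512), dim=-1)

  logits = img @ txt.t() / 0.07     # cosine similarities divided by temperature
  labels = torch.arange(8)          # the i-th image matches the i-th text
  loss = (F.cross_entropy(logits, labels)
          + F.cross_entropy(logits.t(), labels)) / 2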

There is the whole field of unsupervised or self-supervised training methods. Language model training (next-token prediction) is one example, but there are others.
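
Next-token prediction itself is very simple once you have a model; a toy version (illustrative only, with a tiny stand-in model instead of a real Transformer):

  import torch
  import torch.nn.functional as F

  tokens = torch.tensor([[5, 17, 3, 42, 8]])        # one toy sequence of token ids
  inputs, targets = tokens[:, :-1], tokens[:, 1:]   # the target at t is the token at t+1

  vocab, dim = 100, 32
  model = torch.nn.Sequential(                      # stand-in for a real Transformer
      torch.nn.Embedding(vocab, dim),
      torch.nn.Linear(dim, vocab),
  )
  logits = model(inputs)                            # [batch, seq, vocab]
  loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))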

And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.


We should have an Ask HN where the people in the know can agree on 40 papers that the rest of us idiots can go out and consume.



This idiot would love it explained to me as well.


Lol true, and I'm currently working on a project leveraging a CLIP model, which is why my answer is largely skewed towards vision transformers. By no means a complete list :)


I keep getting CXOs asking for an ELI5 (or ELI45 for that matter) of how Transformers, LLMs, and Diffusion Models work. Any suggestions for a non-technical audience? (Paid items are fine, we can purchase.)


This is quite a gentle introduction to Diffusion models, from the YouTube channel Computerphile.

https://youtu.be/1CIpzeNxIhU


I got a lot out of Karpathy's video lectures on YouTube, for example: https://www.youtube.com/watch?v=kCc8FmEb1nY

He mentions a few of the bigger papers on multilayer perceptrons (aka deep networks), such as Attention Is All You Need. I think it's a good place to dive in before coming back to visit some fundamentals.


Maybe ask him on Twitter for it?


They asked on Twitter and he didn’t reply. We need someone with a blue check mark to ask. https://twitter.com/ifree0/status/1620855608839897094


Try asking @ilyasut directly


I would also really like to see that list of 40 papers.


Please upvote the parent comment :). I guess there are a lot of people wondering which papers he read.



