
What do you mean? Attention is just matrix multiplication and softmax; it's all feed-forward.
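For concreteness, here's a minimal sketch of that claim, assuming NumPy; the weight names (W_q, W_k, W_v) and toy dimensions are illustrative, not from any particular library. Single-head attention is two matrix products around one softmax, with no recurrence or state carried between positions:

    import numpy as np

    def softmax(x, axis=-1):
        # Subtract the row max for numerical stability before exponentiating.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(X, W_q, W_k, W_v):
        # X: (seq_len, d_model). Everything here is matrix multiplication
        # plus one softmax -- a single forward pass, no recurrence.
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len)
        return softmax(scores) @ V               # weighted sum of value rows

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 8))
    W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
    out = attention(X, W_q, W_k, W_v)  # shape (5, 8), computed in one pass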


I wasn't disputing that it's feed-forward. I just meant that stacked transformer layers can be thought of as iterative refinement of the intermediate activations: not the same as an autoregressive process that receives previous outputs as inputs, but far more expressive than a single transformer layer.
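A toy sketch of that reading, reusing the attention() function from the sibling comment with hypothetical per-layer weights: thanks to the residual connection, each layer reads the current activations and writes a correction back into the same stream, so depth does the work that iteration otherwise would.

    def transformer_stack(X, layers):
        # layers: list of (W_q, W_k, W_v) tuples, one per layer.
        for W_q, W_k, W_v in layers:
            # Residual connection: each layer refines X rather than
            # replacing it -- an iterative update unrolled across depth.
            X = X + attention(X, W_q, W_k, W_v)
        return X

    layers = [tuple(rng.standard_normal((8, 8)) for _ in range(3))
              for _ in range(4)]
    refined = transformer_stack(X, layers)  # four refinement steps, still feed-forward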



