That's not entirely true. As someone who has a pretty long background in this area, works with LLMs/Diffusion models every day, and generally thinks there is a bit too much hype (but also a lot of potential): there is a lot that we don't really understand about how these models behave and why.
For starters, this article discusses how we need to change the architecture for solving XOR. That's something we do understand quite well. However, what we really don't understand is why architectures like transformers work so well. From an engineering standpoint they make sense, because the models look like they're doing something we want, and it makes intuitive sense that they work.
But from a theoretical standpoint it's not known why we really need all these fancy architectures (rather than just using a bunch of layers that should also be able to "figure out" what the network needs to do). All of our success has boiled down to "hey, let's try this and see if the model will learn better/faster."
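To be concrete about the part we do understand: XOR is the classic example of a problem no single linear layer can fit, while one small hidden layer with a nonlinearity learns it in seconds. A toy sketch of my own (PyTorch assumed, nothing from the article):

    import torch
    import torch.nn as nn

    # XOR: not linearly separable, so nn.Linear(2, 1) alone can never fit it,
    # but one hidden layer with a nonlinearity is enough.
    X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])

    model = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1), nn.Sigmoid())
    opt = torch.optim.Adam(model.parameters(), lr=0.1)

    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy(model(X), y)
        loss.backward()
        opt.step()

    print(model(X).round())  # usually ends up at [0, 1, 1, 0]

That case we can explain end to end; the open question is why this kind of trial-and-error keeps working at transformer scale.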
Similarly, from a mathematical perspective, we do have some intuition around the reality that all NNs are basically doing some highly non-linear, complex transformation onto some latent surface where the problem is linearly separable. That gives us a sense of why these models probably can't learn "truth" (unless you do believe there exists a latent space of what is true and what is not, which is pretty radical). But if you start asking many more questions about how this works, or why the model would choose one representation over the other internally, we don't really know.
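The same toy problem makes that intuition concrete: lift the XOR inputs with one hand-picked non-linear feature and the classes become linearly separable. Again my own illustration, not anything from the article:

    import numpy as np

    # XOR is not linearly separable in the original 2-D input space, but after the
    # non-linear feature map (x1, x2) -> (x1, x2, x1*x2) one hyperplane splits it.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    phi = np.column_stack([X, X[:, 0] * X[:, 1]])   # add the product feature

    # The plane x1 + x2 - 2*x1*x2 = 0.5 now separates the classes:
    w, b = np.array([1.0, 1.0, -2.0]), -0.5
    print((phi @ w + b > 0).astype(int))            # -> [0 1 1 0]

A trained network finds its own version of that lifting; what we can't do is predict which representation it will land on.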
Nearly all of our progress in deep learning over the last decade has basically been hacking around and applying larger and larger amounts of data and compute resources. But at the end of the day even the best in the field don't really understand exactly what's happening.
Following from this: if you really want to understand these tools better, start playing with them and trying to build cool things. A deep understanding of the fundamentals is not much more useful for success with LLMs and Diffusion models than knowing how to efficiently implement B-trees is for building a cool product with a database back end.
It can absolutely be called magic when the creators of LLMs themselves openly say that they don't understand why they work the way they do. The word "magic" is very flexible ("his singing is magical", "it was a magical holiday in Vegas with four girls and me in the hotel room"), and it can definitely be used in this context to mean "something wonderful we don't fully understand".
>But from a theoretical standpoint it's not known why we really need all these fancy architectures
As someone who has been researching neural networks in a variety of settings for a very long time now, it is actually pretty obvious. There is also no real "magic" to it, even though it certainly might seem so to people who did not follow the academic world of research closely. But to those who do, all of this followed a pretty straightforward path, even though certain key steps were only obvious in hindsight.

We have known since the 90s that a perceptron with a single hidden layer can approximate any function with arbitrary accuracy (with some caveats that in practice boil down to computational limits), with the error scaling like 1/N in the number of hidden neurons. But the proof of that theorem already shows that this is by far not the most efficient way to approximate functions. While in practice you could plug the pixel values of an image directly into a perceptron, it turned out to be hugely more efficient computationally to use convolutions first as a dimensionality reduction scheme. This not only allowed people to train much larger networks on larger datasets, it also highlighted how additional layers enable hierarchical knowledge: the first layer of such a network might only encode lines or circles, while deeper layers could encode noses and ears and eventually entire human faces.

For language modeling, the thing holding everything back was likewise computational cost. Recurrent neural networks are theoretically even more powerful than simple perceptrons, but they come with a significant cost when computing gradients. Working around these constraints is what eventually led to the transformer, which at its core is just an extremely scalable, general purpose, differentiable algorithm that you can optimise using backpropagation. We didn't need this architecture from a purely theoretical perspective, but we needed it in practice, because our computing hardware is still very limited once we try to mimic actual biological neural networks as you would find them in the human brain.
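To put a number on the "hugely more efficient" point, here is a back-of-the-envelope sketch (my own, the layer sizes are arbitrary) comparing the parameter count of one fully connected layer against one convolutional layer on a 224x224 RGB image:

    import torch.nn as nn

    fc = nn.Linear(224 * 224 * 3, 4096)     # every pixel connected to every unit
    conv = nn.Conv2d(3, 64, kernel_size=3)  # 3x3 weights shared across all positions

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(fc))    # ~617 million parameters
    print(count(conv))  # 1,792 parameters

Weight sharing like this is what made training larger vision networks feasible on the hardware of the time.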
> But to those who do, all of this followed a pretty straightforward path, even though certain key steps were only obvious in hindsight.
The research still largely relies on post-hoc justification for these architectural benefits. We know CNNs work, we can open them up and see what they're doing, but we didn't get there from a theoretical foundation that predicted this outcome, nor do we have a real theoretical framework to justify them.
The history of pre-science is filled with very similar post-hoc justifications that let practitioners make progress but ultimately turned out to be wildly incorrect.
> it is actually pretty obvious.
In this entire reply you leave out the theoretical justifications that would back up this claim. You give many examples of intuitions for why these architectures work, but never dive into a rigorous explanation, because such explanations don't exist yet.
This comment simply outlines the growing "bag of tricks" we've built up over the years to solve problems, along with the common post-hoc justifications. But in its current state this is no different from alchemy, which did get some ideas correct and produced some useful practices, but ultimately failed to provide a theoretical framework for what was being done.
I don't know any serious deep learning researcher who disagrees that, at this point, practice far outpaces our theoretical understanding.
Is the key to answering this question the continued study of neurobiology? Are there any clues as to what the human brain is doing that apply to these concepts? Structuralism is radically popular; one would think that if it is right, we should be able to grow conscious beings from a certain original blueprint.
That opinion was held by a large part of the field for the longest time, and some still cling to it. These are usually the people who criticise transformers, because transformers go against everything they believe. But what we have seen in recent years points to the fact that the capability of neural networks is only a question of size. Yes, the human brain uses tricks like recurrent and convolutional layers as well, and to some extent it uses them better than we currently can. But transformers have shown that you don't need any of that for language processing, and not even for vision, demonstrating once again that you only need a sufficiently sized network. The details of the architecture are not that important, in the same way that your microprocessor architecture does not really matter once you're dealing with high-level programs in userland.
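A rough sketch of what "not even for vision" means in practice (my own illustration, loosely ViT-style, not anyone's actual model): an image becomes transformer tokens with nothing but a reshape and a per-patch linear projection, no convolutions anywhere.

    import torch

    img = torch.randn(1, 3, 224, 224)                 # (batch, channels, H, W)
    patch = 16

    # Cut the image into non-overlapping 16x16 patches and flatten each one.
    patches = img.unfold(2, patch, patch).unfold(3, patch, patch)        # (1, 3, 14, 14, 16, 16)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 14 * 14, -1)  # (1, 196, 768)

    embed = torch.nn.Linear(3 * patch * patch, 512)   # per-patch "token" embedding
    tokens = embed(patches)                           # (1, 196, 512), ready for a standard transformer

From there a plain stack of attention blocks does the rest, which is the sense in which the architecture details matter less than size.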