
"Nobody really knows how AI works" is one of those myths told by the media.

It's not a myth. No one really understands how neural networks work. We don't know why a particular model works well, or why any model works well at all. For example, no one can answer why NNs generalize so well even when they have enough capacity to memorize every training example. We can guess, but we don't know for sure. Most of the proofs you see in papers are there as filler, so that the papers seem more convincing. We can rarely prove anything mathematically about NNs that has practical value or leads to a breakthrough in understanding.
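
To make that puzzle concrete, here's a rough sketch (assuming PyTorch; the data, model, and training budget are made up purely for illustration): the same small MLP is trained once on labels with real structure and once on the same labels randomly reassigned. It has enough capacity to fit both training sets, yet only the first run generalizes - and we have no theory that predicts when that happens.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def train_and_eval(x, y_train, y_true):
        # A small MLP with far more parameters than 1,000 training points need.
        model = nn.Sequential(nn.Linear(20, 512), nn.ReLU(), nn.Linear(512, 2))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(3000):
            opt.zero_grad()
            loss_fn(model(x[:1000]), y_train[:1000]).backward()
            opt.step()
        train_acc = (model(x[:1000]).argmax(1) == y_train[:1000]).float().mean().item()
        test_acc = (model(x[1000:]).argmax(1) == y_true[1000:]).float().mean().item()
        return train_acc, test_acc

    x = torch.randn(2000, 20)
    y = (x[:, 0] + x[:, 1] > 0).long()        # labels with real structure
    y_shuffled = y[torch.randperm(len(y))]    # same labels, randomly reassigned

    print("real labels     (train, test):", train_and_eval(x, y, y))
    print("shuffled labels (train, test):", train_and_eval(x, y_shuffled, y))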

If we really understood how NNs work, we wouldn't need expensive hyperparameter searches - we would have a way to determine the optimal settings for a given architecture and training set. And we wouldn't need expensive architecture searches, yet the best of the latest convnets have been found through NAS (e.g. EfficientNet), and there is very little math involved in that process - it's pretty much just random search.
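
For a sense of how little theory is involved, a typical random-search loop looks something like this sketch (the ranges are hypothetical, and validation_accuracy is a stand-in for a full training run, which is where all the expense goes):

    import random

    random.seed(0)

    def sample_config():
        # The ranges come from folklore and past experience, not from theory.
        return {
            "lr": 10 ** random.uniform(-5, -2),
            "batch_size": random.choice([32, 64, 128, 256]),
            "depth": random.randint(2, 8),
            "width": random.choice([128, 256, 512]),
            "dropout": random.uniform(0.0, 0.5),
        }

    def validation_accuracy(config):
        # Placeholder: in practice this trains a model with `config` and
        # returns its validation accuracy - hours of GPU time per call.
        return random.random()

    best_config, best_score = None, float("-inf")
    for _ in range(50):                  # 50 full training runs, brute force
        config = sample_config()
        score = validation_accuracy(config)
        if score > best_score:
            best_config, best_score = config, score

    print(best_score, best_config)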

Funny you mention the batchnorm paper - we still don't know why batchnorm is so effective. The paper gave an explanation (internal covariate shift reduction) that was later shown to be wrong (batchnorm doesn't actually reduce it), and several other explanations have since been suggested (a smoother loss surface, easier gradient flow, etc.), but we still don't know for sure. Pretty much every good idea in the NN field is the result of lots of experimentation, good intuition developed in the process, looking at how the brain does it, and practical constraints. And yes, sometimes we look at the equations, think hard, and see a better way to do things. But it usually starts with empirical tests, and if those succeed, some math is brought in afterwards to try to explain why. Not the other way around.
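
What makes this especially striking is how simple the operation itself is. Training-mode batchnorm over a (batch, features) tensor is just the few lines below (a sketch only - a real layer also tracks running statistics for inference and learns gamma and beta). We know exactly what it computes; we just don't agree on why it helps.

    import torch

    def batch_norm_train(x, gamma, beta, eps=1e-5):
        # Normalize each feature over the batch, then rescale and shift.
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        return gamma * x_hat + beta

    x = torch.randn(64, 10) * 5 + 3                   # badly scaled activations
    gamma, beta = torch.ones(10), torch.zeros(10)     # learnable in a real layer
    y = batch_norm_train(x, gamma, beta)
    print(y.mean(dim=0))   # ~0 per feature
    print(y.std(dim=0))    # ~1 per feature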

NNs are currently at a similar point as where physics was before Newton and before calculus.



> NNs are currently at a similar point as where physics was before Newton and before calculus.

I'm more inclined to compare it with the era after Newton and Leibniz, but prior to the development of rigorous analysis. If you look at that period, the analogy fits a bit better IMO -- you have a proliferation of people using calculus techniques to great advantage on practical problems, but no real foundations propping the whole thing up (e.g., no definition of a limit or of continuity, no notion of how to deal with infinite series, etc.).


Maybe. On the other hand, maybe a rigorous mathematical analysis of NNs is about as useful as a rigorous mathematical analysis of computer architectures - not very. Maybe all you need is to keep scaling up, adding clever optimizations along the way (none of the great CPU ideas - caches, pipelining, out-of-order execution, branch prediction, etc. - came from rigorous mathematical analysis).

Or maybe it's as useful as a rigorous mathematical analysis of the brain - again, not very, because for us (people who develop AI systems) it would be far more valuable to understand the brain at the circuit or architecture level than at the level of a mathematical theory. The latter would be interesting, but probably too complex to be useful, while the former would most likely lead to dramatic breakthroughs in the performance and capabilities of AI systems.

So maybe we just need to keep doing what we have been doing in the DL field for the last 10 years - trying and revisiting ideas, scaling them up, and evolving the architectures the same way we've been evolving our computers for the last 100 years, in the hope that more clues come from neuroscience. I think we just need more ideas like transformers, capsules, or neural Turing machines, and computers that keep getting ~20% faster every year.



