Predictive Learning [pdf] (drive.google.com)
210 points by aaronyy on Dec 7, 2016 | 38 comments


LeCun has identified a real problem for AI -- the need to understand the real world, and the link between intelligence and prediction over time. But the tools he is using are not the right ones.

Deep conv nets were not designed with prediction over time in mind. Here's one reason: deep convolutional nets do not handle dynamical information at their lowest layers. By design, conv layers and pooling layers immediately begin discarding spatial information that could be used for building up predictions.
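A toy numpy illustration of that information loss (my example, not from the slides): two patches that differ only in where the bright pixel sits become indistinguishable after a single max-pooling step.

    import numpy as np

    # Two patches that differ only in *where* the bright pixel is:
    a = np.array([[1., 0.], [0., 0.]])
    b = np.array([[0., 0.], [0., 1.]])

    # 2x2 max pooling maps both to the same value, so the pixel's
    # location -- exactly what a motion predictor would need -- is gone.
    assert a.max() == b.max() == 1.0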

In contrast, it is possible to start with recurrent/feedback networks at the very first layers of the network. These initial layers can begin building up predictions at the pixel, color, and lighting level (example: our recent preprint[1]).
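To make that concrete, here is a minimal sketch (illustrative sizes and random data, not the architecture from the preprint) of a recurrent state that keeps full per-pixel detail and predicts the next frame directly:

    import numpy as np

    rng = np.random.default_rng(0)
    n_pix, n_hid = 8 * 8, 32                   # sizes are illustrative

    Wxh = rng.normal(0, 0.1, (n_hid, n_pix))   # frame  -> hidden
    Whh = rng.normal(0, 0.1, (n_hid, n_hid))   # hidden -> hidden (recurrence)
    Why = rng.normal(0, 0.1, (n_pix, n_hid))   # hidden -> predicted next frame

    h = np.zeros(n_hid)
    frames = rng.random((10, n_pix))           # stand-in for a short video clip

    for t in range(len(frames) - 1):
        h = np.tanh(Wxh @ frames[t] + Whh @ h) # state retains full spatial detail
        pred = Why @ h                         # per-pixel prediction of frame t+1
        err = frames[t + 1] - pred             # error signal, available at every step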

My colleague, who as some of you know enjoys blogging, wrote a more thorough post in response to LeCun's recent CMU lecture on the same topic as these slides[2].

[1] https://arxiv.org/abs/1607.06854

[2] Blog: "A few comments on the Yann LeCun lecture at CMU, 11.2016" http://blog.piekniewski.info/2016/11/21/yann-lecun-cmu-11-20...


Hi, I'm the blogger.

I just want to add a very simple statement:

In order to create a model of the world, the machine learning substrate has to have the capacity to cover the observed dynamics.

It's hard not to agree with this; it almost sounds like a tautology. Now let's try to derive some conclusions:

- World dynamics are full of multi-scale interactions (e.g. in vision, the illumination of a single pixel depends on the whole scene, and the whole scene depends on many tiny details). To capture that, the machine learning substrate has to allow low-level representations to access high-level information. Hence feedback all over the place, which is exactly what is seen in the biological cortex. This is not a recurrent layer made of LSTMs; this is a FULLY RECURRENT system (see the sketch below).
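A minimal sketch of what I mean by that two-way coupling (sizes and wiring are made up for illustration): the high-level state feeds back into the low-level update, so global context can shape local representations.

    import numpy as np

    rng = np.random.default_rng(1)
    lo, hi = 16, 4                           # illustrative sizes
    W_in   = rng.normal(0, 0.1, (lo, lo))    # input      -> low level
    W_up   = rng.normal(0, 0.1, (hi, lo))    # low level  -> high level
    W_down = rng.normal(0, 0.1, (lo, hi))    # high level -> low level (feedback)

    x_lo, x_hi = np.zeros(lo), np.zeros(hi)
    for frame in rng.random((5, lo)):
        # the low level sees the raw input AND the current high-level state
        x_lo = np.tanh(W_in @ frame + W_down @ x_hi)
        x_hi = np.tanh(W_up @ x_lo)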

This will not be achieved with any franken-neocognitron deep network, whether trained with MSE, adversarial, or even triple-adversarial loss functions. It requires a new approach, and together with several colleagues, after a few years of continuous and intense thinking and modelling, we have proposed a solution:

http://blog.piekniewski.info/2016/11/04/predictive-vision-in...

As well as the full paper: https://arxiv.org/abs/1607.06854

Now, I'm not saying this is all done. This is just the beginning of a really exciting research adventure, and many things look very promising. It will require, however, that the AI field get out of the pretty "deep" local minimum it is in right now.


Yeah, not to be too dismissive or clichéd, but these slides come off as if LeCun just now got around to reading some Andy Clark, or reading neurosci/cogsci papers by Tenenbaum or Friston. The idea that the human brain has to work via active prediction rather than passive signal processing has been well established in cogsci and neurosci for a while now.

The interesting question, then, is how we make machine-learning systems do prediction well. Probabilistic/generative models have been "wandering in the desert" for a while now because Monte Carlo methods are just so slow, especially for high-dimensional, hierarchical prediction problems like the ones we want to solve in machine learning. On the upside, Stan now has automatic variational inference for continuous probability models, and work on things like the "concrete distribution" (https://arxiv.org/abs/1611.00712) can help us continuously approximate discrete probabilistic reasoning. Maybe as these techniques move into the mainstream in systems like Venture or Picture we can start to scale up predictive/generative/probabilistic modelling to match optimization-based connectionist methods?
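For anyone unfamiliar with it, here's a minimal sketch of what the Concrete/Gumbel-Softmax relaxation from that paper does (the logits and temperatures are my own toy values): it turns a categorical sample into a continuous, differentiable one.

    import numpy as np

    rng = np.random.default_rng(2)

    def concrete_sample(logits, temperature):
        # Gumbel noise: G = -log(-log(U)), U ~ Uniform(0, 1)
        gumbel = -np.log(-np.log(rng.random(logits.shape)))
        z = (logits + gumbel) / temperature
        e = np.exp(z - z.max())          # numerically stable softmax
        return e / e.sum()

    logits = np.log(np.array([0.7, 0.2, 0.1]))
    print(concrete_sample(logits, temperature=0.1))  # close to a one-hot sample
    print(concrete_sample(logits, temperature=10.))  # close to uniform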


The idea is not new indeed; Andy Clark's review paper was very inspiring to us. But it had not been detailed to the point of implementation/scaling. PVM is an attempt to implement it in a "connectionist" way, but frankly all I need are associative memories, and how they are implemented I don't care. So a "probabilistic PVM" is totally feasible. In fact, we discuss in the paper various possibilities in which a PVM-like meta-architecture can be implemented.


Would it be feasible to replace the "common" component of a recurrent neural network with a convolutional neural network?

My layperson's impression is that, at its most basic level, a recurrent neural network is simply a "conveyor belt" of neural nets which are affected by external inputs as well as by the weights from within the network -- more precisely, by the "internal" weights coming from the layer of perceptrons operating one level shallower than itself. So we're dealing in essence with two dimensions (shallower to deeper, and older to newer) instead of just one (shallower to deeper).
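Something like this, I think (a toy sketch of that two-axis picture; sizes and random data are placeholders): each state depends on the layer below at the same time step and on itself at the previous one.

    import numpy as np

    rng = np.random.default_rng(4)
    T, L, n = 6, 3, 8                      # time steps, layers, units per layer
    W_in  = [rng.normal(0, 0.1, (n, n)) for _ in range(L)]   # shallower -> deeper
    W_rec = [rng.normal(0, 0.1, (n, n)) for _ in range(L)]   # older -> newer

    x = rng.random((T, n))
    h = np.zeros((L, T, n))
    for t in range(T):                     # the "conveyor belt" axis
        below = x[t]
        for l in range(L):                 # the depth axis
            prev = h[l, t - 1] if t > 0 else np.zeros(n)
            h[l, t] = np.tanh(W_in[l] @ below + W_rec[l] @ prev)
            below = h[l, t]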


Why would you want that? A conv net is not some kind of magic. It's just a crude way to reduce dimensionality by losing spatial location. For some things it works, for some it doesn't.

I think the shift in thinking should rather be: instead of trying to build the best possible associative memory to associate some A with some B, take the memory modules we have (perhaps not perfect) and try to build something bigger out of them. A dynamical model of the observed reality seems like a great thing to build out of such modules.

And this is what the PVM is. Currently made out of shallow, plain vanilla perceptrons, it builds a structure which can be arbitrarily deep, without any "magical" tricks such as dropout, ReLU, convolution, pooling, etc.
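For flavour, here is a hedged sketch of a single such unit (the sizes, learning rate, and toy signal are my own; the full model wires many of these into a hierarchy -- see the paper): a shallow sigmoid perceptron trained online, with the next frame of its own input as the supervisory signal.

    import numpy as np

    rng = np.random.default_rng(3)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    n_in, n_hid, lr = 25, 10, 0.1                # e.g. a 5x5 signal patch
    W1 = rng.normal(0, 0.1, (n_hid, n_in + 1))   # input  -> hidden (+bias)
    W2 = rng.normal(0, 0.1, (n_in, n_hid + 1))   # hidden -> prediction (+bias)

    # toy "video": a drifting sine pattern the unit learns to anticipate
    xs = [0.5 + 0.4 * np.sin(np.linspace(0, 3, n_in) + 0.3 * t) for t in range(300)]

    for t in range(len(xs) - 1):
        h = sigmoid(W1 @ np.append(xs[t], 1.0))  # compressed state (sent "up")
        p = sigmoid(W2 @ np.append(h, 1.0))      # prediction of the next frame
        err = xs[t + 1] - p                      # the future itself is the teacher
        d2 = err * p * (1 - p)                   # plain delta-rule backprop
        d1 = (W2[:, :-1].T @ d2) * h * (1 - h)
        W2 += lr * np.outer(d2, np.append(h, 1.0))
        W1 += lr * np.outer(d1, np.append(xs[t], 1.0))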


He also presented at CMU Robotics a few weeks ago (and used these slides). Video here: https://youtu.be/IbjF5VjniVE



For anyone who doesn't know Yann LeCun: he's the head of AI over at Facebook, but, surprisingly, he's consistently straightforward concerning the current hype driving AI and its technologies. He deserves respect for this alone.


Not to mention the father of CNNs.



Who are you? Jürgen Schmidhuber?


Your Asteria AI project looks pretty sweet.


Thanks, are you working in AI? You should join our gitter channel.


"Straightforward" meaning what? Just bluntly dismissive of undue hype? What statements has he made to this effect?


Pretty much this in general. Check out any of his social profiles as he's pretty outspoken about his thoughts and opinions.


Memory Modules in vector space remind me of some earlier work described as Associative Memory Modules (AMMs): http://sumve.com/biomimetic-cognition/biomimetic-api.html


If page 33 depicts the working of the brain on a very high level, the world model (or simulator) residing inside the agent must contain a model/simulator of the agent itself.

Could this give rise to self-perception or consciousness?


The idea that self-representations give rise to consciousness exists in neurophilosophy, and it is closely related to the representational theory of consciousness and the higher-order monitoring theory:

https://plato.stanford.edu/entries/consciousness-representat...

https://plato.stanford.edu/entries/consciousness-higher/

https://mitpress.mit.edu/books/self-representational-approac...

http://www.nyu.edu/gsas/dept/philo/courses/consciousness05/L...


I happened to be reading The Selfish Gene by Richard Dawkins and Gödel, Escher, Bach by Douglas Hofstadter at the same time, and both of them point at exactly this being the reason for consciousness. I was stunned at how both reached the same conclusion -- that consciousness arises from recursion of self-perception -- from very different starting points.

Also, if anyone is watching Westworld (spoilers), it seems to come to the same conclusion, funnily enough. What finally gives the androids consciousness is some kind of recursive idea of listening to themselves.


Re Westworld: the theory of consciousness explored in the show is covered in more detail in Jaynes' The Origin of Consciousness in the Breakdown of the Bicameral Mind, as alluded to both in the show and in the title of the final episode. I've just picked it up, and it's a pretty interesting read so far. I've also noticed that a lot of little details from the book are used in the show, such as referring to memories as "reveries" at points, and talking of minds as "hosts" of consciousness. I may need to re-watch the show after finishing the book!

I found a pdf version of the book here if you are interested. http://selfdefinition.org/psychology/Julian-Jaynes-Origin-of...


Michael Crichton, who wrote and directed the original Westworld (also of Jurassic Park fame), describes the same idea in his novel Prey[0], which is about emergent AI from swarms of self-replicating nano-bots.

0. https://www.amazon.com/Prey-Michael-Crichton/dp/0061703087/r...


During the speech [1], Yann (surprisingly) didn't mention consciousness at all. The focus of this segment was the need to "imagine" the future. The premise is that "common sense" – Yann's big theme of the talk – is about "filling in the gaps" of incomplete information. We fill in the gaps by imagining the future.

So consciousness was not raised at this point. But that doesn't mean that it couldn't be an emergent property.

[1] I'm at NIPS and attended the speech.


>>During the speech [1], Yann (surprisingly) didn't mention consciousness at all.

I thought this omission was deliberate, to avoid distracting philosophical ratholes that weren't core to his talk.


I wouldn't assume it must, although it would be neat if it did. There are many ways to get agents to react to situations without self-awareness. In fact, the agents themselves can be decomposed into many otherwise incompatible sub-agents. See Minsky's Society of Mind.


This is an excellent point. This model has to be recursive, and maybe the depth of recursion has something to do with self-perception?


I don't understand this argument. (It keeps coming up.) The two issues I have with this are:

a) Self-perception does not seem like consciousness to me at all. In meditation, if done properly, there is very little self left. It feels more like pure awareness. It is almost the opposite of the model of the self that the brain constructs.

b) I fail to see how the fact that a mechanism refers to itself should somehow give rise to the feeling of consciousness. Why would it? Nobody would predict consciousness from that if we did not already know it exists, and it's easy to imagine a device that has a model of itself and is not self-aware.

I understand the need to somehow fit this into our scientific framework, and the idea that "consciousness is just what it feels to have a brain" is the best thing we have, but I don't think it explains anything. There is something we are missing.


Answers to "can we give consciousness to AI" depend heavily on how you define consciousness in the first place.

Many definitions can coexist, some more actionable than others. "Being aware of the existence of oneself in the world, and being able to reflect on one's own decisions" seems relatively practical. So: self-perception + self-reflection = consciousness (as a definition).

From this starting point, it seems reasonable to derive that consciousness can arise from 1) a mental representation of the world that includes oneself, and 2) empathy for others (I can guess why this other worker has made this decision) that, once applied to the actions of the self as if it were an external agent, gives self-reflection.


So then an x86 emulator running on x86 means the thing is conscious? No. This is just pushing the essence of the problem onto the word "aware".


Interesting how the notion of "common sense" has evolved since the days of symbolic AI (cf. Cyc). The history presented begins rather late; show some love to the founders of the field...



Very interesting set of slides.

It seems the current state of prediction is only slightly better than the state of image recognition was before multi-level NNs.

There might still be a theoretical jump that's needed.


Yes, the field has emerged out of MNIST/ImageNet, and that is what those algorithms are optimised for. For modelling actual dynamics, a different design is necessary. It happens that the design that makes sense also seems to agree very well with the observed biology of the cortex. You can find links to our Predictive Vision Model in this thread, as well as a few additional thoughts here: http://blog.piekniewski.info/2016/11/30/learning-physics-is-...


Will the other slides be posted? And how about videos?


My summary of the slides: to achieve "common sense", unsupervised learning is the next step. Use all information in the data to predict everything you can, including the past. Use adversarial networks to do that.


These slides remind me of the worst PowerPoints ever.


Really? They have pictures and everything.


A wonderful demonstration of copyright violation/enforcement.



