
I feel LeCun got roped into debating the likes of Marcus and Yudkowsky. This has made his arguments lose nuance and become rigid. I also can't escape the feeling that if Facebook had been tuned in to Transformers, they would have shipped earlier, so there must have been some resistance or underestimation that is now being repeated: "They can't reason", "They can't plan", "They can't understand the world", "They are a distraction / side road to AGI".

It is kind of ironic that researchers who claim LLMs lack adaptive intelligence seemingly refuse to adapt their intelligence to LLMs. If even GPT-3 can find logical holes or oversimplifications in your arguments about GPTs, at some point this starts becoming embarrassing and unbecoming.

> The generation of mostly realistic-looking videos from prompts does not indicate that a system understands the physical world.

While arguably true, it also does not indicate that a system does not understand the physical world (reflections, collision detection, gravity, object permanence, long-term scene coherence, etc.).

If LeCun wants to argue that it does not understand the physical world, he should do so directly, not attack a claim that is not directly stated but rather convincingly, if tentatively, demo'd. (I myself find it hard to argue that a system which generates novel pond reflections has not memorized/stored in its weights some generalization program that it applies to realistic scene generation.)

This demo shows it is not even a wild prediction to guess that we will soon be able to discuss visual scenes with conversational AIs in consumer tech.



> long-term scene coherence,

FWIW, none of the video models released so far demonstrate any object coherence whatsoever, which suggests they don't have the higher-level capabilities you mention yet.

In Sora, as soon as an object is obstructed by an obstacle or goes offscreen, it's likely to disappear or be radically transformed.


You've seen the demos of a couple holding hands and walking, or the museum shots where all the paintings maintain coherence, or a woman temporarily obscuring a street sign. Or you haven't seen those demos. Either way...


In the couple-holding-hands video, the people walking in front duck into a wall and disappear, the girl in front walks into the fence and disappears, and another girl walks straight through that fence. These aren't just issues with forgetting; it completely doesn't understand how things work. It draws a fence but then doesn't understand that people can't walk through it.

Or the video with the dog: the dog phases straight through those window shutters as if they weren't there and the scene were rendered in layers rather than in 3D. It doesn't understand the scenes it draws at all. It put shadows on those shutters, so they were drawn to have depth, but the dog was then rendered on top of the shutters anyway and moved straight through them. You can even see their shadows overlap, since the shadow pass is apparently handled separately, so it "knows" they overlap but still renders the dog on top. That tells me it doesn't really know any of that at all and is just guessing based on similar-looking data samples.

And this is in videos handpicked because they were especially good. We should expect the videos we are able to generate to be far worse than the demos in general. They didn't even manage to make a dog that moves between windows without such bugs; that was the best they got, and even that had a very egregious error in a very short clip.


The primary thing I noticed is that it doesn't quite seem to grasp that time runs forward either. In the video with the dogs in the snow you can see snow getting kicked up in reverse, i.e. snow arcs through the air and lands right as a paw gets placed.

Kind of made me wonder how these videos would look run backwards, but not enough to figure out how to make them run backwards.

EDIT: wow, the "backwards" physics is especially noticeable in the chair video[0]. Aside from the chair morphing wildly, notice how it floats and bounces around semi-physically. Clearly some issues grasping cause and effect.

[0] https://www.youtube.com/watch?v=lfbImB0_rKY
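
If anyone does want to try it, reversing a downloaded clip is easy with ffmpeg. A minimal sketch in Python, assuming ffmpeg is installed locally and the clip is short; filenames here are placeholders:

  import subprocess

  # Reverse a short local clip so it plays backwards. Assumes ffmpeg is on
  # the PATH and the clip is small enough to buffer in memory.
  subprocess.run([
      "ffmpeg", "-i", "sora_clip.mp4",
      "-vf", "reverse",    # reverse the video stream
      "-af", "areverse",   # reverse the audio too; drop this if the clip has no audio track
      "reversed.mp4",
  ], check=True)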


If the "couple holding hands and walking" one is the "Beautiful, snowy Tokyo city is bustling. ..." look at the traffic on the left side of the frame:

https://www.youtube.com/watch?v=ezaMd4l_5kw

We also have the spontaneous creation and annihilation of wolves and the shape-shifting chair:

https://www.youtube.com/watch?v=jspYKxFY7Sc

https://www.youtube.com/watch?v=lfbImB0_rKY


That chair is fucking wild.

The more I watch the cherry blossom one, the more I see how wrong it is; even the fact that there are cherry blossoms in the middle of winter is just totally wack. I've seen it snow in Tokyo before during spring when the cherry blossoms were out, but you don't get a foot of snow on the roofs like in the clip.

Edit: I know the prompt asked for the cherry blossoms in snow, but it's still a wild amount of snow, which is somehow not covering the trees.


If we are talking analogies, this is just Sora forgetting because of limitations in how the network handles the autoregressive dynamics. When they make a bigger version of Sora, this will happen less. Sora already has unprecedented object permanence; see the woman-walking-in-Tokyo scene, where signs and people are reconstructed after two seconds of occlusion. Soon we will have object permanence following ten or more seconds of occlusion. Then a minute. Then three minutes. Then we will figure out a trick to store long-term memory. What will people say then?
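
As a toy illustration of the kind of forgetting I mean (just a sketch, not Sora's actual architecture): a generator that conditions on only a fixed window of recent frames cannot bring an object back once it has been occluded for longer than that window.

  # Toy sketch, not Sora's actual architecture: with a fixed context window,
  # an object occluded for longer than the window can never reappear.
  WINDOW = 4  # hypothetical context length, in frames

  def next_frame(history):
      context = history[-WINDOW:]  # all the "model" can condition on
      return "ball" if "ball" in context else "background"

  frames = ["ball"]
  for t in range(1, 12):
      occluded = 2 <= t <= 7       # occlusion lasts 6 frames, longer than WINDOW
      frames.append("occluder" if occluded else next_frame(frames))

  print(frames)  # once the occlusion outlasts the window, the ball never comes back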


Our brains can't do that with long-term memory either, btw; that's why each time we remember something, we change minor aspects of it.


It's also what happens when we dream: everything is fluid. Things appear and disappear, people and places become someone or somewhere else, reading is difficult, and hands are distorted.


Because these systems are dreaming about their datasets. Or hallucinating about them, as people have decided to call it lately. I won't say this is a dead end. I will say we are very, very far short of any sort of actual intelligence.



