In the couple-holding-hands videos you can see the people walking in front duck into a wall and disappear, the girl in front walks into the fence and disappears, and another girl walks straight through that same fence. These aren't just issues with forgetting; it fundamentally doesn't understand how things work. It draws a fence but then doesn't understand that people can't walk through it.
Or the video with the dog: the dog phases straight through the window shutters as if they weren't there, as if the scene were rendered in layers rather than in 3D. It doesn't understand the scenes it draws at all. The shutters cast shadows, so they were drawn to have depth, but the dog was rendered on top of them anyway and moved straight through. You can even see their shadows overlap, since the shadow part is apparently handled separately, so it "knows" they overlap but still renders the dog on top. That tells me it doesn't really know any of this at all and is just guessing based on similar-looking data samples.
And this is in videos handpicked because they were especially good. We should expect the videos we are able to generate to be far worse than the demos in general. They didn't even manage to make a dog move between windows without such bugs; that was the best they got, and even that had a very egregious error in a very short clip.
The primary thing I noticed is that it doesn't quite seem to grasp that time runs forward either. In the video with the dogs in the snow you can see snow getting kicked up in reverse, i.e. snow arcs through the air and lands right as a paw is placed.
Kind of made me wonder how these videos would look run backwards, but not enough to actually go and reverse them.
EDIT: wow, the "backwards" physics is especially noticeable in the chair video[0]. Aside from the chair morphing wildly, notice how it floats and bounces around semi-physically. There are clearly some issues grasping cause and effect.