This opens up ideas. One thing people have tried to do with Stable Diffusion is create animations. Of course, they all come out pretty janky and gross; you can't get the animation smooth.
But what if a model were trained not on single images, but on sets of sequential animation frames laid out on a single visual plane? So a panel might show a short sequence of a Disney princess expressing a particular emotion as 16 individual frames collected into a single image. One might then be able to generate a clean animated sequence of a previously unimagined Disney princess expressing any emotion the model has been trained on. Of course, with big enough models one could (if they can get it working) produce text-prompted animations across a wide variety of subjects and styles.
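The playback side of this is the easy part: once the model spits out a grid, slicing it back into frames and looping them is trivial. Here's a rough sketch in Python with Pillow, assuming a hypothetical 4×4 sheet saved as `princess_grid.png` (the filename, grid size, and frame rate are all just placeholders):

```python
from PIL import Image

# Assumed input: a 4x4 sprite-sheet image generated by the model,
# e.g. 1024x1024 pixels holding sixteen 256x256 frames.
GRID_PATH = "princess_grid.png"
ROWS, COLS = 4, 4

sheet = Image.open(GRID_PATH)
frame_w = sheet.width // COLS
frame_h = sheet.height // ROWS

# Crop the sheet left-to-right, top-to-bottom into individual frames.
frames = [
    sheet.crop((c * frame_w, r * frame_h, (c + 1) * frame_w, (r + 1) * frame_h))
    for r in range(ROWS)
    for c in range(COLS)
]

# Assemble the frames into a looping GIF (~12 fps here, purely a guess
# at what would look smooth for a 16-frame sequence).
frames[0].save(
    "princess.gif",
    save_all=True,
    append_images=frames[1:],
    duration=83,  # ms per frame
    loop=0,
)
```

The hard part, obviously, is getting the model to produce temporally coherent cells in the first place.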
That's an interesting idea. I wonder if this would work with inpainting: erase the 16th cell and let the AI fill it in, then upscale each frame. Has anyone experimented with this?
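Untested, but the inpainting step might look something like this with the diffusers library; the model ID, prompt, and 4×4 cell layout are just assumptions for the sketch:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image, ImageDraw

# Assumed: a Stable Diffusion inpainting checkpoint from Hugging Face.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# Resize the sprite sheet to the pipeline's native resolution.
sheet = Image.open("princess_grid.png").convert("RGB").resize((512, 512))

# Build a mask: white = region to regenerate. Here we blank out the
# 16th (bottom-right) cell of an assumed 4x4 grid.
mask = Image.new("L", sheet.size, 0)
cell = sheet.width // 4
ImageDraw.Draw(mask).rectangle(
    (3 * cell, 3 * cell, sheet.width, sheet.height), fill=255
)

result = pipe(
    prompt="sprite sheet, sequential animation frames of a Disney princess smiling",
    image=sheet,
    mask_image=mask,
).images[0]
result.save("princess_grid_filled.png")
```

Each cropped frame could then go through a separate upscaler (Real-ESRGAN or similar) before reassembling the animation.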