Yeah, as interesting as the concept is, the lack of frame-to-frame consistency is a real problem. It also seems like the computing requirements would be immense: the article mentions burning through $10 in seconds.
You can do this at home on your own computer with a 40x0 consumer GPU at 1-2 fps. You have to choose a suitable diffusion model, though; there are models that provide sub-second generation of 1024x1024 images. The computing requirements and electricity costs are the same as when running a modern game.
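For a rough sense of scale, here is a back-of-the-envelope sketch of the electricity side. Every number in it is an assumption I picked for illustration (a 400 W draw for the whole machine under load, $0.20 per kWh, 1.5 fps as the midpoint of the 1-2 fps above), not a measurement, and the function name is made up for this example.

```python
def electricity_cost_usd(power_watts: float, seconds: float, usd_per_kwh: float) -> float:
    """Cost of drawing `power_watts` continuously for `seconds`."""
    kwh = power_watts / 1000.0 * seconds / 3600.0
    return kwh * usd_per_kwh


# All three values are illustrative assumptions, not measurements.
ASSUMED_POWER_WATTS = 400.0   # whole-system draw under GPU load
ASSUMED_USD_PER_KWH = 0.20    # residential electricity price
ASSUMED_FPS = 1.5             # midpoint of the 1-2 fps quoted above

cost_per_hour = electricity_cost_usd(ASSUMED_POWER_WATTS, 3600, ASSUMED_USD_PER_KWH)
frames_per_hour = ASSUMED_FPS * 3600
hours_for_ten_dollars = 10.0 / cost_per_hour

print(f"cost per hour: ${cost_per_hour:.2f}")
print(f"frames per hour: {frames_per_hour:.0f}")
print(f"hours of generation for $10: {hours_for_ten_dollars:.0f}")
```

Under those assumed numbers it works out to roughly 8 cents per hour, so $10 of electricity would last on the order of a hundred hours rather than seconds. Plug in your own wattage and tariff; hardware cost is not included.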