It's not too far off. Vid2vid is already decent at keeping character consistency when configured correctly; background/environment flickering is hard to control, but that makes sense given that the process currently runs img2img on each frame independently. I think we'll soon see new models that use temporal convolutions, and those will make video -> video transformations absolutely stunning.
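For anyone wondering what "img2img on successive frames" looks like in practice, here's a rough sketch using Hugging Face diffusers (the model id, prompt, and settings are placeholders, and the pipeline argument names follow recent diffusers releases, not necessarily the tool the parent is describing):

    import glob
    import os

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    # Hypothetical checkpoint and prompt; swap in whatever you actually use.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    prompt = "oil painting, dramatic lighting"

    os.makedirs("out", exist_ok=True)
    for i, path in enumerate(sorted(glob.glob("frames/*.png"))):
        frame = Image.open(path).convert("RGB")
        # Re-seeding identically on every frame is one of the "configured
        # correctly" tricks: a fixed noise pattern keeps the character stable.
        generator = torch.Generator("cuda").manual_seed(42)
        result = pipe(
            prompt=prompt,
            image=frame,
            strength=0.4,        # low strength stays close to the input frame
            guidance_scale=7.5,
            generator=generator,
        ).images[0]
        result.save(f"out/{i:04d}.png")

Even with a pinned seed, each frame is denoised with no knowledge of its neighbors, so the background still flickers; that per-frame independence is exactly the gap a temporally-aware model would close.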
I'd be curious to see how well this plays with inpainting. Apparently img2img is also on the author's to-do list.