I believe there is already quite a bit of continuity; otherwise the colors would flicker much more strongly from frame to frame. In the video they vary smoothly from one frame to the next.
There are many instances of frame-to-frame discontinuity that I can't explain other than by a lack of object detection and labeling. Reusing the color from the previous frame, even when the lighting changes, would be less wrong than assigning an entirely different hue to the same object.
Only things like TV screens and other displays (and some interesting objects whose surfaces have microstructures that cause light interference) can change color that rapidly under incident light of the same color.
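To make the "keep the previous frame's color" idea concrete, here is a minimal sketch. It assumes something the video's colorizer apparently does not have: a per-object label for each frame, plus one predicted hue per label. The function names, the `max_step` parameter, and the label scheme are all hypothetical and only illustrate the constraint, not how any actual colorization system works.

```python
def hue_delta(a, b):
    """Signed shortest angular distance from hue a to hue b, in degrees."""
    return (b - a + 180.0) % 360.0 - 180.0


def stabilize_hues(per_frame_hues, max_step=10.0):
    """Limit how far each labeled object's hue may move between frames.

    per_frame_hues: list of dicts, one per frame, mapping a hypothetical
    object label to the hue (degrees) the colorizer predicted for it.
    max_step: assumed maximum plausible hue change per frame, in degrees.
    """
    previous = {}      # last known hue for every label seen so far
    stabilized = []
    for hues in per_frame_hues:
        current = {}
        for label, hue in hues.items():
            if label in previous:
                delta = hue_delta(previous[label], hue)
                # Clamp the jump so the object stays close to its earlier hue.
                delta = max(-max_step, min(max_step, delta))
                hue = (previous[label] + delta) % 360.0
            current[label] = hue
        # Objects missing from this frame keep their last known hue.
        previous = {**previous, **current}
        stabilized.append(current)
    return stabilized


frames = [{"car": 0.0}, {"car": 120.0}, {"car": 5.0}]
print(stabilize_hues(frames))
```

In this example the colorizer's jump from 0 to 120 degrees in the second frame is limited to 10 degrees, so the car drifts instead of switching hue outright. A real system would still need an exception for things like displays, which legitimately change color that fast.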
I think they get that continuity not because they built in such a constraint, but simply because consecutive frames are similar, which makes the coloring algorithm pick similar colors.