Google has "surrendered" and backtracked on its decisions multiple times before; the most recent instance was Project Dragonfly, which was shut down less than two years ago.
Lila delves further into Pirsig's Metaphysics of Quality (MOQ), which is quite fascinating. His dialogues on static vs. dynamic quality coloured my view of the world for years.
This may be GANs, as the author stated, but the end result looks surprisingly similar to a bunch of pixel shaders making transitions between source and target images, with said transitions either driven by pure algorithms or derived from the blurred images themselves.
I implemented a music visualizer ages ago using similar concepts (pure algos though, no real images). It happened when Nvidia released the first affordable consumer video card with decent shader support; I think it was the 6600GT. My animation code, the part that made the video dance to the music, was a bit more sophisticated, though.
Regarding the music synchronization, OK, this is ancient stuff.
However, in terms of graphics, this strikes me as different from anything that was possible before the recent advances in GANs. During the era you're talking about, the art of shader-based music visualizers was being pushed by projects like Milkdrop 2, and nowadays a lot of similar research still happens on Shadertoy, and the demoscene, of course, hasn't stopped blowing people's minds.
But this is on another level entirely. It's as if the content, and seemingly the human concepts themselves, are being smoothly animated.
> this strikes me as different from anything that was possible before the recent advances in GANs
Well, that's because you did not see my vis. It looked just like the one in the GAN-related link, with similar transitions, except that all imagery was generated by math formulas running in pixel shaders instead of ready-made bitmaps/videos.
Actually, I played with real music video clips as a source of imagery, and the results were really cool, but beyond experimenting at home I obviously could not really use this part due to copyright, etc.
I guess one way to make these effects dance to music would be to compute a Mel spectrogram of the audio, then somehow use the shapes in the spectrogram to apply deltas to the rendered frames.
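A minimal sketch of that idea in plain NumPy (the window size, hop length, band count, and the test tone are all assumptions for illustration, not anything from the original): compute a windowed STFT, fold it through a triangular mel filterbank, and normalize the per-frame band energies so they can drive per-frame deltas.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_energies(samples, sr=44100, n_fft=1024, hop=512, n_bands=8):
    """Hann-windowed STFT magnitudes folded through a triangular mel filterbank."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    spec = np.empty((n_frames, n_fft // 2 + 1))
    for i in range(n_frames):
        frame = samples[i * hop : i * hop + n_fft] * window
        spec[i] = np.abs(np.fft.rfft(frame))
    # Triangular filters spaced evenly on the mel scale
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_bands + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_bands, n_fft // 2 + 1))
    for b in range(n_bands):
        lo, mid, hi = bins[b], bins[b + 1], bins[b + 2]
        for k in range(lo, mid):   # rising slope of the triangle
            fb[b, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope
            fb[b, k] = (hi - k) / (hi - mid)
    return spec @ fb.T  # shape: (n_frames, n_bands)

# Hypothetical 1-second 440 Hz test tone standing in for real audio
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
energies = mel_band_energies(tone, sr=sr)
deltas = energies / (energies.max() + 1e-9)  # per-frame deltas in [0, 1]
```

Each row of `deltas` then gives one normalized value per mel band for that audio frame, which could be added to (or used to warp) the corresponding rendered video frame.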
The music animation code worked something like this:
1) Each of my pixel shaders was driven by, let's say, 32 parameters (I don't remember the exact value).
2) The code would generate a first set of those parameters and a second set (with random values) and start transitioning (lerping) between the two, with a transition length of about 60 seconds.
3) Upon completion of the transition, the first set would be replaced by the second set, the second set would be replaced by a freshly generated third set, and so on ad infinitum.
4) Steps 2 and 3 allowed for non-stop fluid motion.
5) The lerp value (the degree of transition between sets) for each parameter would be modulated by sound: one FFT band per parameter, also passed through a synth-like attack/decay envelope.
6) Finally, there was a beat-detection part which, upon detecting a beat, would invert the lerp direction.
There were more steps and various tricks to make it more interesting and non-repetitive, but I'm not writing an article here ;)
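The steps above can be sketched roughly like this (a minimal sketch, not the original code: the class name, the envelope rates, and the way the beat flag is supplied are all assumptions):

```python
import random

N_PARAMS = 32          # step 1: each shader driven by ~32 parameters
TRANSITION_SECS = 60.0 # step 2: ~60-second lerp between parameter sets

class ParamAnimator:
    def __init__(self):
        # step 2: two random parameter sets to lerp between
        self.src = [random.random() for _ in range(N_PARAMS)]
        self.dst = [random.random() for _ in range(N_PARAMS)]
        self.t = 0.0          # overall transition progress, 0..1
        self.direction = 1.0  # step 6: beat detection flips this
        self.env = [0.0] * N_PARAMS  # step 5: per-parameter attack/decay state

    def update(self, dt, fft_bands, beat):
        # step 6: invert lerp direction on a detected beat
        if beat:
            self.direction = -self.direction
        self.t += self.direction * dt / TRANSITION_SECS
        # step 3: on completion, promote dst -> src and generate a fresh set
        if self.t >= 1.0:
            self.src = self.dst
            self.dst = [random.random() for _ in range(N_PARAMS)]
            self.t = 0.0
        self.t = max(self.t, 0.0)
        params = []
        for i in range(N_PARAMS):
            # step 5: one FFT band per parameter, run through a synth-like
            # envelope (fast attack, slow decay); rates are assumed values
            band = fft_bands[i % len(fft_bands)]
            attack, decay = 0.5, 0.05
            if band > self.env[i]:
                self.env[i] += (band - self.env[i]) * attack
            else:
                self.env[i] -= (self.env[i] - band) * decay
            # lerp degree modulated by sound, clamped to [0, 1]
            k = min(max(self.t * (0.5 + self.env[i]), 0.0), 1.0)
            params.append(self.src[i] + (self.dst[i] - self.src[i]) * k)
        return params
```

A render loop would then call `update(frame_dt, current_fft_bands, beat_detected)` once per frame and feed the returned 32 values straight into the shader uniforms; steps 2 and 3 together guarantee the motion never stops, because a fresh target set always exists.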
The end result was quite artistic. The visualizer was part of a much bigger enterprise-grade media playback / management / delivery / scheduling platform I'd developed for the hospitality industry.