I have been reading about fractal image compression (don't sue me patent trolls, I haven't implemented anything I promise!) lately and so this doesn't surprise me very much. I also wouldn't be surprised if it turned out that what is happening inside the network is essentially similar in many respects to how fractal image compression works.
Fractal image compression is pretty simple in concept: find portions of the image that are similar to transformed versions of other portions of the image. Store the parameters needed to transform one portion into the other, and patch together enough of those to cover the whole image. Then you can throw out the image itself. You only need the transformations that map the parts onto each other (usually affine transformations along with brightness/contrast shifts). Once you have those, you can literally start with _any_ source image, and as long as the transformations are contractive, iterating them is guaranteed to converge to the same result, which is something very close to the original image.
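To make that concrete, here is a rough toy sketch in plain Python (not anyone's production codec, and certainly not a claim about what the network does). Everything specific in it is my own assumption for illustration: the names `encode`, `decode`, `fit`, and `shrink`, the 4x4 range blocks matched against 8x8 domain blocks, and the clamp of the contrast factor to 0.8 so that every map is contractive. It also skips the rotations and flips a real encoder would try, and only fits contrast and brightness.

```python
import random

R = 4          # range block size (assumed for this toy)
D = 2 * R      # domain block size, shrunk 2x to match a range block
S_MAX = 0.8    # assumed clamp on contrast, keeps every map contractive


def block(img, y, x, size):
    """Copy the size x size block whose top-left corner is (y, x)."""
    return [row[x:x + size] for row in img[y:y + size]]


def shrink(dom):
    """Average 2x2 cells so a D x D block becomes R x R."""
    return [[(dom[2 * i][2 * j] + dom[2 * i][2 * j + 1]
              + dom[2 * i + 1][2 * j] + dom[2 * i + 1][2 * j + 1]) / 4
             for j in range(R)] for i in range(R)]


def fit(dom, rng):
    """Least-squares contrast s and brightness o so that s*dom + o ~ rng."""
    d = [v for row in dom for v in row]
    r = [v for row in rng for v in row]
    md, mr = sum(d) / len(d), sum(r) / len(r)
    var = sum((a - md) ** 2 for a in d)
    s = 0.0 if var == 0 else sum((a - md) * (b - mr) for a, b in zip(d, r)) / var
    s = max(-S_MAX, min(S_MAX, s))
    o = mr - s * md
    err = sum((s * a + o - b) ** 2 for a, b in zip(d, r))
    return s, o, err


def encode(img):
    """For each range block, store the best-matching domain block and (s, o).

    Assumes a square image whose side is a multiple of R and at least D.
    """
    n = len(img)
    domains = [(y, x, shrink(block(img, y, x, D)))
               for y in range(0, n - D + 1, R)
               for x in range(0, n - D + 1, R)]
    code = []
    for ry in range(0, n, R):
        for rx in range(0, n, R):
            rng = block(img, ry, rx, R)
            (s, o, _), dy, dx = min(
                ((fit(dom, rng), dy, dx) for dy, dx, dom in domains),
                key=lambda t: t[0][2])
            code.append((ry, rx, dy, dx, s, o))
    return code


def decode(code, start, iterations=40):
    """Repeatedly apply the stored maps, starting from an arbitrary image."""
    img = [row[:] for row in start]
    for _ in range(iterations):
        new = [row[:] for row in img]
        for ry, rx, dy, dx, s, o in code:
            dom = shrink(block(img, dy, dx, D))
            for i in range(R):
                for j in range(R):
                    new[ry + i][rx + j] = s * dom[i][j] + o
        img = new
    return img


# Usage: a made-up 16x16 test pattern, decoded from two different starts.
n = 16
original = [[((x * y) % 7) / 6 for x in range(n)] for y in range(n)]
code = encode(original)
noise = random.Random(0)
from_black = decode(code, [[0.0] * n for _ in range(n)])
from_noise = decode(code, [[noise.random() for _ in range(n)] for _ in range(n)])
```

Note that `code` holds only block positions plus an `(s, o)` pair per range block, with no pixel data at all. Because the contrast is clamped below 1, each pass shrinks the difference between any two starting images, so `from_black` and `from_noise` end up essentially identical. How close they land to `original` depends on how well the blocks matched during encoding.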
A CNN 'rediscovering' this technique feels intuitively like a very natural thing to occur, and the iterative images presented there smack of the early iterations of a fractal image 'decoding' from a blank source image. The connection, of course, could be utterly specious; I am just guessing. I am intrigued, however, as I've been wanting to investigate using deep learning to perform VHS video capture cleanup as a side project for a while now.