The structure of convolutional neural nets specifies much of the prior knowledge necessary for learning. In other words, the design of these neural nets makes a lot of correct assumptions about the nature of images (stationarity of pixel statistics, locality of pixel dependencies, and so on).
By structure, we are simply referring to the number of layers, the number of neurons in each layer, and the specific connections between neurons in each pair of neighboring layers, right?
So in this paper, they carefully chose a certain structure, initialized the weights randomly, and then what? I understand that they did not then train it on a training data set, but I'm not quite getting what they did with the single distorted image.
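For what it's worth, my reading of the paper (hedging here: the architecture and hyperparameters below are placeholders, not the authors' exact setup) is that you feed a fixed random noise tensor through the randomly initialized CNN and then optimize the network's weights so that its output reproduces the single distorted image, stopping early before it also reproduces the distortion. A minimal PyTorch-style sketch:

    import torch
    import torch.nn as nn

    # Stand-in for the paper's encoder-decoder; any reasonably deep CNN works for the sketch.
    def build_encoder_decoder():
        return nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def deep_image_prior(corrupted, num_steps=2000, lr=0.01):
        """corrupted: (1, 3, H, W) tensor holding the single distorted image."""
        net = build_encoder_decoder()                  # randomly initialized, never pre-trained
        z = torch.randn(1, 32, *corrupted.shape[2:])   # fixed random input, kept constant
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(num_steps):                     # early stopping keeps the noise out
            opt.zero_grad()
            loss = ((net(z) - corrupted) ** 2).mean()  # fit the one corrupted image
            loss.backward()
            opt.step()
        return net(z).detach()                         # the restored image is the network's output

The surprising result is that well before the network manages to overfit the corruption, its output already looks like a clean natural image, so the structure of the CNN is doing the regularization on its own. For inpainting, the same loss is just masked to the known pixels.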
Well, given that these are CNNs, you're leaving out weight sharing.
So by structure you should also include the constraint that the network applies the same function to every NxN patch of the image, so that identical patches yield identical predictions no matter where they appear.
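To see that constraint directly (a small sketch with an off-the-shelf PyTorch conv layer; the sizes and the shift amount are arbitrary): because the same weights are applied at every position, shifting the input just shifts the output, apart from border effects.

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, bias=False)  # one shared bank of 3x3 filters
    x = torch.randn(1, 1, 16, 16)
    x_shifted = torch.roll(x, shifts=2, dims=3)   # slide the image 2 pixels to the right

    y = conv(x)
    y_shifted = conv(x_shifted)

    # Away from the borders, the response to the shifted image is just the shifted response:
    print(torch.allclose(torch.roll(y, shifts=2, dims=3)[..., 4:-4], y_shifted[..., 4:-4]))  # True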
Convolutional layers are hand-designed, by and large, and they're mostly the same everywhere. Yann LeCun came up with them in the late 80's, but their academic origins go back to at least the 50's and 60's.
Handcrafted, but automatic architecture optimization is a hot research topic right now at DeepMind and elsewhere. It needs to be evolved at a deeper level than what they're doing now, so that the architecture is discovered, not just optimized.
No, the paper primarily means the sharing of each kernel filter's weights across all spatial locations within each convolutional layer, and the stacking of these layers into deep networks.
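To put numbers on that weight sharing (a rough sketch; the 32x32 RGB input and 64 output channels are just illustrative): a convolutional layer reuses one small set of filters at every location, whereas a fully connected layer producing the same output with no locality or sharing assumptions would need an independent weight per (input pixel, output unit) pair.

    import torch.nn as nn

    H = W = 32                    # a small CIFAR-sized RGB image, purely for illustration
    conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
    conv_params = sum(p.numel() for p in conv.parameters())

    # Equivalent fully connected layer, counted on paper rather than instantiated:
    dense_params = (3 * H * W) * (64 * H * W) + 64 * H * W

    print(conv_params)    # 1792: the same 64 filters of shape 3x3x3, reused at every pixel
    print(dense_params)   # 201392128: roughly 200 million parameters for one 32x32 layer

That gap is exactly the prior mentioned upthread: locality (each output looks at a 3x3 neighborhood) plus stationarity (the same filters apply everywhere).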