"Classical" CV and deep-learning CV needn't be opposing one another.
In several cases, deep networks emulate the classical approach: they implement the same carefully thought-out pipelines, but with components that leverage representations learned from huge datasets (which are undeniably very powerful).
Some examples are:
* Bags of Local Convolutional Features for Scalable Instance Search https://arxiv.org/pdf/1604.04653.pdf
This paper treats each 'pixel' of a CNN activation tensor as a local descriptor, clusters these descriptors into a visual vocabulary, and describes an image as a bag-of-visual-words histogram (see the first sketch after this list).
* Learned Invariant Feature Transform https://arxiv.org/abs/1603.09114v2
This paper very explicitly emulates the entire SIFT pipeline for computing correspondences across pairs of images (the second sketch below shows the classical baseline it replaces).
* Inverse compositional spatial transformer networks https://arxiv.org/abs/1612.03897v1
This paper emulates the Lucas-Kanade approach to computing the transform between two images, with differentiable (trainable) components (see the third sketch below).
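
To make the first paper's idea concrete, here's a minimal sketch of the bag-of-convolutional-features approach, not the paper's actual code. The choice of VGG-16 conv layers as the feature extractor, the 1024-word vocabulary, and the stand-in `train_images` batch are all illustrative assumptions:

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.cluster import MiniBatchKMeans

# Any pretrained conv net works; VGG-16's conv layers are an arbitrary choice here.
backbone = models.vgg16(weights="DEFAULT").features.eval()

def local_descriptors(images):
    """Treat each spatial position of the activation tensor as one local descriptor."""
    with torch.no_grad():
        fmap = backbone(images)                        # (N, C, H, W)
    n, c, h, w = fmap.shape
    return fmap.permute(0, 2, 3, 1).reshape(n * h * w, c).numpy()

# Stand-in for a real (normalized) training batch -- purely for illustration.
train_images = torch.randn(8, 3, 224, 224)

# Learn the visual vocabulary by clustering descriptors pooled from the training set.
vocabulary = MiniBatchKMeans(n_clusters=1024).fit(local_descriptors(train_images))

def bovw_histogram(image, n_words=1024):
    """Describe one image (a (1, 3, H, W) tensor) as an L1-normalized
    histogram of visual-word assignments."""
    words = vocabulary.predict(local_descriptors(image))
    hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
    return hist / max(hist.sum(), 1)
```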
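For context on the second paper: here is roughly the classical SIFT correspondence pipeline (detect, describe, match) that it replaces with learned, differentiable modules, written with OpenCV. This is a baseline sketch, not code from the paper; the ratio-test threshold is the conventional 0.75:

```python
import cv2

def sift_correspondences(path_a, path_b, ratio=0.75):
    """Return point correspondences between two images via classical SIFT."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    # Detection + description, the stages the paper replaces with trained networks.
    sift = cv2.SIFT_create()
    kps_a, desc_a = sift.detectAndCompute(img_a, None)
    kps_b, desc_b = sift.detectAndCompute(img_b, None)

    # Brute-force matching with Lowe's ratio test to keep unambiguous matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return [(kps_a[m.queryIdx].pt, kps_b[m.trainIdx].pt) for m in good]
```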
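And a minimal PyTorch sketch of the recurrence behind the third paper, as I read it: a small regressor predicts a warp update from the currently warped image, and that update is folded into a running affine transform via its inverse, echoing classical inverse compositional Lucas-Kanade. The network architecture, 32x32 image size, and number of steps are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICSTN(nn.Module):
    def __init__(self, steps=4):
        super().__init__()
        self.steps = steps
        # Tiny regressor from a warped 32x32 image to 6 affine update parameters.
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU(), nn.Linear(128, 6))
        nn.init.zeros_(self.net[-1].weight)    # start from the identity update
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, img):                    # img: (N, 1, 32, 32)
        n = img.shape[0]
        # Running warp, kept as a batch of 3x3 homogeneous matrices.
        p = torch.eye(3, device=img.device).repeat(n, 1, 1)
        eye2x3 = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=img.device)
        bottom = torch.tensor([[0., 0., 1.]], device=img.device).expand(n, 1, 3)
        for _ in range(self.steps):
            # Warp the input by the current estimate, then predict an update from it.
            grid = F.affine_grid(p[:, :2], img.shape, align_corners=False)
            warped = F.grid_sample(img, grid, align_corners=False)
            dp = self.net(warped).view(n, 2, 3) + eye2x3   # update near identity
            dp_h = torch.cat([dp, bottom], dim=1)
            # Inverse compositional step: fold the *inverse* of the predicted
            # update into the running warp, as in classical IC Lucas-Kanade.
            p = p @ torch.linalg.inv(dp_h)
        return p[:, :2]                        # final affine warp
```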
Also, don't forget that deformable part models are convolutional networks! https://arxiv.org/abs/1409.5403