I think it would've been more interesting to apply a Fourier transform to the image, convert that to audio, and apply the wah-wah (which is essentially just a low-pass filter) to that.
The filters used in the article are meant to be applied to linear sample streams, and a RAW image is not a linear sample stream. Re-encoding the image as a linear sample stream, applying the effect, and then decoding the resulting signal back into a RAW image could be truer to the idea of "Paris with an echo".
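
To make the re-encode / effect / decode idea concrete, here is a rough sketch in plain Python. It is not what the article does. It assumes the image is already decoded into a grid of single-channel values in 0..255, and that reading the grid row by row is an acceptable "linear sample stream". The function names, the `delay` and `decay` parameters, and the simple feed-forward echo are all my own choices for illustration.

```python
def image_to_stream(pixels):
    """Flatten a 2-D grid of pixel values, row by row, into one sample stream."""
    return [value for row in pixels for value in row]


def stream_to_image(stream, width):
    """Cut a sample stream back into rows of `width` pixels."""
    return [stream[i:i + width] for i in range(0, len(stream), width)]


def echo(stream, delay, decay=0.5):
    """Feed-forward echo: y[n] = x[n] + decay * x[n - delay]."""
    out = list(stream)
    for n in range(delay, len(stream)):
        out[n] += decay * stream[n - delay]
    return out


def echo_image(pixels, delay, decay=0.5, max_value=255):
    """Encode to a stream, apply the echo, clip, and decode back to a grid."""
    width = len(pixels[0])
    wet = echo(image_to_stream(pixels), delay, decay)
    # Clip to the valid pixel range (an assumption: 8-bit, single channel).
    clipped = [max(0, min(max_value, round(s))) for s in wet]
    return stream_to_image(clipped, width)


if __name__ == "__main__":
    tiny = [[100, 0, 0],
            [0, 0, 0]]
    # A delay of 4 samples wraps past the end of the 3-pixel row,
    # so the echo of the first pixel lands in the second row.
    print(echo_image(tiny, delay=4))
```

Because the delay is counted in samples rather than in pixels along one axis, the echo wraps across row boundaries, which is the part that differs from running the filter on the image's bytes directly.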