While it's true that we don't differentiate the input samples, we do differentiate the loss function's output with respect to each of the network's weights. We use the chain rule to compute each of these "gradients," and that process is known as backpropagation.
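To make that concrete, here's a minimal sketch using a hypothetical one-weight "network" (a single neuron, squared-error loss). The point is that the derivative is taken with respect to the weight `w`, not the input `x`, and the chain rule splits it into two easy pieces:

```python
def forward(w, x):
    return w * x  # prediction from a one-weight "network"

def loss(y_pred, t):
    return (y_pred - t) ** 2  # squared-error loss against target t

def grad_w(w, x, t):
    # Chain rule: dL/dw = (dL/dy) * (dy/dw)
    y = forward(w, x)
    dL_dy = 2 * (y - t)  # derivative of the loss w.r.t. the prediction
    dy_dw = x            # derivative of the prediction w.r.t. the weight
    return dL_dy * dy_dw

# Sanity check against a numerical (finite-difference) gradient
w, x, t = 0.5, 2.0, 3.0
eps = 1e-6
numeric = (loss(forward(w + eps, x), t) - loss(forward(w - eps, x), t)) / (2 * eps)
print(grad_w(w, x, t), numeric)  # the two values should agree closely
```

In a real multi-layer network the same chain-rule step is applied layer by layer, reusing the upstream derivative at each stage; that reuse is what makes backpropagation efficient.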
(You might have intended to say this, in which case I'm just trying to add clarity.)