Exactly right! In fact, because that symmetry does not act on the parameters of the layer, the conservation of your quantity <gx, dx> should hold whether or not the network is at a stationary point of the loss. This means the quantity is conserved on every single data point individually. (In an image classification model, these values just tell you whether the loss would improve if the input image were translated.)
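If it helps to see this numerically, here is a rough JAX sketch under some loud assumptions of mine: a 1D periodic signal stands in for the image, translations are Fourier shifts (so there is an exact infinitesimal generator), and the toy `layer` (circular convolution followed by a gate on a translation-invariant scalar) is a hypothetical construction chosen only because it is exactly translation-equivariant. All names (`generator`, `translate`, `layer`, `loss_from`) are made up for illustration. The weights are random and untrained, and only a single data point is used, yet <gx, dx> comes out the same at every layer and matches the derivative of the loss with respect to translating the input.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a tight comparison

N = 15      # odd length: no Nyquist mode, so the generator is exactly antisymmetric
DEPTH = 3   # number of toy layers (assumption for this sketch)
k = jnp.fft.fftfreq(N) * N  # integer frequencies 0..7, -7..-1

def generator(x):
    # dx: derivative of translate(x, t) with respect to t at t = 0
    return jnp.fft.ifft(-2j * jnp.pi * k / N * jnp.fft.fft(x)).real

def translate(x, t):
    # Fourier shift of a periodic signal by a (possibly fractional) offset t
    return jnp.fft.ifft(jnp.exp(-2j * jnp.pi * k * t / N) * jnp.fft.fft(x)).real

def layer(x, w):
    # Circular convolution, then a gate computed from a translation-invariant scalar
    y = jnp.fft.ifft(jnp.fft.fft(x) * jnp.fft.fft(w, n=N)).real
    return y * jnp.tanh(jnp.mean(y ** 2))

def loss_from(l, x, weights, target):
    # Run layers l, l+1, ... on activation x, then a loss that is NOT translation-invariant
    for w in weights[l:]:
        x = layer(x, w)
    return 0.5 * jnp.sum((x - target) ** 2)

kw, kx, kt = jax.random.split(jax.random.PRNGKey(0), 3)
weights = list(jax.random.normal(kw, (DEPTH, 5)))  # random, untrained filters
x0 = jax.random.normal(kx, (N,))                   # one single data point
target = jax.random.normal(kt, (N,))

# Forward pass, keeping the activation that enters each layer (plus the final output)
activations = [x0]
for w in weights:
    activations.append(layer(activations[-1], w))

# <gx, dx> at every depth: gx is the loss gradient w.r.t. that activation
quantities = []
for l, x in enumerate(activations):
    g = jax.grad(lambda a: loss_from(l, a, weights, target))(x)
    quantities.append(float(jnp.vdot(g, generator(x))))

# Derivative of the loss with respect to translating the input
directional = float(
    jax.grad(lambda t: loss_from(0, translate(x0, t), weights, target))(0.0)
)

print(quantities, directional)
```

The loss here is deliberately not translation-invariant, so the common value is generally nonzero; with a translation-invariant head it would be zero at every layer.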