It can, but sin(x) has infinitely many extrema, and the gradient vanishes at every one of them, so activations get stuck at 1 and -1 (x = π/2, 3π/2, ...). They use x + (1/a)*sin²(ax) instead, which is monotonic (its derivative is 1 + sin(2ax) ≥ 0), and that fixes this.
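A minimal plain-Python sketch of the gradient argument. It assumes the activation meant here is x + (1/a)*sin²(ax), the form usually called the Snake activation. The helper names (`sin_grad`, `snake`, `snake_grad`) are hypothetical and chosen only for illustration. It compares the derivative of sin(x), which is exactly zero at the extrema π/2, 3π/2, ..., with the derivative of the monotonic form, which stays non-negative everywhere.

```python
import math

def sin_grad(x):
    # d/dx sin(x) = cos(x): zero at every extremum x = pi/2 + k*pi,
    # where sin(x) saturates at +1 or -1.
    return math.cos(x)

def snake(x, a=1.0):
    # Assumed form: x + (1/a) * sin^2(a*x), with frequency parameter a > 0.
    return x + math.sin(a * x) ** 2 / a

def snake_grad(x, a=1.0):
    # d/dx [x + sin^2(a*x)/a] = 1 + 2*sin(a*x)*cos(a*x) = 1 + sin(2*a*x) >= 0.
    # It only touches 0 at isolated stationary inflection points, not at
    # extrema, so the function keeps increasing past them.
    return 1.0 + math.sin(2.0 * a * x)

if __name__ == "__main__":
    for x in (math.pi / 2, 3 * math.pi / 2):
        print(f"x={x:.4f}  sin'(x)={sin_grad(x):+.3e}  "
              f"snake'(x)={snake_grad(x):.3f}")
```

At the points where sin(x) gets stuck, the linear term keeps the derivative of the monotonic form near 1, so the gradient does not vanish there.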

Or you need to optimize without using gradients.