It can, but sin(x) has an infinite number of extrema, and the gradient vanishes at those points, so activations can get stuck at 1 and -1 (x = π/2, 3π/2, ...). They instead use x + (1/a)*sin²(x), which is monotonic, and that fixes this.
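A minimal numpy sketch of the point above, assuming the Snake-style form x + (1/a)*sin²(x) with a hypothetical choice of a = 1: the gradient of sin(x) hits zero at its extrema, while the gradient of the monotonic variant stays non-negative.

```python
import numpy as np

# Compare the gradient of sin(x) with the gradient of the Snake-style
# activation x + (1/a)*sin^2(x). The value a = 1.0 is an assumption here.
a = 1.0
x = np.linspace(0, 4 * np.pi, 1000)

grad_sin = np.cos(x)                  # vanishes at x = pi/2, 3*pi/2, ...
grad_snake = 1.0 + np.sin(2 * x) / a  # d/dx [x + (1/a)*sin^2(x)] = 1 + sin(2x)/a

print("min |grad sin(x)| :", np.abs(grad_sin).min())  # ~0 -> vanishing gradient
print("min grad snake    :", grad_snake.min())        # >= 0 for a >= 1 -> monotonic
```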
You can use sin as an activation function, but that requires careful initialization to avoid exploding gradients, since you would otherwise end up with a lot of points where the gradient is simply zero. You can refer to Implicit Neural Representations with Periodic Activation Functions for more details.
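A rough PyTorch sketch of what that careful initialization looks like, loosely following the SIREN scheme from that paper; the frequency omega_0 = 30 and the uniform bounds are taken as assumptions, not a definitive implementation:

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by sin(omega_0 * x), with SIREN-style initialization."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                # First layer: uniform in [-1/fan_in, 1/fan_in]
                bound = 1.0 / in_features
            else:
                # Hidden layers: uniform in [-sqrt(6/fan_in)/omega_0, +sqrt(6/fan_in)/omega_0]
                bound = math.sqrt(6.0 / in_features) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

# Usage sketch: a tiny sine-activated MLP mapping 2D coordinates to a scalar.
model = nn.Sequential(
    SineLayer(2, 64, is_first=True),
    SineLayer(64, 64),
    nn.Linear(64, 1),
)
coords = torch.rand(16, 2) * 2 - 1  # coordinates in [-1, 1]
print(model(coords).shape)          # torch.Size([16, 1])
```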
I'm surprised to see almost no discussion of Fourier series in that paper, considering a Fourier series is all about representing signals as linear combinations of sinusoidal functions.
You may be interested in [1], where they go to great lengths to show that the convolution operation we consider in DL is the dual of the Fourier series [2].
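A small numpy sketch of the underlying duality (the convolution theorem): circular convolution in the signal domain is pointwise multiplication in the Fourier domain. This is a standard illustration, not the construction from [1].

```python
import numpy as np

# Circular convolution computed directly vs. via FFT: the two agree because
# convolution in one domain is multiplication in the Fourier domain.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(8), rng.standard_normal(8)

direct = np.array([sum(a[k] * b[(n - k) % 8] for k in range(8)) for n in range(8)])
via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

print(np.allclose(direct, via_fft))  # True
```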
I like your article and went to your home page to find more good articles, and I liked what I saw. Thank you for sharing. The only thing is that I'm reading from my phone and the site is not very mobile friendly.