It can, but sin(x) has an infinite number of extrema, and the gradient vanishes at those points, so activations can get stuck at 1 and -1 (x = π/2, 3π/2, ...). They instead use x + (1/a)*sin²(x), which is monotonic, and that fixes this.
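A minimal numpy sketch of the point above, assuming the Snake-style form x + (1/a)*sin²(x) with a hypothetical choice of a = 1: the gradient of sin(x) hits zero at its extrema, while the gradient of the monotonic variant stays non-negative.

```python
import numpy as np

# Compare the gradient of sin(x) with the gradient of the Snake-style
# activation x + (1/a)*sin^2(x). The value a = 1.0 is an assumption here.
a = 1.0
x = np.linspace(0, 4 * np.pi, 1000)

grad_sin = np.cos(x)                  # vanishes at x = pi/2, 3*pi/2, ...
grad_snake = 1.0 + np.sin(2 * x) / a  # d/dx [x + (1/a)*sin^2(x)] = 1 + sin(2x)/a

print("min |grad sin(x)| :", np.abs(grad_sin).min())  # ~0 -> vanishing gradient
print("min grad snake    :", grad_snake.min())        # >= 0 for a >= 1 -> monotonic
```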
You can use sin as an activation function, but that requires careful initialization to avoid exploding gradients, since you would otherwise end up with a lot of points where the gradient is simply zero. You can refer to Implicit Neural Representations with Periodic Activation Functions for more details.
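A rough PyTorch sketch of what that careful initialization looks like, loosely following the SIREN scheme from that paper; the frequency omega_0 = 30 and the uniform bounds are taken as assumptions, not a definitive implementation:

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by sin(omega_0 * x), with SIREN-style initialization."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                # First layer: uniform in [-1/fan_in, 1/fan_in]
                bound = 1.0 / in_features
            else:
                # Hidden layers: uniform in [-sqrt(6/fan_in)/omega_0, +sqrt(6/fan_in)/omega_0]
                bound = math.sqrt(6.0 / in_features) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

# Usage sketch: a tiny sine-activated MLP mapping 2D coordinates to a scalar.
model = nn.Sequential(
    SineLayer(2, 64, is_first=True),
    SineLayer(64, 64),
    nn.Linear(64, 1),
)
coords = torch.rand(16, 2) * 2 - 1  # coordinates in [-1, 1]
print(model(coords).shape)          # torch.Size([16, 1])
```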
I'm surprised to see almost no discussion of Fourier series in that paper, considering a Fourier series is all about representing signals as linear combinations of sinusoidal functions.
You may be interested in [1], where they go to great lengths to show that the convolution operation we consider in DL is the dual of the Fourier series [2].
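A small numpy sketch of the underlying duality (the convolution theorem): circular convolution in the signal domain is pointwise multiplication in the Fourier domain. This is a standard illustration, not the construction from [1].

```python
import numpy as np

# Circular convolution computed directly vs. via FFT: the two agree because
# convolution in one domain is multiplication in the Fourier domain.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(8), rng.standard_normal(8)

direct = np.array([sum(a[k] * b[(n - k) % 8] for k in range(8)) for n in range(8)])
via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

print(np.allclose(direct, via_fft))  # True
```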
I like your article and went to your home page to find more good articles, and I liked what I saw. Thank you for sharing. The only thing is that I'm reading from my phone and the site is not very mobile friendly.