> The tanh smashing function just makes sure nothing can blow up into large numb...

quantadev · on Oct 4, 2024

You can explain the "effect" of tanh at any level of abstraction you like, up to including describing things that happen in Semantic Space itself, but my description of what tanh is doing is 100% accurate in the context I used it. All it's doing is squashing a number down to below one. My understanding of how the Perceptron works is fully correct, and isn't missing any details. I've implemented many of them.

beckhamc · on Oct 4, 2024

Your description of tanh isn't even correct, it squashes a real number to `(-1, 1)`, not "less than one".

You're curious about whether there is gain in parameterising activation functions and learning them instead, or rather, why it's not used much in practice. That's an interesting and curious academic question, and it seems like you're already experimenting with trying out your own kinds of activation functions. However, people in this thread (including myself) wanted to clarify some perceived misunderstandings you had about nonlinearities and "why" they are used in DNNs. Or how "squashing functions" is a misnomer because `g(x) = x/1000` doesn't introduce any nonlinearities. Yet you continue to fixate and double down on your knowledge of "what" a tanh is, and even that is incorrect.

quantadev · on Oct 4, 2024

When discussing `tanh squashing` among other AI experts it's generally assumed that even the most pedantic and uncharitable parsing of words won't be able to misinterpret "smashing to less than one" as an incorrect sentence fragment, because the "one", in that context, obviously refers to distance from zero.