A great place to find and learn about activation functions is Wikipedia; however, over the years its table of activation functions has fluctuated wildly, with functions added and removed time and time again. You can view the list of historical changes to this particular Wikipedia page here. The 'table of activation functions' was first introduced in November 2015 by the user Laughinthestocks, and at the time of writing there have been 391 changes to the page. For this article, I wrote an algorithm to mine every unique activation function from the history of this Wikipedia page as of the 22nd of April 2022, so that I can list them all in one comprehensive document here. I have also added links to the relevant research papers where none were given; where no specific research paper could be located, a related paper of interest is provided in its place.
Typically, one would use tanh for an FNN (feed-forward neural network) and ReLU for a CNN (convolutional neural network).
If we included the identity activation function, this list would contain 42 activation functions; although you could say that, with the inclusion of the bipolar sigmoid, it is indeed 42. I've not read 'The Hitchhiker's Guide to the Galaxy'. Seriously.
Derivatives are provided with respect to f(𝑥) where possible; where that is not the case, they are given with respect to 𝑥.
Binary step
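A minimal C sketch of the binary step (the function name binary_step is my own choice):

```c
/* Binary step: 0 for x < 0, 1 otherwise.
   Its derivative is 0 everywhere it is defined (undefined at x = 0). */
float binary_step(float x)
{
    return x < 0.f ? 0.f : 1.f;
}
```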
Logistic, sigmoid, or soft step
There is also the bipolar sigmoid: (1.f - expf(-x)) / (1.f + expf(-x)). Maybe Wolfram can help you with that derivative.
ElliotSig or Softsign
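A C sketch of Softsign; note the derivative here is w.r.t. x rather than f(x):

```c
#include <math.h>

/* Softsign: f(x) = x / (1 + |x|); derivative w.r.t. x is 1 / (1 + |x|)^2. */
float softsign(float x) { return x / (1.f + fabsf(x)); }

float softsign_deriv(float x)
{
    const float d = 1.f + fabsf(x);
    return 1.f / (d * d);
}
```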
Hyperbolic tangent (tanh)
Arctangent / Arctan / atan
Softplus
Rectified linear unit (ReLU) (ReLU6)
Exponential linear unit (ELU)
Gaussian Error Linear Unit (GELU)
Scaled exponential linear unit (SELU)
Mish
Leaky rectified linear unit (Leaky ReLU)
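A C sketch with the conventional 0.01 leak slope:

```c
/* Leaky ReLU: x for x > 0, 0.01 * x otherwise, so negative
   inputs keep a small non-zero gradient. */
float leaky_relu(float x)
{
    return x > 0.f ? x : 0.01f * x;
}
```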
Parametric rectified linear unit (PReLU)
Parametric Exponential Linear Unit (PELU)
S-shaped rectified linear activation unit (SReLU)
Bipolar rectified linear unit (BReLU)
Randomized leaky rectified linear unit (RReLU)
Sigmoid linear unit (SiLU) or Swish
Gaussian
Growing Cosine Unit (GCU)
Shifted Quadratic Unit (SQU)
Non-Monotonic Cubic Unit (NCU)
Shifted Sinc Unit (SSU)
No derivative supplied, refer to Wolfram or the paper.
Decaying Sine Unit (DSU)
No derivative supplied, refer to Wolfram or the paper.
Phish
No derivative supplied, refer to Wolfram or the paper.
SQ-RBF
Inverse square root unit (ISRU)
Inverse square root linear unit (ISRLU)
Square nonlinearity (SQNL)
Sigmoid shrinkage
“Squashing functions” (benchmark)
Maxout
Bent Identity
Sinusoid
Sinc (taming the waves)
ArSinH
Soft Clipping (goldilocks)
Piecewise Linear Unit (PLU)
Adaptive piecewise linear (APL)
Inverse Cubic
Soft Exponential
(42?) LeCun hyperbolic tangent
Further Reading
Comparison of new activation functions in neural network for forecasting financial time series (Logit & Probit)
Effectiveness of Scaled Exponentially-Regularized Linear Units (SERLUs)