Ultimate Guide to Activation Functions
A great place to find and learn about activation functions is Wikipedia; however, over the years the table of activation functions has fluctuated wildly, with functions added and removed time and time again. You can view a list of historical changes to this particular Wikipedia page here. The ‘table of activation functions’ was first introduced in November 2015 by the user Laughinthestocks, and at the time of writing there have been 391 changes to the page since then. For this article, I wrote an algorithm to mine every unique activation function out of the history of that Wikipedia page as of the 22nd of April 2022, so that they can all be listed in one comprehensive document here. I have also added links to appropriate research papers where none were given; in cases where no specific research paper could be located, a relevant paper of interest is provided in its place.
Typically one would use tanh for a feed-forward neural network (FNN) and ReLU for a convolutional neural network (CNN).
If we included the identity activation function, this list would contain 42 activation functions; then again, with the bipolar sigmoid included you could say it is indeed 42. I’ve not read ‘The Hitchhiker’s Guide to the Galaxy’. Seriously.
Where possible, the derivative is given with respect to f(𝑥); where that is not practical, it is given with respect to 𝑥 instead.
Binary step



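A rough C sketch, using the usual convention of thresholding at zero:
float binary_step(float x) { return x < 0.f ? 0.f : 1.f; }  /* derivative is 0 everywhere except x = 0, where it is undefined */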
Logistic, sigmoid, or soft step



There is also the bipolar sigmoid: (1.f - expf(-x)) / (1.f + expf(-x)). Maybe Wolfram can help you with that derivative.
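A minimal C sketch of the logistic sigmoid, assuming <math.h> is included:
float sigmoidf(float x) { return 1.f / (1.f + expf(-x)); }  /* derivative w.r.t. f(x): f * (1 - f) */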
ElliotSig or Softsign



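A C sketch, assuming <math.h> for fabsf:
float softsignf(float x) { return x / (1.f + fabsf(x)); }  /* derivative w.r.t. x: 1 / (1 + |x|)^2 */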
Hyperbolic tangent (tanh)



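In C this is just tanhf from <math.h>; the derivative in terms of f(x) is 1 - f(x)^2:
float tanh_act(float x) { return tanhf(x); }  /* derivative w.r.t. f(x): 1 - f * f */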
Arctangent / Arctan / atan



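A C sketch, assuming <math.h> for atanf:
float arctan_act(float x) { return atanf(x); }  /* derivative w.r.t. x: 1 / (1 + x * x) */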
Softplus



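A naive C sketch (not hardened against overflow for large x), assuming <math.h>:
float softplusf(float x) { return logf(1.f + expf(x)); }  /* derivative w.r.t. x: the logistic sigmoid, 1 / (1 + expf(-x)) */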
Rectified linear unit (ReLU) (ReLU6)



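C sketches of both; ReLU6 is just ReLU with the output capped at 6:
float reluf(float x)  { return x > 0.f ? x : 0.f; }  /* derivative: 1 for x > 0, 0 for x < 0 */
float relu6f(float x) { float r = x > 0.f ? x : 0.f; return r < 6.f ? r : 6.f; }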
Exponential linear unit (ELU)



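A C sketch with alpha as a parameter (alpha = 1.0f is the common default), assuming <math.h>:
float eluf(float x, float alpha) { return x > 0.f ? x : alpha * (expf(x) - 1.f); }  /* derivative for x < 0: f(x) + alpha */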
Gaussian Error Linear Unit (GELU)



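A C sketch of the exact form x · Φ(x) using erff from <math.h>; many libraries use a tanh approximation instead:
float geluf(float x) { return 0.5f * x * (1.f + erff(x * 0.70710678f)); }  /* 0.70710678f is approximately 1/sqrt(2) */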
Scaled exponential linear unit (SELU)



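A C sketch using the (rounded) constants from the SELU paper, assuming <math.h>:
float seluf(float x) {
    const float lambda = 1.05070098f, alpha = 1.67326324f;
    return lambda * (x > 0.f ? x : alpha * (expf(x) - 1.f));
}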
Mish



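Mish is x · tanh(softplus(x)); a naive C sketch, assuming <math.h>:
float mishf(float x) { return x * tanhf(logf(1.f + expf(x))); }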
Leaky rectified linear unit (Leaky ReLU)



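A C sketch with the conventional 0.01 slope on the negative side:
float leaky_reluf(float x) { return x > 0.f ? x : 0.01f * x; }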
Parametric rectified linear unit (PReLU)



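Structurally identical to Leaky ReLU, except the negative-side slope a is learned rather than fixed; a C sketch:
float preluf(float x, float a) { return x > 0.f ? x : a * x; }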
Parametric Exponential Linear Unit (PELU)



S-shaped rectified linear activation unit (SReLU)



Bipolar rectified linear unit (BReLU)



Randomized leaky rectified linear unit (RReLU)



Sigmoid linear unit (SiLU) or Swish



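SiLU is x multiplied by the logistic sigmoid of x (Swish generalises this with a beta inside the sigmoid); a C sketch, assuming <math.h>:
float siluf(float x) { return x / (1.f + expf(-x)); }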
Gaussian



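A C sketch of the Gaussian activation exp(-x^2), assuming <math.h>:
float gaussianf(float x) { return expf(-x * x); }  /* derivative w.r.t. x: -2 * x * f(x) */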
Growing Cosine Unit (GCU)



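The GCU paper defines it as x · cos(x); a C sketch, assuming <math.h>:
float gcuf(float x) { return x * cosf(x); }  /* derivative w.r.t. x: cosf(x) - x * sinf(x) */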
Shifted Quadratic Unit (SQU)



Non-Monotonic Cubic Unit (NCU)



Shifted Sinc Unit (SSU)


No derivative supplied, refer to Wolfram or the paper.
Decaying Sine Unit (DSU)


No derivative supplied, refer to Wolfram or the paper.
Phish

No derivative supplied, refer to Wolfram or the paper.
SQ-RBF


Inverse square root unit (ISRU)



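A C sketch with alpha as the tunable parameter, assuming <math.h>:
float isruf(float x, float alpha) { return x / sqrtf(1.f + alpha * x * x); }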
Inverse square root linear unit (ISRLU)



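A C sketch; identity for non-negative inputs, ISRU below zero:
float isrluf(float x, float alpha) { return x >= 0.f ? x : x / sqrtf(1.f + alpha * x * x); }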
Square nonlinearity (SQNL)



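A C sketch of the piecewise definition as given in the Wikipedia table (saturating at ±1 beyond |x| = 2):
float sqnlf(float x) {
    if (x > 2.f)   return 1.f;
    if (x >= 0.f)  return x - x * x / 4.f;
    if (x >= -2.f) return x + x * x / 4.f;
    return -1.f;
}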
Sigmoid shrinkage



“Squashing functions” (benchmark)




Maxout


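Maxout operates on a group of linear units rather than a single pre-activation; a C sketch assuming the caller has already computed the k affine projections into z:
float maxoutf(const float *z, int k) {
    float m = z[0];
    for (int i = 1; i < k; ++i) if (z[i] > m) m = z[i];
    return m;
}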
Bent Identity



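A C sketch of (sqrt(x^2 + 1) - 1) / 2 + x, assuming <math.h>:
float bent_identityf(float x) { return (sqrtf(x * x + 1.f) - 1.f) / 2.f + x; }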
Sinusoid



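A C sketch, assuming <math.h>:
float sinusoidf(float x) { return sinf(x); }  /* derivative w.r.t. x: cosf(x) */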
Sinc (taming the waves)



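A C sketch; the removable singularity at zero is handled explicitly:
float sinc_act(float x) { return x == 0.f ? 1.f : sinf(x) / x; }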
ArSinH



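C99 already ships asinhf in <math.h>; it is equivalent to logf(x + sqrtf(x * x + 1.f)):
float arsinh_act(float x) { return asinhf(x); }  /* derivative w.r.t. x: 1 / sqrtf(x * x + 1.f) */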
Soft Clipping (goldilocks)



Piecewise Linear Unit (PLU)



Adaptive piecewise linear (APL)



Inverse Cubic


Soft Exponential



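A C sketch of the three-case definition as I read it from the paper, with a as the shape parameter interpolating between logarithmic, linear, and exponential behaviour, assuming <math.h>:
float soft_exponentialf(float a, float x) {
    if (a < 0.f) return -logf(1.f - a * (x + a)) / a;
    if (a > 0.f) return (expf(a * x) - 1.f) / a + a;
    return x;  /* a == 0: identity */
}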
(42?) LeCun hyperbolic tangent

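This is the scaled tanh recommended in LeCun's 'Efficient BackProp', 1.7159 · tanh(2x/3); a C sketch, assuming <math.h>:
float lecun_tanhf(float x) { return 1.7159f * tanhf(2.f * x / 3.f); }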
Further Reading
Comparison of new activation functions in neural network for forecasting financial time series (Logit & Probit)
Effectiveness of Scaled Exponentially-Regularized Linear Units (SERLUs)