Ultimate Guide to Activation Functions

Fletch · Published in ITNEXT · Apr 22, 2022 · 9 min read

A great place to find and learn about activation functions is Wikipedia; however, over the years the table of activation functions there has fluctuated wildly, with functions added and removed time and time again. You can view the list of historical changes to this particular Wikipedia page here. The ‘table of activation functions’ was first introduced in November 2015 by the user Laughinthestocks, and at the time of writing there have been 391 changes to the page since. For this article I wrote an algorithm to mine every unique activation function out of the history of that Wikipedia page, as of the 22nd of April 2022, so that I can list them all in one comprehensive document here. I have also provided links to the appropriate research papers where none had been given; in cases where no specific research paper could be located, a relevant paper of interest is provided in its place.

Typically one would use tanh for an FNN and ReLU for a CNN.

If we included the identity activation function, this list would contain 42 activation functions; although you could argue that, with the inclusion of the bipolar sigmoid, it already is 42. I’ve not read ‘The Hitchhiker’s Guide to the Galaxy’. Seriously.

Where possible, the derivative is provided w.r.t. f(𝑥); where that is not possible, it is given w.r.t. 𝑥 instead.

Binary step

Binary Step Activation
Binary Step Derivative
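
A minimal C sketch, assuming the usual convention f(x) = 0 for x < 0 and 1 otherwise; the derivative is 0 everywhere it exists and undefined at x = 0.

float binary_step(float x)   { return x < 0.f ? 0.f : 1.f; }
float binary_step_d(float x) { return 0.f; /* undefined at x == 0 */ }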

Logistic, sigmoid, or soft step

Sigmoid Activation
Sigmoid Derivative

There is also the bipolar sigmoid, (1.f - expf(-x)) / (1.f + expf(-x)); maybe Wolfram can help you with that derivative.
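
A minimal C sketch of both variants. The standard sigmoid derivative is f(x)(1 - f(x)); for the bipolar form (which equals tanh(x/2)) it works out to (1 - f(x)^2) / 2.

#include <math.h>
float sigmoid(float x)    { return 1.f / (1.f + expf(-x)); }
float sigmoid_d(float fx) { return fx * (1.f - fx); }            /* w.r.t f(x) */
float bipolar_sigmoid(float x)    { return (1.f - expf(-x)) / (1.f + expf(-x)); }
float bipolar_sigmoid_d(float fx) { return 0.5f * (1.f - fx * fx); } /* w.r.t f(x) */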

ElliotSig or Softsign

ref; https://www.ire.pw.edu.pl/~rsulej/NetMaker/index.php?pg=n01
ElliotSig Activation
ElliotSig Derivative
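
A small C sketch, assuming the usual softsign definition x / (1 + |x|), whose derivative is 1 / (1 + |x|)^2.

#include <math.h>
float softsign(float x)   { return x / (1.f + fabsf(x)); }
float softsign_d(float x) { float d = 1.f + fabsf(x); return 1.f / (d * d); } /* w.r.t x */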

Hyperbolic tangent (tanh)

tanh Activation
tanh Derivative
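
In C this is a one-liner via math.h; the derivative w.r.t. f(x) is 1 - f(x)^2.

#include <math.h>
float tanh_act(float x) { return tanhf(x); }
float tanh_d(float fx)  { return 1.f - fx * fx; } /* w.r.t f(x) */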

Arctangent / Arctan / atan

Arctan Activation
Arctan Derivative
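
A quick C sketch; the derivative, taken w.r.t. x, is 1 / (1 + x^2).

#include <math.h>
float arctan_act(float x) { return atanf(x); }
float arctan_d(float x)   { return 1.f / (1.f + x * x); }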

Softplus

Softplus Activation
Softplus Derivative
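
A C sketch of softplus, ln(1 + e^x); its derivative is simply the sigmoid. (The naive form below can overflow for large x, so treat it as a sketch, not production code.)

#include <math.h>
float softplus(float x)   { return log1pf(expf(x)); }      /* ln(1 + e^x) */
float softplus_d(float x) { return 1.f / (1.f + expf(-x)); } /* sigmoid(x) */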

Rectified linear unit (ReLU) (ReLU6)

ReLU Activation
ReLU Derivative
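
Both variants as a C sketch; ReLU6 just clamps the output at 6. The derivative is 0 for x < 0 and 1 for x > 0, undefined at 0.

float relu(float x)   { return x > 0.f ? x : 0.f; }
float relu6(float x)  { return x > 6.f ? 6.f : (x > 0.f ? x : 0.f); }
float relu_d(float x) { return x > 0.f ? 1.f : 0.f; } /* undefined at x == 0 */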

Exponential linear unit (ELU)

ELU Activation
ELU Derivative
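
A C sketch with the usual alpha parameter; for x <= 0 the derivative is alpha * e^x (equivalently f(x) + alpha), and 1 otherwise.

#include <math.h>
float elu(float x, float alpha)   { return x > 0.f ? x : alpha * (expf(x) - 1.f); }
float elu_d(float x, float alpha) { return x > 0.f ? 1.f : alpha * expf(x); } /* = f(x) + alpha for x <= 0 */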

Gaussian Error Linear Unit (GELU)

ref; https://medium.com/syncedreview/gaussian-error-linear-unit-activates-neural-networks-beyond-relu-121d1938a1f7
GELU Activation
GELU Derivative
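
A C sketch of the exact form x * Phi(x) via erff, the common tanh approximation (the 0.044715 constant comes from the GELU paper), and the exact derivative Phi(x) + x * phi(x).

#include <math.h>
float gelu(float x)      { return 0.5f * x * (1.f + erff(x / sqrtf(2.f))); } /* x * Phi(x) */
float gelu_tanh(float x) { /* tanh approximation, sqrt(2/pi) ~= 0.79788456 */
    return 0.5f * x * (1.f + tanhf(0.79788456f * (x + 0.044715f * x * x * x)));
}
float gelu_d(float x)    { /* Phi(x) + x * phi(x), phi = standard normal pdf */
    return 0.5f * (1.f + erff(x / sqrtf(2.f))) + x * 0.39894228f * expf(-0.5f * x * x);
}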

Scaled exponential linear unit (SELU)

ref; https://arxiv.org/pdf/1807.10117.pdf
SELU Activation
SELU Derivative
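
A C sketch using the standard fixed constants lambda ≈ 1.0507 and alpha ≈ 1.6733 from the self-normalizing networks formulation.

#include <math.h>
#define SELU_LAMBDA 1.05070099f
#define SELU_ALPHA  1.67326324f
float selu(float x)   { return SELU_LAMBDA * (x > 0.f ? x : SELU_ALPHA * (expf(x) - 1.f)); }
float selu_d(float x) { return SELU_LAMBDA * (x > 0.f ? 1.f : SELU_ALPHA * expf(x)); }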

Mish

ref; https://github.com/digantamisra98/Mish
Mish Activation
Mish Derivative
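
Mish is x * tanh(softplus(x)); a minimal C sketch (the derivative is messier, see the reference implementation linked above).

#include <math.h>
float mish(float x) { return x * tanhf(log1pf(expf(x))); } /* x * tanh(ln(1 + e^x)) */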

Leaky rectified linear unit (Leaky ReLU)

LReLU Activation
LReLU Derivative
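
A C sketch with the conventional fixed slope of 0.01 on the negative side.

float leaky_relu(float x)   { return x > 0.f ? x : 0.01f * x; }
float leaky_relu_d(float x) { return x > 0.f ? 1.f : 0.01f; }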

Parametric rectified linear unit (PReLU)

PReLU Activation
PReLU Derivative
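
Same shape as Leaky ReLU, except the negative-side slope a is a learned parameter rather than a fixed constant; a quick C sketch.

float prelu(float x, float a)   { return x > 0.f ? x : a * x; } /* a is learned per channel/unit */
float prelu_d(float x, float a) { return x > 0.f ? 1.f : a; }   /* w.r.t x */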

Parametric Exponential Linear Unit (PELU)

PELU Activation
PELU Derivative

S-shaped rectified linear activation unit (SReLU)

ref; https://arxiv.org/pdf/1512.07030.pdf
SReLU Activation
SReLU Derivative

Bipolar rectified linear unit (BReLU)

ref; https://arxiv.org/pdf/1709.04054.pdf
BReLU Activation
BReLU Derivative

Randomized leaky rectified linear unit (RReLU)

RReLU Activation
RReLU Derivative
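
RReLU behaves like Leaky ReLU, but the negative-side slope is sampled randomly during training and fixed (typically to the mean of the sampling range) at test time. A rough C sketch; the [1/8, 1/3] range is an assumption taken from the original benchmarking setup.

#include <stdlib.h>
float rrelu(float x, float slope) { return x > 0.f ? x : slope * x; }
float rrelu_sample_slope(void) {  /* call once per unit per training pass */
    const float lo = 1.f / 8.f, hi = 1.f / 3.f;
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}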

Sigmoid linear unit (SiLU) or Swish

Swish Activation
Swish Derivative
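
SiLU/Swish is x * sigmoid(x); its derivative can be written as f(x) + sigmoid(x) * (1 - f(x)). A minimal C sketch.

#include <math.h>
float silu(float x)   { return x / (1.f + expf(-x)); } /* x * sigmoid(x) */
float silu_d(float x) {
    float s = 1.f / (1.f + expf(-x));
    return s + x * s * (1.f - s); /* equivalently f(x) + s * (1 - f(x)) */
}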

Gaussian

Gaussian Activation
Gaussian Derivative
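
A C sketch of the Gaussian activation e^(-x^2) and its derivative -2x * e^(-x^2).

#include <math.h>
float gaussian(float x)   { return expf(-x * x); }
float gaussian_d(float x) { return -2.f * x * expf(-x * x); }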

Growing Cosine Unit (GCU)

GCU Activation
GCU Derivative
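
The Growing Cosine Unit is x * cos(x); a C sketch with its product-rule derivative.

#include <math.h>
float gcu(float x)   { return x * cosf(x); }
float gcu_d(float x) { return cosf(x) - x * sinf(x); }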

Shifted Quadratic Unit (SQU)

SQU Activation
SQU Derivative
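
A C sketch, assuming the definition f(x) = x^2 + x used in the oscillatory-activation line of work.

float squ(float x)   { return x * x + x; }
float squ_d(float x) { return 2.f * x + 1.f; }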

Non-Monotonic Cubic Unit (NCU)

NCU Activation
NCU Derivative
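
A C sketch, assuming the definition f(x) = x - x^3 from the same family of papers.

float ncu(float x)   { return x - x * x * x; }
float ncu_d(float x) { return 1.f - 3.f * x * x; }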

Shifted Sinc Unit (SSU)

SSU Activation

No derivative supplied, refer to Wolfram or the paper.

Decaying Sine Unit (DSU)

DSU Activation

No derivative supplied, refer to Wolfram or the paper.

Phish

Phish Activation

No derivative supplied, refer to Wolfram or the paper.

SQ-RBF

SQ-RBF Activation
SQ-RBF Derivative

Inverse square root unit (ISRU)

ISRU Activation
ISRU Derivative
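
A C sketch with the tunable alpha parameter; the derivative is (1 / sqrt(1 + alpha * x^2))^3.

#include <math.h>
float isru(float x, float alpha)   { return x / sqrtf(1.f + alpha * x * x); }
float isru_d(float x, float alpha) {
    float r = 1.f / sqrtf(1.f + alpha * x * x);
    return r * r * r;
}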

Inverse square root linear unit (ISRLU)

ISRLU Activation
ISRLU Derivative
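
Same as ISRU on the negative side, identity on the positive side; a C sketch.

#include <math.h>
float isrlu(float x, float alpha)   { return x >= 0.f ? x : x / sqrtf(1.f + alpha * x * x); }
float isrlu_d(float x, float alpha) {
    if (x >= 0.f) return 1.f;
    float r = 1.f / sqrtf(1.f + alpha * x * x);
    return r * r * r;
}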

Square nonlinearity (SQNL)

SQNL Activation
SQNL Derivative
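
A C sketch assuming the piecewise definition from the SQNL paper, which saturates at ±1 beyond |x| = 2.

float sqnl(float x) {
    if (x >  2.f) return  1.f;
    if (x < -2.f) return -1.f;
    return x >= 0.f ? x - x * x / 4.f : x + x * x / 4.f;
}
float sqnl_d(float x) {
    if (x > 2.f || x < -2.f) return 0.f;
    return x >= 0.f ? 1.f - x / 2.f : 1.f + x / 2.f;
}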

Sigmoid shrinkage

Activation
Derivative

“Squashing functions” (benchmark)

Activation
w.r.t Activation
Derivative

Maxout

Maxout Activation
“Derivative”
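
Maxout takes the maximum over k learned linear pieces rather than applying a fixed scalar nonlinearity, which is why “derivative” is in quotes: the gradient simply flows through whichever piece wins. A rough C sketch over precomputed pre-activations.

float maxout(const float *z, int k) { /* z[i] = w_i . x + b_i, precomputed */
    float m = z[0];
    for (int i = 1; i < k; ++i) if (z[i] > m) m = z[i];
    return m; /* gradient is 1 for the winning piece, 0 for the rest */
}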

Bent Identity

Bent Activation
Bent Derivative
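
A C sketch of the bent identity, (sqrt(x^2 + 1) - 1) / 2 + x, and its derivative x / (2 * sqrt(x^2 + 1)) + 1.

#include <math.h>
float bent_identity(float x)   { return (sqrtf(x * x + 1.f) - 1.f) / 2.f + x; }
float bent_identity_d(float x) { return x / (2.f * sqrtf(x * x + 1.f)) + 1.f; }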

Sinusoid

Sinusoid Activation
Sinusoid Derivative
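
Simply sin(x), with cos(x) as the derivative.

#include <math.h>
float sinusoid(float x)   { return sinf(x); }
float sinusoid_d(float x) { return cosf(x); }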

Sinc (taming the waves)

Sinc Activation
Sinc Derivative
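
A C sketch of the (unnormalised) sinc, with the removable singularity at 0 handled explicitly; the derivative there is 0.

#include <math.h>
float sinc(float x)   { return x == 0.f ? 1.f : sinf(x) / x; }
float sinc_d(float x) { return x == 0.f ? 0.f : cosf(x) / x - sinf(x) / (x * x); }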

ArSinH

ref; https://en.wikipedia.org/wiki/Inverse_hyperbolic_functions
ArSinH Activation
ArSinH Derivative
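
math.h provides asinhf directly; the derivative is 1 / sqrt(x^2 + 1).

#include <math.h>
float arsinh(float x)   { return asinhf(x); } /* ln(x + sqrt(x^2 + 1)) */
float arsinh_d(float x) { return 1.f / sqrtf(x * x + 1.f); }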

Soft Clipping (goldilocks)

Activation
Derivative

Piecewise Linear Unit (PLU)

ref; https://arxiv.org/pdf/1809.09534v1.pdf
PLU Activation
PLU Derivative

Adaptive piecewise linear (APL)

ref; https://arxiv.org/pdf/1512.07030.pdf
APL Activation
APL Derivative

Inverse Cubic

Activation
Derivative

Soft Exponential

Activation
Derivative

(42?) LeCun hyperbolic tangent

ref; https://datascience.stackexchange.com/a/107616
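
As the linked answer describes, this is the scaled tanh from LeCun’s ‘Efficient BackProp’, 1.7159 * tanh(2x / 3); a minimal C sketch.

#include <math.h>
float lecun_tanh(float x)   { return 1.7159f * tanhf(0.6666667f * x); }
float lecun_tanh_d(float x) {
    float t = tanhf(0.6666667f * x);
    return 1.7159f * 0.6666667f * (1.f - t * t);
}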

Further Reading

Comparison of new activation functions in neural network for forecasting financial time series (Logit & Probit)

Effectiveness of Scaled Exponentially-Regularized Linear Units (SERLUs)
