Activation functions

The activation function in each artificial neuron decides whether the combined incoming signals have crossed a threshold and, accordingly, what the neuron outputs to the next layer. Choosing the right activation function is crucial because of the vanishing gradient problem, which we will discuss later.
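To make this concrete, here is a minimal sketch of a single artificial neuron: it combines its inputs into a weighted sum plus a bias, then passes the result through an activation function. The sigmoid used here is just one possible choice, and the function and variable names are illustrative rather than taken from any particular library:

```python
import math

def sigmoid(x):
    # Squashes any real-valued input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    # Weighted sum of the incoming signals; the activation function
    # then decides what the neuron passes on to the next layer.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Example: a neuron with two inputs (values chosen arbitrarily)
print(neuron_output([0.5, -1.2], [0.8, 0.3], bias=0.1))
```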

Another important property of an activation function is that it should be differentiable. The network learns from the errors calculated at the output layer, and a differentiable activation function is required for backpropagation: as the error propagates backwards through the network, the gradients of the error (loss) with respect to the weights are computed, and the weights are then updated accordingly using gradient descent or another optimization technique to reduce the error.
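The following sketch shows why differentiability matters. It trains a single weight on a single training example (the values are illustrative, and the sigmoid is again just one assumed choice of activation): the chain rule needs the activation's derivative to carry the error gradient back to the weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid; backpropagation cannot proceed
    # through this neuron without it.
    s = sigmoid(z)
    return s * (1.0 - s)

# One weight, one input, one target output (arbitrary example values)
w, x, target, lr = 0.5, 1.5, 0.0, 0.5

for step in range(100):
    z = w * x                              # pre-activation
    y = sigmoid(z)                         # neuron output
    loss = 0.5 * (y - target) ** 2         # squared error
    # Chain rule: dL/dw = dL/dy * dy/dz * dz/dw
    grad_w = (y - target) * sigmoid_prime(z) * x
    w -= lr * grad_w                       # gradient descent update

print(f"final weight: {w:.3f}, final output: {sigmoid(w * x):.3f}")
```

Each update moves the weight in the direction that reduces the loss; if the activation were not differentiable, the `sigmoid_prime` term in the chain rule would be undefined and this update rule could not be applied.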

The following table lists a few common activation functions. We will dive into them a bit deeper, talk about the differences between them, and explain how to choose the right activation function: