
How to Implement the Softmax Function in Python

Learn how to implement the softmax function in Python, complete with code.


What Is The Softmax Function?

In the context of Python, softmax is an activation function used mainly for classification tasks. When provided with an input vector, the softmax function outputs a probability distribution over all the classes of the model, and the values in that distribution sum to 1.
The input to the softmax function is normally the output of the last layer of a neural network, so it is typically added at the end of the network. Now, let us look at what the softmax function does.

What Does The Softmax Function Do?

The softmax function normalizes the input vector into a probability distribution that is proportional to the exponentials of the input values. This means that the elements of the input vector can be positive, negative, or zero, but the output values always lie in the interval (0, 1), which makes them easy to interpret as probabilities.




The Softmax Function Illustrated

The input [0.5, 0.6, 1.0] to the softmax function is the output of the last fully connected layer of the neural network. The output of the softmax function is the probability distribution [0.266, 0.294, 0.439] over all the classes. We have rounded the values of the probability distribution to three decimal places, which is why the sum comes to 0.999 instead of exactly 1.
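To check these numbers by hand: e^{0.5} ≈ 1.649, e^{0.6} ≈ 1.822, and e^{1.0} ≈ 2.718, which sum to roughly 6.189. Dividing each exponential by that sum gives approximately [0.266, 0.294, 0.439], the distribution shown above.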

How Does the Softmax Function Work?

The softmax function is given by
\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
where \sigma(z_i) is the probability of the i-th element and K is the length of the input vector.
We apply the exponential function to each element of the input vector and then normalize by dividing by the sum of all the exponentials of the input vector.
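As a minimal sketch (the name naive_softmax is our own, and this version deliberately ignores the numerical issues discussed in the next section), a direct NumPy translation of the formula might look like this:

import numpy as np

def naive_softmax(z):
    # exponentiate every element of the input vector ...
    exp_z = np.exp(z)
    # ... and normalize by the sum of all the exponentials
    return exp_z / np.sum(exp_z)

# reproduces the illustrated example from above
naive_softmax(np.array([0.5, 0.6, 1.0]))
# -> approximately array([0.2664, 0.2944, 0.4392])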

Implementing The Softmax Function In Python

One of the drawbacks of the softmax function is that a naive implementation suffers from numerical instability.
e^{z_i} and \sum_{j=1}^{K} e^{z_j} can both be very large numbers, since they are exponentials, and dividing two very large numbers can cause instability; in floating point, the exponentials can even overflow.
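To see the problem concretely, here is a quick NumPy check (the specific logits 1000 to 1002 are just an illustrative assumption):

import numpy as np

# exponentials of even moderately large logits overflow in 64-bit floats
np.exp(np.array([1000.0, 1001.0, 1002.0]))
# -> RuntimeWarning: overflow encountered in exp
# -> array([inf, inf, inf])
# dividing these by their (also infinite) sum then yields nan for every class,
# so a naive implementation like the sketch above breaks down on such inputs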
It is because of this instability that implementing softmax well can be tricky. This blog does a good job of going into the details.
The Stack Overflow question on this topic is quite confusing as well: among the many answers to that question, some are correct and some are not. I especially liked the answer provided by ChuckFive.
Below is the solution he suggests.
import numpy as np

def softmax(z):
    # expects a 2-D array: one row per sample, one column per class
    assert len(z.shape) == 2

    # subtract the row-wise maximum so the exponentials never overflow
    s = np.max(z, axis=1)
    s = s[:, np.newaxis]  # necessary step to do broadcasting
    e_x = np.exp(z - s)

    # normalize each row by the sum of its exponentials
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis]  # ditto
    return e_x / div

x1 = np.array([[1, 2, 3, 6]])
softmax(x1)
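Because the row-wise maximum is subtracted before exponentiation, this version also behaves well on inputs that would overflow a naive implementation. A quick sanity check, reusing softmax from the listing above (the large logits are an illustrative assumption and the printed values are rounded):

x2 = np.array([[1000.0, 1001.0, 1002.0]])
softmax(x2)
# -> approximately array([[0.0900, 0.2447, 0.6652]])
# the same result as softmax(np.array([[0.0, 1.0, 2.0]])), since adding a
# constant to every element of a row does not change the softmax output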



Summary

We hope you've found this guide to implementing the softmax function in Python useful.
If you have any questions, comments, or suggestions, please feel free to add them in the comments below.
