How to Implement the Softmax Function in Python
Learn how to implement the softmax function in Python, complete with code.
What Is The Softmax Function?
Softmax is an activation function used mainly for classification tasks. When provided with an input vector, the softmax function outputs a probability distribution over all the classes of the model, and the values in that distribution sum to 1.
Normally, the input to the softmax function is the output of the last layer of a neural network. Hence, it is typically added at the end of the network. Now, let us see what the softmax function looks like.
What Does The Softmax Function Do?
The softmax function normalizes the input vector into a probability distribution that is proportional to the exponential of the input numbers. This means the elements of the input vector can be positive, negative, or zero, but the output values will always lie in the range (0, 1), which makes them easy to interpret as probabilities.
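As a quick illustrative sketch (assuming NumPy, which the rest of this post also uses), the snippet below exponentiates a vector containing a negative, a zero, and a positive entry and normalizes the result; every output lands in (0, 1) and the outputs sum to 1.

import numpy as np

x = np.array([-1.0, 0.0, 2.0])          # mixed negative, zero, and positive inputs
probs = np.exp(x) / np.sum(np.exp(x))   # exponentiate, then normalize
print(probs)                            # ~[0.042 0.114 0.844], each value in (0, 1)
print(probs.sum())                      # 1.0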
The Softmax Function Illustrated
The input [0.5, 0.6, 1.0] to the softmax function is the output of the last fully connected layer of the neural network. The output of the softmax function is the probability distribution [0.266, 0.294, 0.439] over all the classes. We have rounded the values of the probability distribution to three decimal places, so the sum comes to 0.999 instead of exactly 1.
Figure: The softmax function
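As a small sanity check (again just a sketch using NumPy), you can reproduce the numbers in the illustration directly:

import numpy as np

logits = np.array([0.5, 0.6, 1.0])             # output of the last fully connected layer
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
print(np.round(probs, 3))                      # [0.266 0.294 0.439]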
How Does the Softmax Function Work?
The softmax function is given by

\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K

where \sigma(\mathbf{z})_i is the probability of the i-th element.
The total length of the input vector is K. We apply the exponential function on each element of the input vector and then normalize it by dividing it by the sum of all the exponential values of the input vector.
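Translated literally into Python, the formula might look like the minimal (and, as discussed next, numerically naive) sketch below; naive_softmax is just an illustrative name, not a library function.

import numpy as np

def naive_softmax(z):
    """Softmax of a 1-D vector, computed exactly as the formula reads."""
    exps = np.exp(z)             # e^{z_i} for every element of the input
    return exps / np.sum(exps)   # normalize by the sum of the exponentials

print(naive_softmax(np.array([0.5, 0.6, 1.0])))  # ~[0.266 0.294 0.439]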
Implementing The Softmax Function In Python
One of the drawbacks of the softmax function is that it suffers from numeric instability.
The numerator e^{z_i} and the denominator \sum_{j=1}^{K} e^{z_j} can both be very large numbers, since they are exponentials. Dividing two very large numbers can cause instability.
It is because of this instability that implementing softmax naively becomes tricky. This blog post does a good job of going into the details.
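To make the problem concrete, here is a rough sketch of what goes wrong with large logits, and how subtracting the maximum of the vector before exponentiating (which cancels out mathematically) avoids the overflow:

import numpy as np

z = np.array([1000.0, 2000.0, 3000.0])

# Naive version: np.exp overflows to inf, and inf / inf produces nan.
naive = np.exp(z) / np.sum(np.exp(z))
print(naive)                        # [nan nan nan] (with overflow warnings)

# Shifted version: subtracting max(z) changes nothing mathematically,
# because the factor e^{-max(z)} cancels in numerator and denominator.
shifted = np.exp(z - np.max(z))
print(shifted / shifted.sum())      # [0. 0. 1.]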
The Stack Overflow question on this topic is also quite confusing: among its many answers, some are correct and some are not. I especially liked the answer provided by ChuckFive. Below is the solution he suggests.
import numpy as np

def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis]            # necessary step to do broadcasting
    e_x = np.exp(z - s)             # subtract the row-wise max for numerical stability
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis]        # ditto
    return e_x / div

x1 = np.array([[1, 2, 3, 6]])
softmax(x1)
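One way to sanity-check this (assuming SciPy is installed) is to compare it against scipy.special.softmax, which is also numerically stable; the call below reuses the softmax function and x1 defined above.

from scipy.special import softmax as scipy_softmax

print(softmax(x1))                  # [[0.00626879 0.01704033 0.04632042 0.93037047]]
print(scipy_softmax(x1, axis=1))    # should match the values above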
Summary
We hope you've found this walkthrough of implementing the softmax function in Python useful.
If you have any questions, comments or suggestions, please feel free to add them in the comments below.
Recommended Reading
The Softmax Activation Function Explained
In this short tutorial, we'll explore the Softmax activation function, including its use in classification tasks, and how it relates to cross entropy loss.
An Introduction To The PyTorch View Function
Demystify the View function in PyTorch and find a better way to design models.
Setting Up TensorFlow And PyTorch Using GPU On Docker
A short tutorial on setting up TensorFlow and PyTorch deep learning models on GPUs using Docker.
Interpret any PyTorch Model Using W&B Embedding Projector
An introduction to our embedding projector with the help of some furry friends