Why use softmax as opposed to standard normalization?
What's the fuss around the softmax activation for the output layer?
Problem
In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

$$\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K$$
This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are positive, and then normalize just by dividing all outputs by the sum of all outputs? (Originally asked in this Stack Overflow thread.)
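Read literally, the alternative in the question amounts to something like the following (a minimal NumPy sketch under my reading of "shift so everything is positive, then divide by the sum"; the function name and example values are mine, not from the thread):

```python
import numpy as np

def naive_normalize(z):
    """Shift the outputs so they are non-negative, then divide by their sum."""
    shifted = z - z.min()
    # caution: if every output is equal, shifted.sum() is 0 and this divides by zero
    return shifted / shifted.sum()

logits = np.array([3.0, 1.0, -2.0])
print(naive_normalize(logits))  # [0.625 0.375 0.   ] -- the smallest output is forced to exactly 0
```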
Answer
The softmax function is $\mathrm{softmax}(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$: each logit is exponentiated and then divided by the sum of the exponentials. A few properties explain why it is preferred over the shift-and-divide scheme from the question: the exponential maps any real-valued logit to a strictly positive number, so negative outputs need no special handling and every class receives a nonzero probability; the output depends only on the differences between the logits, so a constant can be subtracted from all of them for numerical stability without changing the result; and the function is smooth everywhere, which, paired with a cross-entropy loss, gives a simple and well-behaved gradient, whereas the minimum used in the shift-and-divide scheme is not a smooth operation.
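For comparison, here is a small NumPy sketch of softmax itself (my own illustration, not code from the report), using the standard trick of subtracting the maximum logit before exponentiating so the exponentials cannot overflow:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtracting the max does not change the result."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([3.0, 1.0, -2.0])
print(softmax(logits))          # ~[0.876 0.118 0.006] -- every class keeps a nonzero probability
print(softmax(logits + 100.0))  # identical: softmax depends only on differences between logits
```

Note the contrast with the shift-and-divide sketch above: that scheme assigns exactly zero probability to the smallest output and divides by zero when all outputs are equal, while softmax does neither.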