
Why use softmax as opposed to standard normalization?

What's the fuss around the softmax activation for the output layer?

Problem

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K$$

This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are positive, and then normalize by dividing each output by the sum of all outputs? (Originally asked in this Stack Overflow thread.)
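As a concrete point of comparison, here is a minimal NumPy sketch of both schemes, assuming the "Z transform" simply shifts the outputs so the smallest one becomes zero:

```python
import numpy as np

def softmax(z):
    # Direct translation of the formula above: exponentiate each output,
    # then divide by the sum of the exponentials.
    e = np.exp(z)
    return e / e.sum()

def shift_and_normalize(z):
    # The proposed alternative, assuming the "Z transform" means shifting
    # the outputs so the smallest one becomes zero, then dividing by the sum.
    shifted = z - z.min()
    return shifted / shifted.sum()

z = np.array([1.0, 2.0, 3.0])
print(softmax(z))              # ~[0.090, 0.245, 0.665]
print(shift_and_normalize(z))  # [0.0, 0.333, 0.667]
```

Both produce non-negative values that sum to one, but they are not the same distribution: softmax keeps every class strictly positive and weights classes by the exponentiated differences between outputs, while the shifted scheme assigns exactly zero probability to the smallest output.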

Answer

The softmax function is

$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K.$$
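One practical note on the exponentials: since $e^{z_j - c} / \sum_k e^{z_k - c} = e^{z_j} / \sum_k e^{z_k}$ for any constant $c$, softmax is usually computed with the largest output subtracted first, which keeps the exponentials from overflowing. A minimal sketch:

```python
import numpy as np

def stable_softmax(z):
    # Subtracting the maximum does not change the result (the factor e^{-max}
    # cancels in numerator and denominator) but keeps exp() from overflowing.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(z))  # ~[0.090, 0.245, 0.665]
# A naive np.exp(z) / np.exp(z).sum() would overflow to inf and return nan here.
```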