Hi,

I know that the softmax function outputs probabilities that sum to 1. However, if we give it a probability vector (which already sums to 1), why doesn’t it return the same values? For example, if I input [0.1, 0.8, 0.1] to softmax, it returns [0.2491, 0.5017, 0.2491]. Isn’t this wrong in some sense?

It is because of the way softmax is calculated. When you compute `exp(0.1) / (exp(0.1) + exp(0.8) + exp(0.1))`, the value turns out to be 0.2491.
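As a quick check, here is a minimal softmax sketch in plain Python that reproduces those numbers:

```python
import math

def softmax(xs):
    # Exponentiate each entry, then normalize by the sum of exponentials.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 0.8, 0.1])
print([round(p, 4) for p in probs])  # [0.2491, 0.5017, 0.2491]
```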

Thanks for the answer. Yes, that much I know. But my question is: isn’t it wrong in some sense?

Softmax is an activation function. Its purpose is not just to ensure that the values are normalized (or rescaled) to sum to 1, but also to allow the output to be used as input to a cross-entropy loss (hence the function needs to be differentiable).

For your case, the inputs can be arbitrary values (not necessarily probability vectors). It is possible that there’s a mix of positive and negative values which still sum to 1 (e.g., [0.3, 0.9, -0.2]).

Softmax rescales the values softly while preserving which class has the highest value, hence the name ‘soft’-‘max’.
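To illustrate the point about arbitrary inputs, here is a small PyTorch example (the vector below has a negative entry but still sums to 1.0):

```python
import torch

# A vector with a negative entry that still sums to 1.0.
x = torch.tensor([0.3, 0.9, -0.2])
probs = torch.softmax(x, dim=0)

print(probs)           # all entries are positive
print(probs.sum())     # sums to 1
print(probs.argmax())  # the largest input is still the largest output
```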

Hi mbehzad!

Well, I suppose it depends on what your expectations are …

But you might wish to base your expectations on some other functions:

`x**2` maps `(-inf, inf)` to `[0.0, inf)`, but we don’t expect `x**2 = x` to hold true for `x >= 0.0`, that is, for values of x in `[0.0, inf)`.

Or, back in the pytorch activation-function world, `torch.sigmoid()` maps `(-inf, inf)` to `(0.0, 1.0)`, but `torch.sigmoid(torch.sigmoid(x))` isn’t equal to `torch.sigmoid(x)`.
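You can see this non-idempotence directly:

```python
import torch

x = torch.linspace(-2.0, 2.0, 5)
once = torch.sigmoid(x)
twice = torch.sigmoid(torch.sigmoid(x))

# sigmoid maps everything into (0, 1), so applying it a second time
# squashes the values into sigmoid((0, 1)), roughly (0.5, 0.731).
print(once)
print(twice)
print(torch.allclose(once, twice))  # False
```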

Here’s another thing to consider:

`softmax([0.0 + delta, 1.0 - delta])`

How would you like `softmax()` to behave when a negative `delta` becomes zero and then crosses over to become positive? Bear in mind, you want this behavior to be usefully differentiable to support backpropagation.
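A short sketch of that thought experiment: sweep `delta` across zero and check, via autograd, that softmax’s output and gradient stay smooth and finite (the helper `two_class_softmax` below is just for illustration):

```python
import torch

def two_class_softmax(d):
    # softmax of [0 + delta, 1 - delta], with autograd tracking delta.
    delta = torch.tensor(d, requires_grad=True)
    probs = torch.softmax(torch.stack([0.0 + delta, 1.0 - delta]), dim=0)
    probs[0].backward()
    return probs.detach(), delta.grad

# As delta crosses zero, the output and its gradient change smoothly,
# with no kink -- unlike a function that returned probability vectors unchanged.
for d in [-0.1, 0.0, 0.1]:
    probs, grad = two_class_softmax(d)
    print(f"delta={d:+.1f}  probs={probs}  d(probs[0])/d(delta)={grad}")
```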

K. Frank