Hi,

I know that the softmax function outputs probabilities that sum to 1. However, if we give it a probability vector (which already sums to 1), why doesn’t it return the same values? For example, if I input [0.1, 0.8, 0.1] to softmax, it returns [0.2491, 0.5017, 0.2491]. Isn’t this wrong in some sense?

It is because of the way softmax is calculated. When you compute `exp(0.1) / (exp(0.1) + exp(0.8) + exp(0.1))`, the value turns out to be 0.2491.
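As a quick check, here is a minimal softmax sketch in plain Python that reproduces those numbers:

```python
import math

def softmax(xs):
    # Exponentiate each entry, then normalize by the sum of exponentials.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 0.8, 0.1])
print([round(p, 4) for p in probs])  # [0.2491, 0.5017, 0.2491]
```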

Thanks for the answer. Yes, that much I know. But my question is: isn’t it wrong in some sense?

Softmax is an activation function. Its purpose is not just to ensure that the values are normalized (or rescaled) to sum to 1, but also to allow the output to be used as input to a cross-entropy loss (hence the function needs to be differentiable).

For your case, the inputs can be arbitrary values (not necessarily probability vectors). It is possible that there’s a mix of positive and negative values which still sum to 1 (e.g., [0.3, 0.9, -0.2]).

Softmax rescales the values softly while preserving which class has the highest value, hence the name ‘soft’-‘max’.
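To illustrate the point about arbitrary inputs, here is a small PyTorch example (the vector below has a negative entry but still sums to 1.0):

```python
import torch

# A vector with a negative entry that still sums to 1.0.
x = torch.tensor([0.3, 0.9, -0.2])
probs = torch.softmax(x, dim=0)

print(probs)           # all entries are positive
print(probs.sum())     # sums to 1
print(probs.argmax())  # the largest input is still the largest output
```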

Hi mbehzad!

Well, I suppose it depends on what your expectations are …

But you might wish to base your expectations on some other functions:

`x**2` maps `(-inf, inf)` to `[0.0, inf)`, but we don’t expect `x**2 = x` to hold true for `x >= 0.0`, that is, for values of x in `[0.0, inf)`.

Or, back in the pytorch activation-function world, `torch.sigmoid()` maps `(-inf, inf)` to `(0.0, 1.0)`, but `torch.sigmoid(torch.sigmoid(x))` isn’t equal to `torch.sigmoid(x)`.
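You can see this non-idempotence directly:

```python
import torch

x = torch.linspace(-2.0, 2.0, 5)
once = torch.sigmoid(x)
twice = torch.sigmoid(torch.sigmoid(x))

# sigmoid maps everything into (0, 1), so applying it a second time
# squashes the values into sigmoid((0, 1)), roughly (0.5, 0.731).
print(once)
print(twice)
print(torch.allclose(once, twice))  # False
```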

Here’s another thing to consider:

`softmax([0.0 + delta, 1.0 - delta])`

How would you like `softmax()` to behave when a negative `delta` becomes zero and then crosses over to become positive? Bear in mind, you want this behavior to be usefully differentiable to support backpropagation.
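A short sketch of that thought experiment: sweep `delta` across zero and check, via autograd, that softmax’s output and gradient stay smooth and finite (the helper `two_class_softmax` below is just for illustration):

```python
import torch

def two_class_softmax(d):
    # softmax of [0 + delta, 1 - delta], with autograd tracking delta.
    delta = torch.tensor(d, requires_grad=True)
    probs = torch.softmax(torch.stack([0.0 + delta, 1.0 - delta]), dim=0)
    probs[0].backward()
    return probs.detach(), delta.grad

# As delta crosses zero, the output and its gradient change smoothly,
# with no kink -- unlike a function that returned probability vectors unchanged.
for d in [-0.1, 0.0, 0.1]:
    probs, grad = two_class_softmax(d)
    print(f"delta={d:+.1f}  probs={probs}  d(probs[0])/d(delta)={grad}")
```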

K. Frank