# Does NLLLoss handle Log-Softmax and Softmax in the same way?

From the documentation here: http://pytorch.org/docs/master/nn.html#torch.nn.LogSoftmax, log-softmax is defined as:

```
f(x) = log(softmax(x))
```

As far as I know, applying log to probabilities will flip the high-low ordering, so the biggest value in softmax(x) will be the smallest value in log(softmax(x)).

Will it change the way Negative Log Likelihood computes the loss when they are combined in `CrossEntropyLoss`?

Oh, I’ve just read in the NLLLoss documentation that NLLLoss is implemented differently from what I knew before:

```
loss(x, class) = -x[class]
```

It doesn’t use the `log` function. So, does that mean I can’t use `Softmax` and `NLLLoss` together?


> so biggest value in softmax(x) will be the smallest value in log(softmax(x)).

The `softmax` function returns probabilities between [0, 1].
The log of these probabilities returns values between [-inf, 0], since `log(0) = -inf` and `log(1) = 0`.
That is why the order won’t change.
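This monotonicity is easy to check directly. Here is a minimal sketch (variable names are my own) showing that `softmax` and `log_softmax` give the same `argmax` for every sample:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)  # batch of 3 samples, 5 classes

probs = F.softmax(logits, dim=1)          # values in [0, 1]
log_probs = F.log_softmax(logits, dim=1)  # values in [-inf, 0]

# log is monotonically increasing, so the argmax is identical
print(torch.equal(probs.argmax(dim=1), log_probs.argmax(dim=1)))  # True
```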

However, you should use the `NLLLoss` with a `log_softmax` output
or `CrossEntropyLoss` with logits if you prefer not to add an extra `log_softmax` layer into your model.
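The two recommended pairings produce the same loss value. A small sketch (shapes and seed chosen arbitrarily) comparing them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw model outputs (logits)
target = torch.tensor([0, 2, 1, 2])

# Option 1: apply log_softmax yourself, then NLLLoss
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

# Option 2: pass raw logits to CrossEntropyLoss (log_softmax applied internally)
loss_ce = nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(loss_nll, loss_ce))  # True
```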


oh… that’s right! I thought `log(0) = inf`

Thanks,

If the order doesn’t change, why do we have to use a ‘log-softmax’ with NLLLoss?

By “order” I meant the range of the outputs will still be in the same order.
I.e. if `p0 < p1` created by `softmax` , then `log(p0) < log(p1)`.

`nn.NLLLoss` expects log probabilities, as the loss will be calculated as described here.
Note that `nn.CrossEntropyLoss` internally applies `F.log_softmax` followed by `nn.NLLLoss`, which is why it expects raw logits instead.


Yes, I do get that. Yet I have another question: if log-softmax gives values in the range [-infinity, 0], that means the values are negative, and the NLLLoss function is -log(y), where y = log_softmax(x). But the log of a negative value isn’t defined, so how does it work?

The formula in the docs is the negative log softmax, written as:

```
-log(exp(x[class]) / sum(exp(x[j])))
```

`x` are the logits here while `exp()/sum(exp())` is the softmax function.
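That formula can be verified by hand. A short sketch (shapes are my own choice) computing `-log(exp(x[class]) / sum(exp(x[j])))` directly and comparing it with the built-in path through `log_softmax`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 4)                 # logits for 2 samples, 4 classes
target = torch.tensor([1, 3])

# -log(exp(x[class]) / sum(exp(x[j]))) computed term by term
manual = -torch.log(torch.exp(x[torch.arange(2), target]) / torch.exp(x).sum(dim=1))

# the same value via log_softmax + nll_loss
builtin = F.nll_loss(F.log_softmax(x, dim=1), target, reduction="none")

print(torch.allclose(manual, builtin))  # True
```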

Sorry, but I searched the documentation and didn’t find it explicitly mentioned anywhere that the negative of log_softmax goes in as input to the NLLLoss function. I don’t know if I’m missing something important, but please check the docs once.

From the nn.NLLLoss docs:

> The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either […].
> Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.

Examples:

```python
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()
```

But there isn’t any mention of a negative log_softmax here, as you mentioned earlier in this thread.

Ah, sorry for the confusion, as I can see the misunderstanding now.

`nn.NLLLoss` expects log probabilities, so you should just apply `F.log_softmax` on your model output (without multiplying by `-1`!).

The formula posted in my previous post is how the loss can be calculated, but you shouldn’t worry about the minus sign, as it will be applied internally for you.
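To make the internal minus sign concrete, here is a minimal sketch (tensor sizes are arbitrary): `nn.NLLLoss` simply picks the target entry from the log probabilities, negates it, and averages, so you never apply the minus yourself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, 0, 4])

# NLLLoss picks the target log-probability and applies the minus sign itself
loss = nn.NLLLoss()(log_probs, target)
manual = -log_probs[torch.arange(3), target].mean()

print(torch.allclose(loss, manual))  # True
```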

The posted example shows how to apply the criterion.

If the minus is applied internally, my doubt has been cleared, but they haven’t mentioned that in the documentation as far as I can tell.

The documentation mentions that log probabilities are expected and gives a code example.

What is missing, and what statement do you think might have helped resolve your misunderstanding? The docs are far from perfect, so feedback is always welcome.

What needs to be given as input and what we get as output is fine; there is no problem understanding that. But if a minus is introduced internally, I think there should be a mention of that in the screenshot of the documentation you provided above.

If it were mentioned somewhere that “nn.NLLLoss internally multiplies the log-probabilities by a minus”, it would have cleared the confusion!

As you said:

> so biggest value in softmax(x) will be the smallest value in log(softmax(x)).

Does that mean that, to calculate accuracy, `argmin` of the predictions should be used for classification in sentiment analysis? I am using `NLLLoss` to calculate the loss.

Thank you.

No, I just quoted this sentence and corrected it in this post.

You can either return two logits (or log probabilities) from the model and use `nn.CrossEntropyLoss` (or `nn.NLLLoss`) or alternatively you can return a single logit and use `nn.BCEWithLogitsLoss`.
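Both binary-classification setups can be sketched side by side (sizes and seed are my own; note `nn.BCEWithLogitsLoss` expects a float target):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
target = torch.tensor([0, 1, 1])

# Option 1: two logits per sample + CrossEntropyLoss
two_logits = torch.randn(3, 2)
loss_ce = nn.CrossEntropyLoss()(two_logits, target)

# Option 2: a single logit per sample + BCEWithLogitsLoss
one_logit = torch.randn(3)
loss_bce = nn.BCEWithLogitsLoss()(one_logit, target.float())

# both produce a valid scalar loss
print(loss_ce.item() > 0 and loss_bce.item() > 0)  # True
```

For predictions, option 1 uses `argmax` over the two outputs, while option 2 thresholds the single logit at 0 (i.e., probability 0.5 after sigmoid).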