Does NLLLoss handle Log-Softmax and Softmax in the same way?

From the documentation here, log-softmax is defined as:

LogSoftmax(x_i) = log( exp(x_i) / sum_j exp(x_j) )

As far as I know, taking the log of probabilities reverses their high/low order, so the biggest value in softmax(x) will be the smallest value in log(softmax(x)).

Will this change the way Negative Log Likelihood computes the loss when it is combined with log-softmax in CrossEntropyLoss?

Oh, I’ve just read in the NLLLoss documentation that NLLLoss is implemented differently from what I knew before:

loss(x, class) = -x[class]

It doesn’t use a log function. So does that mean I can’t use Softmax and NLLLoss together?


so biggest value in softmax(x) will be the smallest value in log(softmax(x)).

The softmax function returns probabilities in [0, 1].
The log of these probabilities returns values in [-inf, 0], since log(0) = -inf and log(1) = 0.
Since log is monotonically increasing, the order won’t change: the biggest value in softmax(x) is also the biggest value in log(softmax(x)).
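A minimal sketch (assuming a recent PyTorch; the random logits are arbitrary example values) that checks this: the argmax of the softmax output and of the log-softmax output is the same, and all log-probabilities are <= 0.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)  # 4 samples, 5 classes (arbitrary example values)

probs = F.softmax(logits, dim=1)          # values in [0, 1]
log_probs = F.log_softmax(logits, dim=1)  # values in [-inf, 0]

# log is monotonically increasing, so the largest probability
# is also the largest (least negative) log-probability
same_argmax = torch.equal(probs.argmax(dim=1), log_probs.argmax(dim=1))
print(same_argmax)                    # True
print(bool((log_probs <= 0).all()))   # True
```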

However, you should use nn.NLLLoss with a log_softmax output,
or nn.CrossEntropyLoss with raw logits if you prefer not to add an extra log_softmax layer to your model.
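A small sketch comparing the two recommended setups, assuming a recent PyTorch and made-up logits/targets. `nn.NLLLoss` on `log_softmax` output and `nn.CrossEntropyLoss` on raw logits give the same value, while feeding plain softmax probabilities into `nn.NLLLoss` silently produces a different (negative) number instead of an error.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)            # raw model outputs: N=3 samples, C=5 classes
target = torch.tensor([1, 0, 4])      # class indices in [0, C)

nll = nn.NLLLoss()
ce = nn.CrossEntropyLoss()

loss_nll = nll(F.log_softmax(logits, dim=1), target)  # log-probabilities -> NLLLoss
loss_ce = ce(logits, target)                          # raw logits -> CrossEntropyLoss
loss_wrong = nll(F.softmax(logits, dim=1), target)    # probabilities: not what NLLLoss expects

print(torch.allclose(loss_nll, loss_ce))  # True
print(loss_wrong.item())  # -mean(p[target]), so it lies in [-1, 0)
```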


oh… that’s right! I thought log(0) = inf


If the order doesn’t change, why do we have to use a ‘log-softmax’ with NLLLoss?

By “order” I meant that the outputs will still be in the same order.
I.e., if p0 < p1 are created by softmax, then log(p0) < log(p1).

nn.NLLLoss expects the log probabilities, as the loss will be calculated as described here.
Note that nn.CrossEntropyLoss applies internally F.log_softmax and nn.NLLLoss afterwards, which is why it expects raw logits instead.
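To make the computation concrete, here is a sketch (assuming the default `reduction="mean"` and no class weights; tensors are arbitrary examples) that reproduces `nn.NLLLoss` by hand. It only selects the log-probability of the target class, negates it, and averages; no additional log is taken. The last line checks that `nn.CrossEntropyLoss` on raw logits matches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])
log_probs = F.log_softmax(logits, dim=1)

# nn.NLLLoss with the default reduction="mean" and no class weights:
# pick the log-probability of the target class for each sample,
# negate it, and average over the batch. No extra log is applied.
manual = -log_probs[torch.arange(len(target)), target].mean()
builtin = nn.NLLLoss()(log_probs, target)

# nn.CrossEntropyLoss performs both steps (log_softmax + NLLLoss) on raw logits
ce = nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(manual, builtin), torch.allclose(builtin, ce))  # True True
```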


Yes, I do get that. Yet I have another question: if log-softmax gives values in the range [-inf, 0], the values are negative. If the NLLLoss function is -log(y), where y = log_softmax(x), it would take the log of a negative value, which isn’t defined. So how does it work?

The formula in the docs is the negative log softmax written as:

-log( exp(x[class]) / sum_j exp(x[j]) )

x are the logits here while exp()/sum(exp()) is the softmax function.
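A sketch that evaluates this formula literally, per sample, and compares it with `F.nll_loss` on `log_softmax` output and with `F.cross_entropy` on the logits (assuming a recent PyTorch and arbitrary example tensors). It shows that the log lives inside log_softmax, so NLLLoss never takes the log of a negative number; it only negates and selects.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(3, 5)               # logits
target = torch.tensor([1, 0, 4])
idx = torch.arange(len(target))

# -log( exp(x[class]) / sum_j exp(x[j]) ), written out literally per sample
# (fine for small logits; log_softmax is the numerically stable version)
softmax = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
formula = -torch.log(softmax[idx, target])

# The log is already contained in log_softmax; NLLLoss only negates and selects
via_nll = F.nll_loss(F.log_softmax(x, dim=1), target, reduction="none")
via_ce = F.cross_entropy(x, target, reduction="none")

print(torch.allclose(formula, via_nll), torch.allclose(formula, via_ce))  # True True
```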

Sorry, but I searched the documentation and didn’t find it explicitly mentioned anywhere that the negative of log_softmax goes in as the input to the NLLLoss function. I don’t know if I’m missing something important, but please check the docs once more.

From the nn.NLLLoss docs:

The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either […].
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.

```python
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
```

But there isn’t any mention of a negative log_softmax here, as you mentioned in this post: Does NLLLoss handle Log-Softmax and Softmax in the same way?

Ah, sorry for the confusion; I can see the misunderstanding now.

nn.NLLLoss expects log probabilities, so you should just apply F.log_softmax to your model output (without multiplying it by -1!).

The formula in my previous post shows how the loss is calculated, but you shouldn’t worry about the minus sign, as it is applied internally for you.

The posted example shows how to apply the criterion.
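As an illustration of why no manual minus is needed, here is a sketch (assuming a recent PyTorch; the tensors are arbitrary, and `correct`/`wrong` are just illustrative names) comparing the intended usage with an extra `-1` applied before the criterion. Because NLLLoss already negates, adding your own minus simply flips the sign of the loss, so minimizing it would push the model in the wrong direction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
logits = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])

correct = loss(m(logits), target)    # pass log-probabilities as-is: loss >= 0
wrong = loss(-m(logits), target)     # extra minus: same magnitude, flipped sign

print(correct.item() >= 0, torch.allclose(wrong, -correct))  # True True
```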

If the minus is applied internally, my doubt is cleared, but as far as I can find, that isn’t mentioned in the documentation :slightly_frowning_face:

The documentation mentions that log probabilities are expected and gives a code example.

What is missing, and what statement do you think would have helped to avoid this misunderstanding? The docs are far from perfect, so feedback is always welcome. :wink:


What needs to be given as input and what we get as output is clear; there is no problem understanding that. But if a minus is applied internally, I think that should be mentioned, as the screenshot of the documentation that you provided above says

If it were mentioned somewhere that “nn.NLLLoss internally multiplies the log-probabilities by -1”, it would have cleared up the confusion!

As you said, “so the biggest value in softmax(x) will be the smallest value in log(softmax(x))”.
Does that mean that, to calculate accuracy, the argmin of the predictions should be used for classification in sentiment analysis? I am using NLLLoss to calculate the loss.


No, I just quoted this sentence and corrected it in this post. The log keeps the order of the probabilities, so you should still use the argmax.
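A minimal accuracy computation on log-probabilities, assuming a hypothetical 2-class sentiment batch (the logits and labels below are made-up values). The prediction is the argmax of the log-softmax output, exactly as it would be for the softmax output.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of model outputs (log-probabilities) for 2-class sentiment
log_probs = F.log_softmax(torch.tensor([[2.0, -1.0],
                                        [0.5, 1.5],
                                        [-0.3, 0.2],
                                        [1.0, 0.0]]), dim=1)
labels = torch.tensor([0, 1, 0, 0])

preds = log_probs.argmax(dim=1)                     # argmax, not argmin
accuracy = (preds == labels).float().mean().item()
print(preds.tolist(), accuracy)  # [0, 1, 1, 0] 0.75
```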

Thanks for the reply.

Hi, could you please clear up my doubt? I have a multi-label classification problem where each label can be either positive or negative, so each label on its own is binary. The last layer in my network is LogSoftmax, so which loss function should I use? BCELoss? And how do I calculate the class weights? Please help.

When each label is either positive or negative it would be a binary classification or a (2 class) multi-class classification.
You can either return two logits (or log probabilities) from the model and use nn.CrossEntropyLoss (or nn.NLLLoss) or alternatively you can return a single logit and use nn.BCEWithLogitsLoss.
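A sketch of both options for a single binary label, assuming a recent PyTorch and random example outputs. Option 1 uses two logits per sample with class-index targets; option 2 uses one logit per sample with float targets of the same shape. The last check shows the two losses coincide when the logit pair is [0, z], since softmax([0, z])[1] equals sigmoid(z).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N = 4
target = torch.tensor([0, 1, 1, 0])  # binary labels as class indices

# Option 1: two logits per sample + nn.CrossEntropyLoss (class-index targets)
two_logits = torch.randn(N, 2)
loss_ce = nn.CrossEntropyLoss()(two_logits, target)

# Option 2: one logit per sample + nn.BCEWithLogitsLoss (float targets, same shape)
one_logit = torch.randn(N, 1)
loss_bce = nn.BCEWithLogitsLoss()(one_logit, target.float().unsqueeze(1))

# The two are equivalent when the logit pair is [0, z]:
# softmax([0, z])[1] == sigmoid(z)
z = one_logit.squeeze(1)
paired = torch.stack([torch.zeros_like(z), z], dim=1)
print(loss_ce.item(), loss_bce.item())
print(torch.allclose(nn.CrossEntropyLoss()(paired, target), loss_bce))  # True
```

If your last layer stays LogSoftmax, option 1 would use nn.NLLLoss instead of nn.CrossEntropyLoss. For several independent binary labels, the single-logit variant extends to an output of shape [N, num_labels]; nn.BCEWithLogitsLoss also accepts a `pos_weight` tensor (one entry per label), which is one common way to weight the positive class, though suitable values depend on your data.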
