Does NLLLoss handle Log-Softmax and Softmax in the same way?

malioboro · October 19, 2017, 3:34am

From the documentation here: http://pytorch.org/docs/master/nn.html#torch.nn.LogSoftmax, log-softmax is defined as:

f(x)=log(softmax(x))

As far as I know, taking the log of probabilities flips their high-low order, so the biggest value in softmax(x) will be the smallest value in log(softmax(x)).

Will this change the way the negative log-likelihood loss is computed when the two are combined in CrossEntropy?

malioboro · October 19, 2017, 3:54am

Oh, I’ve just read in the NLLLoss documentation that NLLLoss is implemented differently from what I knew before:

loss(x, class) = -x[class]

It doesn’t use the log function. So, does that mean I can’t use Softmax and NLLLoss together?

ptrblck · October 19, 2017, 12:34pm

“so biggest value in softmax(x) will be the smallest value in log(softmax(x)).”

The softmax function returns probabilities in [0, 1].
The log of these probabilities returns values in [-inf, 0], since log(0) = -inf and log(1) = 0.
Because log is monotonically increasing, the order won’t change.

However, you should use NLLLoss with a log_softmax output,
or CrossEntropyLoss with logits if you prefer not to add an extra log_softmax layer to your model.
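
To illustrate the point about ordering, here is a minimal sketch (the random `logits` tensor and its 4 x 5 shape are made up for illustration) that checks that the ranking of classes is the same for softmax and log_softmax outputs, and that the log-probabilities are never positive:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)  # hypothetical batch: 4 samples, 5 classes

probs = F.softmax(logits, dim=1)          # values in [0, 1]
log_probs = F.log_softmax(logits, dim=1)  # values in [-inf, 0]

# log is strictly increasing, so sorting the classes by probability
# or by log-probability should give the same ranking.
same_ranking = torch.equal(probs.argsort(dim=1), log_probs.argsort(dim=1))
all_nonpositive = bool((log_probs <= 0).all())

print(same_ranking, all_nonpositive)
```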

malioboro · October 19, 2017, 11:00pm

Oh… that’s right! I thought log(0) = inf.

Thanks,

ANKITA_DAS · January 10, 2020, 12:30am

If the order doesn’t change, why do we have to use a ‘log-softmax’ with NLLLoss?

ptrblck · January 10, 2020, 1:03am

By “order” I meant that the outputs will still be in the same relative order.
I.e., if p0 < p1 for two probabilities created by softmax, then log(p0) < log(p1).

nn.NLLLoss expects log probabilities, as the loss will be calculated as described here.
Note that nn.CrossEntropyLoss internally applies F.log_softmax followed by nn.NLLLoss, which is why it expects raw logits instead.
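
The relationship described above can be checked with a small sketch. The tensors here are arbitrary illustration values, not taken from the thread. It also shows what happens if plain softmax output is passed to nn.NLLLoss by mistake: the call runs, but the value is not the cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)          # raw model outputs, N x C = 3 x 5
target = torch.tensor([1, 0, 4])    # class indices

# CrossEntropyLoss takes raw logits.
ce = nn.CrossEntropyLoss()(logits, target)

# NLLLoss takes log-probabilities, i.e. the output of log_softmax.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

# Passing plain probabilities to NLLLoss raises no error,
# but it computes -mean(p[target]), which is negative and not a log-likelihood.
wrong = nn.NLLLoss()(F.softmax(logits, dim=1), target)

print(ce.item(), nll.item(), wrong.item())
```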

ANKITA_DAS · January 10, 2020, 2:30am

Yes, I do get that. I have another question, though: log-softmax gives values in the range [-infinity, 0], which means the values are negative, and the NLLLoss function is -log(y), where y = log_softmax(x). The log of a negative value isn’t defined, so how does it work?

ptrblck · January 10, 2020, 2:40am

The formula in the docs is the negative log softmax written as:

-log(exp(x[class]) / sum_j(exp(x[j])))

Here x are the logits, while exp()/sum(exp()) is the softmax function, so the log is taken of a probability and not of a negative value.
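
As a rough check of this formula, the following sketch evaluates it by hand for one made-up sample (`x` and `cls` are illustration values) and compares it with F.cross_entropy. The naive exp/sum form is used only for readability; it can overflow for large logits, which is one reason to prefer the built-in functions.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 3.0])  # logits for a single sample (made up)
cls = 2                            # index of the correct class

# -log(exp(x[class]) / sum_j(exp(x[j]))): the argument of the log is a
# probability in (0, 1], so the log is defined and the result is >= 0.
manual = -torch.log(torch.exp(x[cls]) / torch.exp(x).sum())

# The built-in expects a batch dimension and a tensor of class indices.
builtin = F.cross_entropy(x.unsqueeze(0), torch.tensor([cls]))

print(manual.item(), builtin.item())
```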

ANKITA_DAS · January 10, 2020, 3:26am

Sorry, but I searched the documentation and didn’t find it explicitly mentioned anywhere that the negative of log_softmax goes in as input to the NLLLoss function. I don’t know if I’m missing something important, but please check the docs once more.

ptrblck · January 10, 2020, 4:45am

From the nn.NLLLoss docs:

The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either […].
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.

Examples:
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()

ANKITA_DAS · January 10, 2020, 5:11am

But there isn’t any mention of the negative log_softmax here, as you mentioned in your previous post!

ptrblck · January 10, 2020, 5:14am

Ah, sorry for the confusion; I can see the misunderstanding now.

nn.NLLLoss expects log probabilities, so you should just apply F.log_softmax to your model output (without multiplying by -1!).

The formula in my previous post shows how the loss is calculated, but you shouldn’t worry about the minus sign, as it will be applied internally for you.

The posted example shows how to apply the criterion.
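
To make the internal minus sign visible, here is a small sketch (reusing the shapes from the docs example, with made-up random inputs) that reproduces the default mean-reduced nn.NLLLoss by hand: pick the log-probability of the target class for each sample, negate, and average.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])

log_probs = F.log_softmax(logits, dim=1)  # all values <= 0

# Log-probability assigned to the correct class of each sample.
picked = log_probs[torch.arange(3), target]

# The minus sign is applied by the loss, not by the user.
manual = -picked.mean()
builtin = F.nll_loss(log_probs, target)

print(manual.item(), builtin.item())
```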

ANKITA_DAS · January 10, 2020, 5:23am

If the minus is applied internally, my doubt has been cleared, but as far as I can find, that isn’t mentioned in the documentation.

ptrblck · January 10, 2020, 5:26am

The documentation mentions that log probabilities are expected and gives a code example.

What is missing, and what statement do you think would have helped to resolve your misunderstanding? The docs are by far not perfect, so feedback is always welcome.

ANKITA_DAS · January 10, 2020, 5:40am

What needs to be given as input and what we get as output is fine; there is no problem in understanding that. But if a minus is introduced internally, I think there should be a mention of that, as the screenshot of the documentation that you provided above it says

If it were mentioned somewhere that “nn.NLLLoss internally multiplies the log-probabilities by a minus”, it would have cleared up the confusion!

Ashima_Garg · March 1, 2020, 10:23pm

As you said, “so biggest value in softmax(x) will be the smallest value in log(softmax(x)).”
Does that mean that, to calculate accuracy, the argmin of the predictions should be used for classification in sentiment analysis? I am using NLLLoss to calculate the loss.

Thank you.

ptrblck · March 2, 2020, 12:03am

No, I just quoted this sentence and corrected it in this post.
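
Since the order is preserved, predictions are taken with argmax, whether the model outputs logits, probabilities, or log-probabilities. A minimal sketch of an accuracy calculation, assuming a hypothetical two-class model whose outputs and labels are hard-coded here for illustration:

```python
import torch
import torch.nn.functional as F

# Made-up model outputs (logits) for 3 samples and 2 classes.
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.2],
                       [3.0, -1.0]])
labels = torch.tensor([0, 1, 1])

log_probs = F.log_softmax(logits, dim=1)  # what would be fed to NLLLoss

# The largest log-probability marks the predicted class: argmax, not argmin.
preds = log_probs.argmax(dim=1)
accuracy = (preds == labels).float().mean().item()

print(preds.tolist(), accuracy)
```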

Ashima_Garg · March 2, 2020, 11:20am

Thanks for the reply.

ADONAI_TZEVAOT · November 16, 2020, 6:40pm

Hi, could you please clear up my doubt? I have a multi-label classification problem where each label can be either positive or negative, so each label on its own is binary in nature. The last layer in my network is LogSoftmax, so what loss function should I use? BCELoss? And how do I calculate the class weights? Please help.

ptrblck · November 17, 2020, 6:41am

When each label is either positive or negative, it would be a binary classification or a (2-class) multi-class classification.
You can either return two logits (or log probabilities) from the model and use nn.CrossEntropyLoss (or nn.NLLLoss), or alternatively return a single logit and use nn.BCEWithLogitsLoss.
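
The two options can be compared in a short sketch. The logits and labels below are made up, and the two-logit tensor is constructed by fixing the class-0 logit at zero, which is just one convenient way to show that both setups describe the same model; a real network would learn both outputs.

```python
import torch
import torch.nn as nn

z = torch.tensor([0.3, -1.2, 2.0])  # one logit per sample (made up)
y = torch.tensor([1, 0, 1])         # binary labels

# Option 1: a single logit with BCEWithLogitsLoss (targets must be float).
bce = nn.BCEWithLogitsLoss()(z, y.float())

# Option 2: two logits with CrossEntropyLoss (targets are class indices).
# With logits [0, z], softmax gives P(class 1) = sigmoid(z).
two_logits = torch.stack([torch.zeros_like(z), z], dim=1)  # shape (3, 2)
ce = nn.CrossEntropyLoss()(two_logits, y)

print(bce.item(), ce.item())
```

Note that both losses take raw logits here; a LogSoftmax output layer would instead pair with nn.NLLLoss.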