Multi-class cross entropy loss and softmax in pytorch


In this topic, ptrblck said that F.softmax at dim=1 should be applied before nn.CrossEntropyLoss().
In the documentation (https://pytorch.org/docs/stable/nn.html?highlight=crossentropy#torch.nn.CrossEntropyLoss), the loss is computed as nll_loss(log_softmax(input, 1), target), i.e. a log-softmax followed by the negative log-likelihood loss.
My question is: should I still apply softmax at dim=1 before nn.CrossEntropyLoss, which already applies a (log-)softmax at dim=1 internally?

No, F.softmax should not be added before nn.CrossEntropyLoss.
I’ll take a look at the thread and edit the answer if possible, as this might be a careless mistake!
Thanks for pointing this out.

EDIT: Indeed, the example code applied F.softmax to the logits, although this was not explicitly mentioned.
To sum it up: nn.CrossEntropyLoss applies F.log_softmax and nn.NLLLoss internally on your input, so you should pass the raw logits to it.
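A small sketch of that equivalence, using random tensors as stand-ins for a real model output and target. It checks that nn.CrossEntropyLoss on raw logits matches F.nll_loss(F.log_softmax(...)) spelled out by hand, and shows what the extra softmax from the original example does to the value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 5)          # raw model output: [batch_size, nb_classes]
target = torch.randint(0, 5, (8,))  # class indices in [0, nb_classes)

# Correct: pass the raw logits; the criterion applies log_softmax internally.
loss_ce = nn.CrossEntropyLoss()(logits, target)

# The same computation spelled out: log_softmax over dim=1, then NLL.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# The mistake: an extra softmax before the loss means softmax is applied twice.
# The criterion then sees values squashed into [0, 1], so the loss can no
# longer approach zero, even for a confident correct prediction.
loss_double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), target)

print(loss_ce.item(), loss_manual.item(), loss_double.item())
```

The first two values agree; the third differs and, with 5 classes, cannot drop below log(e + 4) - 1 ≈ 0.905.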

What loss function are we supposed to use when we use the F.softmax layer?

Hi Brando!

If you want to use a cross-entropy-like loss function, you shouldn’t
use a softmax layer because of the well-known problem of increased
risk of overflow.

I gave a few words of explanation about this problem in a reply in
another thread:

You should either use nn.CrossEntropyLoss (which takes
pre-softmax logits, rather than post-softmax probabilities)
without a softmax-like layer, or use a nn.LogSoftmax layer,
and feed the results into nn.NLLLoss. (Both of these combine
an implicit softmax with the subsequent log in a way that avoids
the enhanced overflow problem.)

If you are stuck for some reason with your softmax layer, you
should run the probabilities output by softmax through log(),
and then feed the log-probabilities to nn.NLLLoss (but expect
increased risk of overflow).
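A minimal sketch comparing the three routes described above on a deliberately extreme pair of logits (the values are made up for illustration). In float32, exp(-200) underflows to 0, so the softmax probability of the true class becomes exactly 0 and log() returns -inf. The fused log-softmax routes never form that probability and stay finite.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One sample, two classes; the logits are deliberately far apart.
logits = torch.tensor([[0.0, 200.0]])
target = torch.tensor([0])  # the true class has the *small* logit

# Recommended: raw logits straight into nn.CrossEntropyLoss.
ce_loss = nn.CrossEntropyLoss()(logits, target)

# Equivalent: nn.LogSoftmax followed by nn.NLLLoss.
nll_loss = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

# Fallback when stuck with softmax output: log() of the probabilities.
# exp(-200) underflows to 0 in float32, so log() returns -inf here.
probs = F.softmax(logits, dim=1)
naive_loss = nn.NLLLoss()(torch.log(probs), target)

print(ce_loss.item(), nll_loss.item(), naive_loss.item())
```

The first two losses come out at 200, while the log(softmax) route produces inf.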

(I am not aware of any single pytorch cross-entropy loss function
that takes post-softmax probabilities directly.)

Good luck!

K. Frank

Hi, if softmax is not to be used, how do we get the output as probabilities for a multi-class classification problem? I have explained my problem here. Please take a look at my code and help me out since I am a beginner.

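You can still apply softmax to the model output whenever you need probabilities, e.g. at inference time. Just don't feed those probabilities into nn.CrossEntropyLoss. For example:
import torch
model.eval()
with torch.no_grad():
    output = model(input)  # raw logits, shape [batch_size, nb_classes]
sm = torch.nn.Softmax(dim=1)  # normalize over the class dimension
probabilities = sm(output)
print(probabilities)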

Hello,

I have a question about Softmax() and CrossEntropyLoss().

In a multi-class classification task, I set dim=1 in Softmax(). I want to know whether I need to set a similar parameter in CrossEntropyLoss(). However, I could not find a parameter like softmax()'s dim in CrossEntropyLoss().

Thanks!

nn.CrossEntropyLoss expects raw logits in the shape [batch_size, nb_classes, *], so you should not apply a softmax activation to the model output. The class dimension must be dim1 of the model output, which is why the criterion has no dim argument.
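A short sketch of the two shape cases, with random tensors standing in for real data (the sizes 4, 3 and 10 are arbitrary). In both cases the classes sit in dim1 of the logits, and the target holds class indices without that dimension.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Plain classification: logits [batch_size, nb_classes], target [batch_size].
logits = torch.randn(4, 3)
target = torch.randint(0, 3, (4,))
loss = criterion(logits, target)

# Extra dimensions (e.g. a sequence): logits [batch_size, nb_classes, seq_len],
# target [batch_size, seq_len]; the class dimension is still dim1.
logits_seq = torch.randn(4, 3, 10)
target_seq = torch.randint(0, 3, (4, 10))
loss_seq = criterion(logits_seq, target_seq)

print(loss.item(), loss_seq.item())
```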

@ptrblck, suppose the output of my neural network has shape [1000, 100, 4]. I apply nn.Softmax() on axis 2 and then use nn.BCELoss with a target of the same shape, where for each row and column index the target holds a 1D vector of length 4 containing exactly one 1 (a one-hot vector). Does this setup work, or is there a flaw?

nn.BCELoss can be applied with torch.sigmoid for a multi-label classification. Since you are using softmax, I assume you are working on a multi-class classification, and should probably stick to nn.CrossEntropyLoss. For this criterion, your shapes also seem to be wrong as described in my previous post.
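To illustrate the multi-label pairing mentioned above, here is a sketch with a hand-made multi-hot target (purely illustrative). The sigmoid treats each class as an independent yes/no decision, so a sample may have zero, one, or several positive classes. nn.BCEWithLogitsLoss fuses the sigmoid into the loss in the same way nn.CrossEntropyLoss fuses the log-softmax.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)  # raw model output: [batch_size, nb_classes]

# Multi-label target: each sample may belong to any subset of the classes.
target = torch.tensor([[1., 0., 1.],
                       [0., 0., 0.],
                       [1., 1., 1.],
                       [0., 1., 0.]])

# Sigmoid applied explicitly, then nn.BCELoss on the probabilities.
loss_bce = nn.BCELoss()(torch.sigmoid(logits), target)

# Same loss with the sigmoid fused in (numerically more stable).
loss_bce_logits = nn.BCEWithLogitsLoss()(logits, target)

print(loss_bce.item(), loss_bce_logits.item())
```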

@ptrblck, thank you for your response. My confusion stems from the fact that TensorFlow allows us to use softmax in conjunction with BCE loss. Yes, I have a 4-class classification problem, with a batch size of 1000 and a sequence length of 100. The last dimension holds the multi-class probabilities. If I used sigmoid, I would need it only on the third dimension. nn.CrossEntropyLoss won’t be applicable, as the dimensions are not right. How should I proceed in this case?

nn.CrossEntropyLoss can be applied if you permute the output to match the expected shapes:
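A minimal sketch of that permutation, assuming the [1000, 100, 4] output and the one-hot target described in the question (random tensors stand in for real data). The logits are permuted so the classes land in dim1, and the one-hot target is converted to class indices with argmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, seq_len, nb_classes = 1000, 100, 4

# Stand-ins for the model output and one-hot target from the question.
output = torch.randn(batch_size, seq_len, nb_classes)  # raw logits, no softmax
one_hot = F.one_hot(
    torch.randint(0, nb_classes, (batch_size, seq_len)), nb_classes
).float()

# nn.CrossEntropyLoss wants the class dimension at dim1: [N, C, seq_len].
logits = output.permute(0, 2, 1)
# It also expects class indices instead of one-hot vectors: [N, seq_len].
target = one_hot.argmax(dim=2)

loss = nn.CrossEntropyLoss()(logits, target)
print(logits.shape, target.shape, loss.item())
```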

I don’t know how TF applies a binary cross-entropy with a softmax activation function, as I assume this formula is used internally, and it involves the sigmoid.
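For reference, the standard per-element binary cross-entropy for a logit x and a target y in {0, 1} shows where the sigmoid σ enters. This is the quantity nn.BCEWithLogitsLoss computes per element before averaging:

```latex
\ell(x, y) = -\Big[\, y \log \sigma(x) + (1 - y) \log\big(1 - \sigma(x)\big) \Big],
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```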