Multi-class cross-entropy loss and softmax in PyTorch

In this topic, ptrblck said that `F.softmax` at `dim=1` should be added before `nn.CrossEntropyLoss()`.
According to the documentation (https://pytorch.org/docs/stable/nn.html?highlight=crossentropy#torch.nn.CrossEntropyLoss), it returns `nll_loss(log_softmax(input, 1))`, i.e. it already applies the log-softmax and the negative log-likelihood internally.
My question is: should I calculate the softmax at `dim=1` before calling `nn.CrossEntropyLoss`, which already applies a softmax at `dim=1`?


No, `F.softmax` should not be added before `nn.CrossEntropyLoss`.
I’ll take a look at the thread and edit the answer if possible, as this might be a careless mistake!
Thanks for pointing this out.

EDIT: Indeed, the example code had `F.softmax` applied to the logits, although this wasn't explicitly mentioned.
To sum it up: `nn.CrossEntropyLoss` applies `F.log_softmax` and `nn.NLLLoss` internally on your input, so you should pass the raw logits to it.
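A quick sketch of this equivalence (the logits and targets are random, just for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)        # raw model output: [batch_size, nb_classes]
target = torch.tensor([1, 0, 4])  # class indices

# cross_entropy on the raw logits ...
loss_ce = F.cross_entropy(logits, target)

# ... matches log_softmax followed by nll_loss
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_ce, loss_nll))  # True
```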


What loss function are we supposed to use when we use the `F.softmax` layer?


Hi Brando!

If you want to use a cross-entropy-like loss function, you shouldn’t
use a softmax layer because of the well-known problem of increased
risk of overflow.

You should either use `nn.CrossEntropyLoss` (which takes
pre-softmax logits, rather than post-softmax probabilities)
without a softmax-like layer, or use a `nn.LogSoftmax` layer,
and feed the results into `nn.NLLLoss`. (Both of these combine
an implicit softmax with the subsequent log in a way that avoids
the enhanced overflow problem.)

If you are stuck for some reason with your softmax layer, you
should run the probabilities output by softmax through `log()`,
and then feed the log-probabilities to `nn.NLLLoss` (but expect
increased risk of overflow).

(I am not aware of any single pytorch cross-entropy loss function
that takes post-softmax probabilities directly.)
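To illustrate the numerical concern with a toy example (the extreme logit values are made up purely to force the failure):

```python
import torch
import torch.nn.functional as F

# With a very large logit, plain softmax underflows to exactly 0 for the
# smaller class, so a subsequent log() produces -inf ...
logits = torch.tensor([[1000.0, 0.0]])
print(F.softmax(logits, dim=1).log())  # second entry is -inf

# ... while log_softmax stays finite thanks to the log-sum-exp trick
print(F.log_softmax(logits, dim=1))    # second entry is -1000.0
```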

Good luck!

K. Frank


Hi, if softmax is not to be used, how do we get the output as probabilities for a multi-class classification problem? I have explained my problem here. Please take a look at my code and help me out since I am a beginner.

You can just apply it to your output as usual, e.g.:
model.eval()
with torch.no_grad():
    output = net(input)
sm = torch.nn.Softmax(dim=1)  # specify the class dimension explicitly
probabilities = sm(output)
print(probabilities)


Hello,

I have a question about Softmax() and CrossEntropyLoss().

In a multi-class classification task, I set dim=1 in Softmax(). I want to know whether I need to set a similar parameter in CrossEntropyLoss(), but I could not find a dim argument there.

Thanks!

`nn.CrossEntropyLoss` expects raw logits in the shape `[batch_size, nb_classes, *]` so you should not apply a `softmax` activation on the model output. The class dimension should be in `dim1` in the model output.
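A minimal shape sketch (the tensor sizes are made up for illustration): `nn.CrossEntropyLoss` needs no `dim` argument because it always expects the classes in `dim1`:

```python
import torch

criterion = torch.nn.CrossEntropyLoss()

# plain classification: output [batch_size, nb_classes], target [batch_size]
output = torch.randn(8, 10)                 # raw logits, no softmax applied
target = torch.randint(0, 10, (8,))         # class indices
loss = criterion(output, target)

# segmentation-style output: the classes must sit in dim1
output = torch.randn(8, 10, 24, 24)         # [batch, nb_classes, H, W]
target = torch.randint(0, 10, (8, 24, 24))  # [batch, H, W]
loss = criterion(output, target)
print(loss)
```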


@ptrblck, suppose the output of a neural network has shape `[1000, 100, 4]`. I have applied `nn.Softmax()` along axis `2` and then used `nn.BCELoss` with a target of the same shape, where for each row and column index the target is a 1d vector of length 4 containing exactly one `1` (a one-hot vector). Does this setup work, or is there a flaw?

`nn.BCELoss` can be applied with `torch.sigmoid` for a multi-label classification. Since you are using `softmax`, I assume you are working on a multi-class classification, and should probably stick to `nn.CrossEntropyLoss`. For this criterion, your shapes also seem to be wrong as described in my previous post.

@ptrblck thank you for your response. My confusion stems from the fact that TensorFlow allows us to use `softmax` in conjunction with `BCE` loss. Yes, I have a 4-class classification problem, with a batch size of 1000 and a sequence length of 100, where the last dimension holds the multi-class probabilities. If I use `sigmoid`, I need it only on the third dimension. `nn.CrossEntropyLoss` won’t be applicable as the dimensions are not right. How should I proceed in this case?

`nn.CrossEntropyLoss` can be applied if you permute the output to match the expected shapes.

I don’t know how TF applies a binary cross-entropy with a softmax activation function, as I assume internally this formula would be used, which involves the sigmoid.
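Following the permute suggestion above, a sketch for the `[1000, 100, 4]` case discussed here (the random data and one-hot targets are made up for illustration):

```python
import torch
import torch.nn.functional as F

criterion = torch.nn.CrossEntropyLoss()

output = torch.randn(1000, 100, 4)  # [batch, seq_len, nb_classes], raw logits
one_hot = F.one_hot(torch.randint(0, 4, (1000, 100)), num_classes=4)  # [1000, 100, 4]

# move the class dimension to dim1: [1000, 100, 4] -> [1000, 4, 100]
logits = output.permute(0, 2, 1)

# nn.CrossEntropyLoss expects class indices rather than one-hot vectors
target = one_hot.argmax(dim=2)      # [1000, 100]
loss = criterion(logits, target)
print(loss)
```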