In this topic, ptrblck said that an `F.softmax` at `dim=1` should be added before `nn.CrossEntropyLoss()`.

In the documentation (https://pytorch.org/docs/stable/nn.html?highlight=crossentropy#torch.nn.CrossEntropyLoss), it returns `nll_loss(log_softmax(input, 1))`, i.e. the softmax and the negative log are already applied internally.

My question is: should I calculate the softmax at `dim=1` before calling `nn.CrossEntropyLoss`, which already applies a softmax at `dim=1`?

No, `F.softmax` should not be added before `nn.CrossEntropyLoss`.

I’ll take a look at the thread and edit the answer if possible, as this might be a careless mistake!

Thanks for pointing this out.

EDIT: Indeed the example code had an `F.softmax` applied on the logits, although not explicitly mentioned.

To sum it up: `nn.CrossEntropyLoss` applies `F.log_softmax` and `nn.NLLLoss` internally on your input, so you should pass the raw logits to it.
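For example, a minimal sketch with made-up shapes:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(8, 5)            # raw model output: [batch_size, nb_classes]
targets = torch.randint(0, 5, (8,))   # class indices, not one-hot vectors

loss = criterion(logits, targets)     # no softmax applied anywhere
```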

What loss function are we supposed to use when we use the `F.softmax` layer?

Hi Brando!

If you want to use a cross-entropy-like loss function, you shouldn’t use a softmax layer because of the well-known problem of increased risk of overflow.

I gave a few words of explanation about this problem in a reply in another thread:

You should either use `nn.CrossEntropyLoss` (which takes pre-softmax logits, rather than post-softmax probabilities) without a softmax-like layer, or use a `nn.LogSoftmax` layer and feed the results into `nn.NLLLoss`. (Both of these combine an implicit softmax with the subsequent log in a way that avoids the enhanced overflow problem.)
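A quick way to convince yourself that the two routes agree, sketched with made-up shapes:

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))

# route 1: raw logits straight into nn.CrossEntropyLoss
loss1 = nn.CrossEntropyLoss()(logits, targets)

# route 2: nn.LogSoftmax layer followed by nn.NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
loss2 = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(loss1, loss2))  # True
```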

If you are stuck for some reason with your softmax layer, you should run the probabilities output by softmax through `log()`, and then feed the log-probabilities to `nn.NLLLoss` (but expect increased risk of overflow).
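In code, that workaround would look something like this (a sketch; the separate `log()` is numerically riskier than the fused `log_softmax`):

```python
import torch
import torch.nn as nn

# stuck with post-softmax probabilities:
probs = nn.Softmax(dim=1)(torch.randn(8, 5))
targets = torch.randint(0, 5, (8,))

# take the log of the probabilities and feed them to nn.NLLLoss
loss = nn.NLLLoss()(torch.log(probs), targets)
```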

(I am not aware of any single pytorch cross-entropy loss function that takes post-softmax probabilities directly.)

Good luck!

K. Frank

Hi, if softmax is not to be used, how do we get the output as probabilities for a multi-class classification problem? I have explained my problem here. Please take a look at my code and help me out since I am a beginner.

You can just apply it to your output as normal:

```python
model.eval()
with torch.no_grad():             # no gradients needed for inference
    output = model(input)         # raw logits
sm = torch.nn.Softmax(dim=1)      # softmax over the class dimension
probabilities = sm(output)
print(probabilities)
```

Hello,

I have a question about `Softmax()` and `CrossEntropyLoss()`.

In a multi-class classification task, I set `dim=1` in `Softmax()`. **I want to know whether I need to set a similar parameter in `CrossEntropyLoss()`.** However, I did not find a parameter like `dim` in `CrossEntropyLoss()`.

Thanks!


`nn.CrossEntropyLoss` expects raw logits in the shape `[batch_size, nb_classes, *]`, so you should not apply a `softmax` activation on the model output. The class dimension should be in `dim1` in the model output.
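For instance, a sketch with made-up shapes where the extra `*` dimension is spatial; note the class dimension sits in `dim1` and the target carries plain class indices without a class dimension:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(2, 10, 24, 24)          # [batch_size, nb_classes, H, W]
target = torch.randint(0, 10, (2, 24, 24))   # [batch_size, H, W], class indices

loss = criterion(logits, target)
```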

@ptrblck, suppose the output of a neural network has shape `[1000, 100, 4]`. I have applied `nn.Softmax()` on axis `2` and then taken `nn.BCELoss` with a target of the same shape, where for each row and column index the target is a length-4 one-hot vector (containing exactly one `1`). Does this setup work, or is there a flaw?

`nn.BCELoss` can be applied with `torch.sigmoid` for a multi-label classification. Since you are using `softmax`, I assume you are working on a multi-class classification and should probably stick to `nn.CrossEntropyLoss`. For this criterion, your shapes also seem to be wrong, as described in my previous post.

@ptrblck thank you for your response. My confusion stems from the fact that TensorFlow allows us to use `softmax` in conjunction with `BCE` loss. Yes, I have a 4-class classification problem, with a batch size of 1000 and a sequence length of 100; the last dimension corresponds to the multi-class probability. If I use `sigmoid`, I need it only on the third dimension. `nn.CrossEntropyLoss` won’t be applicable as the dimensions are not right. How should I proceed in this case?

`nn.CrossEntropyLoss` can be applied if you permute the output to match the expected shapes:
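(A sketch, assuming the `[1000, 100, 4]` output and one-hot target from above; `nn.CrossEntropyLoss` expects class indices, so the one-hot target is converted via `argmax`.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# hypothetical output and one-hot target with the shapes discussed above
output = torch.randn(1000, 100, 4)   # [batch, seq_len, nb_classes]
one_hot = F.one_hot(torch.randint(0, 4, (1000, 100)), num_classes=4).float()

criterion = nn.CrossEntropyLoss()

logits = output.permute(0, 2, 1)     # class dimension moved to dim1: [1000, 4, 100]
target = one_hot.argmax(dim=2)       # one-hot -> class indices: [1000, 100]

loss = criterion(logits, target)
```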

I don’t know how TF applies a binary cross-entropy with a softmax activation function; I assume this formula would be used internally, which involves the sigmoid.
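Presumably that formula is the one behind `nn.BCEWithLogitsLoss`, which fuses the sigmoid into the binary cross-entropy; a quick sketch of the relationship:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 4)
targets = torch.rand(8, 4)   # soft targets in [0, 1]

# fused form: loss = -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
fused = F.binary_cross_entropy_with_logits(logits, targets)

# explicit sigmoid followed by plain BCE
explicit = F.binary_cross_entropy(torch.sigmoid(logits), targets)

print(torch.allclose(fused, explicit))  # True (up to numerical differences)
```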