CrossEntropy with softmax?

I have a doubt. I read that CrossEntropyLoss is a combination of LogSoftmax and NLLLoss. In my case I want to apply softmax in the last layer (not log-softmax), so which loss function should I use? Is CrossEntropyLoss good enough?

Hi Kapil!

You don’t actually want to apply softmax() as the last layer of your
model. You should simply use the output of your last Linear layer
(to be understood as logits), and pass them to CrossEntropyLoss.

If you really need probabilities (rather than logits) for some purpose
(and you probably don’t), you should still use CrossEntropyLoss
as your loss function, passing in logits, and separately generate
the probabilities by applying softmax() to the output of your model.
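The recipe above can be sketched as follows. The model, sizes, and batch here are illustrative placeholders, not anything from the original thread:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy classifier: last layer is a plain Linear, no softmax.
model = nn.Linear(10, 3)          # outputs raw logits for 3 classes
loss_fn = nn.CrossEntropyLoss()   # applies log_softmax + NLLLoss internally

x = torch.randn(4, 10)            # batch of 4 samples
target = torch.tensor([0, 2, 1, 2])

logits = model(x)                 # pass logits straight to the loss
loss = loss_fn(logits, target)

# If you need probabilities for some other purpose, compute them
# separately; they play no role in the loss computation.
probs = torch.softmax(logits, dim=1)
```

Note that `probs` is derived from the same logits but is kept out of the training objective, so the loss retains the numerically stable log-softmax path.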

If you insist on building a model that outputs probabilities by
using softmax() as your last layer, then you shouldn’t use
CrossEntropyLoss as it doesn’t expect probabilities as inputs.
You would have to write your own version of cross-entropy that
does take probabilities (straightforward to do), but this approach
is numerically less stable than passing logits to CrossEntropyLoss
(even though the two approaches are mathematically equivalent).


K. Frank


Actually, I was reproducing the results of a paper in which softmax is used at the last layer. They did not mention the loss function, so applying softmax at the last layer is more or less mandatory for me. Is there any alternative that does exactly the same thing, or is there a custom implementation of cross-entropy loss (the basic cross-entropy, -log(y_i), where i is the true label)?

You can use cross entropy loss from here: neural network - Pytorch doing a cross entropy loss when the predictions already have probabilities - Data Science Stack Exchange


Thank You MrPositron.