Softmax and Cross Entropy vs Log Softmax and NLLLoss

Hi, I am observing some weird behaviour.

I have made a classifier and tried two different output/loss combinations: 1) Softmax with CrossEntropyLoss and 2) LogSoftmax with NLLLoss.
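For reference, here is a minimal sketch of what I mean (the layer sizes are placeholders, not my actual model):

```python
import torch.nn as nn

# 1) Softmax output + CrossEntropyLoss
model_a = nn.Sequential(nn.Linear(10, 4), nn.Softmax(dim=1))
criterion_a = nn.CrossEntropyLoss()

# 2) LogSoftmax output + NLLLoss
model_b = nn.Sequential(nn.Linear(10, 4), nn.LogSoftmax(dim=1))
criterion_b = nn.NLLLoss()
```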

When I run them, both start with an initial loss of 1.38, but the LogSoftmax + NLLLoss loss continues all the way down to 0.25, whereas the Softmax + CrossEntropyLoss loss plateaus around 0.9.

Even weirder is that they both perform the same on the test set.

Am I right in thinking the two setups are roughly equivalent? If so, shouldn’t the LogSoftmax + NLLLoss model perform better on the test set, given that it reaches a lower loss?

nn.CrossEntropyLoss expects raw logits as the model outputs, so you would have to remove the softmax in the first case.
You should get identical loss values afterwards, since nn.CrossEntropyLoss applies F.log_softmax and nn.NLLLoss internally.
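A small self-contained sketch (random tensors, not your model) showing both the equivalence and the effect of the extra softmax:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 4)            # raw model outputs: 8 samples, 4 classes
targets = torch.randint(0, 4, (8,))   # class indices

ce = nn.CrossEntropyLoss()
nll = nn.NLLLoss()

# CrossEntropyLoss on raw logits == NLLLoss on log_softmax of the logits
loss_ce = ce(logits, targets)
loss_nll = nll(F.log_softmax(logits, dim=1), targets)
print(loss_ce.item(), loss_nll.item())   # identical values

# The problematic setup: softmax applied before CrossEntropyLoss.
# The loss is then computed on already-normalized probabilities, which
# squashes the inputs and keeps the loss from decreasing properly.
loss_double = ce(F.softmax(logits, dim=1), targets)
print(loss_double.item())
```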
