What is the difference between using the cross entropy loss and using log_softmax followed by nll_loss?

(Brando Miranda) #1

I was looking at the MNIST example and it had this line of code:

return F.log_softmax(x, dim=1)

then later uses:

loss = F.nll_loss(output, target)

What I don’t understand is why the MNIST example does that instead of just outputting x and then using the torch.nn.CrossEntropyLoss criterion.


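I think it’s mostly a matter of taste: nn.CrossEntropyLoss combines log_softmax and nll_loss in a single step, so both approaches compute the same loss.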
When debugging the model, I like to “see” the (log-)probabilities of the output classes directly instead of comparing raw logits.
In this example, I prefer to look at log_prob and the differences between the predictions instead of comparing each row of logits.

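import torch
import torch.nn.functional as F
logits = torch.randn(10, 3)  # 10 samples, 3 classes; Variable is no longer needed since PyTorch 0.4
log_prob = F.log_softmax(logits, dim=1)  # log-probabilities, normalized over the class dimension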
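
To make the "looking at the output" part concrete, here is a minimal sketch of how I would inspect those values. The seed and the variable names prob, row_sums and pred are my own choices for illustration, not something taken from the MNIST example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only so repeated runs match

logits = torch.randn(10, 3)              # 10 samples, 3 classes
log_prob = F.log_softmax(logits, dim=1)  # log-probabilities per class

prob = log_prob.exp()                    # back to plain probabilities
row_sums = prob.sum(dim=1)               # each row should sum to roughly 1
pred = log_prob.argmax(dim=1)            # most likely class per sample

print(prob[:3])
print(row_sums)
print(pred)
```

Exponentiating log_prob gives values that are easier to read than logits, because every row is normalized over the classes.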

Just use whatever fits your needs.
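
If you want to convince yourself that the two routes agree, a quick check along these lines should do it. This is only a sketch: the random logits and targets are made up for illustration, and I use the functional form F.cross_entropy rather than the nn.CrossEntropyLoss module.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only so repeated runs match

logits = torch.randn(10, 3)          # raw model outputs: 10 samples, 3 classes
target = torch.randint(0, 3, (10,))  # made-up class indices in {0, 1, 2}

# Route 1: the model returns log_softmax, the loss is nll_loss.
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Route 2: the model returns raw logits, the loss is cross_entropy.
loss_one_step = F.cross_entropy(logits, target)

print(loss_two_step.item(), loss_one_step.item())
print(torch.allclose(loss_two_step, loss_one_step))
```

The one thing to watch out for is mixing the two: passing log_softmax output into cross_entropy applies the normalization a second time, so pick one route and stick with it.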