I was looking at the MNIST example and it had this line of code:
return F.log_softmax(x, dim=1)
then later uses:
loss = F.nll_loss(output, target)
What I don't understand is: why does the MNIST example do that instead of just outputting raw x and then using the torch.nn.CrossEntropyLoss criterion?
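For context, here's a minimal sketch of the two forms I'm comparing (the logits and targets are made up, not from the actual MNIST example):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw scores: batch of 4, 10 classes
target = torch.randint(0, 10, (4,))  # ground-truth class indices

# MNIST-example style: log_softmax in the model's forward, nll_loss as criterion
loss_a = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Alternative I'm asking about: raw logits fed to CrossEntropyLoss
loss_b = torch.nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(loss_a, loss_b))  # True -- the two losses agree numerically
```

As far as I can tell the two compute the same number, which is why I'm confused about the choice.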