I was looking at the MNIST example and it had this line of code:
return F.log_softmax(x, dim=1)
then later uses:
loss = F.nll_loss(output, target)
What I don't understand is: why does the MNIST example do that instead of just outputting raw x and then using the torch.nn.CrossEntropyLoss criterion?
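For context, here's a minimal sketch of the two forms I'm comparing (the logits and targets are made up, not from the actual MNIST example):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw scores: batch of 4, 10 classes
target = torch.randint(0, 10, (4,))  # ground-truth class indices

# MNIST-example style: log_softmax in the model's forward, nll_loss as criterion
loss_a = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Alternative I'm asking about: raw logits fed to CrossEntropyLoss
loss_b = torch.nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(loss_a, loss_b))  # True -- the two losses agree numerically
```

As far as I can tell the two compute the same number, which is why I'm confused about the choice.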