Understanding NLLLoss function

In your example your output has the same “probability” for all three classes, i.e. the logits have the same value.
Their probabilities should therefore be approximately [0.33, 0.33, 0.33].
Since you are using LogSoftmax, we can check if this is true by calling exp on the output (thus getting rid of the log):

import torch

m = torch.nn.LogSoftmax(dim=1)
input = torch.ones(1, 3, requires_grad=True)  # assumed setup: three equal logits for one sample
print(m(input))
> tensor([[-1.0986, -1.0986, -1.0986]], grad_fn=<LogSoftmaxBackward>)
print(m(input).exp())
> tensor([[0.3333, 0.3333, 0.3333]], grad_fn=<ExpBackward>)

You will get the same values every time you pass the same logits into LogSoftmax.
Now we just have to pick the right index using target, multiply by -1, and we end up with a loss value of 1.0986 (which is -log(1/3) = log(3)).
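
Here is a minimal sketch of that last step, reusing m and input from above. The target index is an assumption (any of 0, 1, 2 gives the same value here, since all log-probabilities are equal); it shows that indexing the LogSoftmax output and negating reproduces what nn.NLLLoss computes:

target = torch.tensor([0])  # assumed target index; any class yields the same loss here
log_probs = m(input)

# NLLLoss picks the log-probability at the target index and negates it
print(-log_probs[0, target[0]])
> tensor(1.0986, grad_fn=<NegBackward>)

# nn.NLLLoss computes the same (averaging over the batch by default)
criterion = torch.nn.NLLLoss()
print(criterion(log_probs, target))
> tensor(1.0986, grad_fn=<NllLossBackward>)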
