In your example, your output has the same “probability” for all three classes, i.e. the logits have the same value.
Their probabilities should therefore be approximately [0.33, 0.33, 0.33].
Since you are using LogSoftmax, we can check if this is true by calling exp on the output (thus getting rid of the log):
import torch
import torch.nn as nn

# setup assumed from your snippet: any input with equal logits reproduces this output
m = nn.LogSoftmax(dim=1)
input = torch.zeros(1, 3, requires_grad=True)

print(m(input))
> tensor([[-1.0986, -1.0986, -1.0986]], grad_fn=<LogSoftmaxBackward>)
print(m(input).exp())
> tensor([[0.3333, 0.3333, 0.3333]], grad_fn=<ExpBackward>)
You will get the same values every time you pass the same logits into LogSoftmax.
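If you want to convince yourself, here is a quick check (a minimal sketch, again assuming zero logits):

import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
input = torch.zeros(1, 3)

# two forward passes over identical logits yield identical log-probabilities
print(torch.equal(m(input), m(input)))
> True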
Now we just have to get the right index using target, multiply by -1, and end up with a loss value of -log(0.3333) = 1.0986.
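That indexing-and-negating is exactly what nn.NLLLoss does with the log-probabilities. A minimal sketch, assuming a single sample and a hypothetical target of class 0 (any index gives the same loss here, since all log-probabilities are equal):

import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
input = torch.zeros(1, 3, requires_grad=True)  # equal logits, as above
target = torch.tensor([0])  # hypothetical target

log_probs = m(input)

# manual version: pick the log-probability at the target index and negate it
print(-log_probs[0, target[0]])
> tensor(1.0986, grad_fn=<NegBackward>)

# the built-in criterion gives the same value
criterion = nn.NLLLoss()
print(criterion(log_probs, target))
> tensor(1.0986, grad_fn=<NllLossBackward>)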