When using the LogSoftmax & NLLLoss pair, why doesn’t a one-hot input for the correct category produce a loss of zero? I suspect I’m missing something.
Variation of the example from the docs for NLLLoss:
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 1 x 3
# Input is a perfectly matching one-hot for category 0
input = torch.tensor([[1, 0, 0]], dtype=torch.float)
# We want category 0, so we should be right on target
target = torch.tensor([0])
output = loss(m(input), target)
output
Result: tensor(0.5514)
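For reference, here’s where that number comes from, worked out by hand (just a sanity check on my part):

import math

# Softmax of the logits [1, 0, 0] gives p_0 = e^1 / (e^1 + e^0 + e^0) = e / (e + 2)
p0 = math.e / (math.e + 2)
print(p0)              # 0.5761...
# NLLLoss then returns the negative log probability of the target class
print(-math.log(p0))   # 0.5514...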
nn.NLLLoss expects the inputs to be log probabilities. Let’s undo the log with .exp():
m(input).exp()
Result: tensor([[0.5761, 0.2119, 0.2119]])
The above is exactly what we’d get by applying Softmax (without the Log) directly, which makes sense, but those values don’t look like probabilities that would give us a zero loss.
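For completeness, here’s the direct Softmax check I mean (my own quick comparison):

sm = nn.Softmax(dim=1)
print(sm(input))   # tensor([[0.5761, 0.2119, 0.2119]]) – same as m(input).exp()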
Let’s try log probabilities directly:
lp = torch.tensor([[1.0, 0, 0]]).log()
print(lp)
loss(lp, target)
The result is what we’d expect, a loss of zero:
tensor([[0., -inf, -inf]])
tensor(0.)
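Out of curiosity I also tried making the logit gap much larger; the loss shrinks toward zero but never quite reaches it (my own experiment, values approximate):

big = torch.tensor([[10.0, 0.0, 0.0]])
print(loss(m(big), target))   # roughly 9e-05: tiny, but still not zero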
The above is a simplified version of the MNIST example.
The effect of this behavior is that we get a nonzero loss even for correct predictions, which seems to make the weights grow slowly without bound.
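To show what I mean about the weights (a small experiment of mine): even for the “correct” one-hot logits, the gradient on the input is nonzero, so an optimizer would keep pushing the logits apart:

logits = torch.tensor([[1.0, 0.0, 0.0]], requires_grad=True)
out = loss(m(logits), target)
out.backward()
print(logits.grad)   # roughly tensor([[-0.4239, 0.2119, 0.2119]]) – nonzero even for a “perfect” input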
What am I doing wrong? Thanks!