When using the LogSoftmax & NLLLoss pair, why doesn’t a “one-hot” input of the correct category produce a loss of zero? I suspect I’m missing something.

A variation of the example from the docs for `NLLLoss`:

```
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 1 x 3
# input is a perfectly matching one-hot for category 0
input = torch.tensor([[1, 0, 0]], dtype=torch.float)
# We want category 0, so we should be right on target
target = torch.tensor([0])
output = loss(m(input), target)
output
```

Result: `tensor(0.5514)`
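
If I’m doing the math right, that 0.5514 is just the negative log of the softmax probability of the target class, computed by hand:

```
import math

# softmax of [1, 0, 0] for class 0: e^1 / (e^1 + e^0 + e^0)
p0 = math.exp(1) / (math.exp(1) + 2)
print(p0)             # 0.5761...
print(-math.log(p0))  # 0.5514..., matching the loss above
```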

`nn.NLLLoss` expects the inputs to be log probabilities.
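
As far as I can tell, for a single sample `NLLLoss` then just picks out the negation of the input at the target index (reusing `m`, `loss`, `input`, and `target` from above):

```
# For one sample, NLLLoss seems to reduce to -input[0, target[0]]
print(m(input)[0, target[0]])  # tensor(-0.5514)
print(loss(m(input), target))  # tensor(0.5514), the negation of the above
```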

Let’s use this trick to undo the log:

```
m(input).exp()
```

Result: `tensor([[0.5761, 0.2119, 0.2119]])`

The above is exactly what we’d get by applying `Softmax` (without the `Log`) directly, which is good, but these don’t seem to be the probabilities that would give us a zero loss.
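
For what it’s worth, here’s that check (same `input` as above):

```
sm = nn.Softmax(dim=1)
# Same numbers as m(input).exp() above
print(sm(input))  # tensor([[0.5761, 0.2119, 0.2119]])
```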

Let’s try log probabilities directly:

```
# Feed exact log probabilities (the log of a true one-hot) instead
lp = torch.tensor([[1.0, 0, 0]]).log()
print(lp)
loss(lp, target)
```

The result is what we’d expect: a loss of zero:

```
tensor([[0., -inf, -inf]])
tensor(0.)
```

The above is a simplified version of the MNIST example.

The effect of this behavior is that we get a nonzero loss even for correct classifications, which seems to cause the weights to grow slowly without bound.
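
To illustrate what I think is happening: scaling the one-hot logits up mimics the weights growing, and the loss shrinks toward zero but never reaches it for finite logits (since softmax never outputs an exact 1):

```
for scale in [1.0, 10.0, 100.0]:
    scaled = torch.tensor([[1.0, 0, 0]]) * scale
    print(scale, loss(m(scaled), target).item())
# 1.0   -> 0.5514...
# 10.0  -> ~9.1e-05
# 100.0 -> 0.0 (underflows in float32)
```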

What am I doing wrong? Thanks!