When using the LogSoftmax & NLLLoss pair, why doesn’t a “one hot” input of the correct category produce a loss of zero? I suspect I’m missing something.
Variation of the example from the docs for NLLLoss:
```python
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 1 x 3
# Input is a perfectly matching one-hot for category 0
input = torch.tensor([[1, 0, 0]], dtype=torch.float)
# We want category 0, so we should be right on target
target = torch.tensor([0])
output = loss(m(input), target)
output
```
`nn.NLLLoss` expects the inputs to be log probabilities.
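To make that concrete, here's my own check (not from the docs) of what the snippet above actually computes: `LogSoftmax` turns the raw logits `[1, 0, 0]` into log probabilities, and `NLLLoss` simply negates the target class's entry, which is not zero:

```python
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
inp = torch.tensor([[1.0, 0.0, 0.0]])
target = torch.tensor([0])

logp = m(inp)              # log probabilities, not the raw one-hot
out = loss(logp, target)   # NLLLoss = minus the target entry of logp
print(logp)                # roughly [[-0.5514, -1.5514, -1.5514]]
print(out)                 # roughly 0.5514, not 0
```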
Let’s use his trick to undo the log:
```
tensor([[0.5761, 0.2119, 0.2119]])
```
The above is exactly what we'd get if we applied Softmax (without the Log) directly, which is good, but those don't look like probabilities that would give us a zero loss.
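As a quick arithmetic check (my own, using only the math above): those numbers are just the softmax of the raw logits `[1, 0, 0]`:

```python
import math

# softmax of the raw logits [1, 0, 0]
denom = math.e + 1.0 + 1.0   # e^1 + e^0 + e^0 ≈ 4.7183
p0 = math.e / denom          # ≈ 0.5761
p1 = 1.0 / denom             # ≈ 0.2119 (same for the third entry)
print(p0, p1, p1)
```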
Let’s try log probabilities directly:
```python
lp = torch.tensor([[1.0, 0, 0]]).log()
print(lp)
loss(lp, target)
```
The result is what we'd expect: a loss of zero:
```
tensor([[0., -inf, -inf]])
tensor(0.)
```
The above is a simplified version of the MNIST example.
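For context (an aside of mine, not part of the original example): `nn.CrossEntropyLoss` is documented as LogSoftmax and NLLLoss combined in one step, and it reproduces the same nonzero loss on the raw logits:

```python
import torch
import torch.nn as nn

inp = torch.tensor([[1.0, 0.0, 0.0]])
target = torch.tensor([0])

# CrossEntropyLoss == LogSoftmax + NLLLoss applied to raw logits
ce = nn.CrossEntropyLoss()(inp, target)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(inp), target)
print(ce.item(), nll.item())   # the same nonzero value
```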
The effect of this behavior is that we get a nonzero loss even for perfect matches, which seems to cause the weights to grow slowly without bound.
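A quick sketch of that dynamic (my own experiment): the loss only approaches zero as the correct class's logit grows arbitrarily far above the others, which would explain why the weights keep growing:

```python
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
target = torch.tensor([0])

losses = []
for scale in [1.0, 10.0, 100.0]:
    inp = torch.tensor([[scale, 0.0, 0.0]])
    losses.append(loss(m(inp), target).item())
print(losses)   # shrinks toward 0 only as the correct logit dominates
```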
What am I doing wrong? Thanks!