When using the LogSoftmax & NLLLoss pair, why doesn’t a “one-hot” input of the correct category produce a loss of zero? I suspect I’m missing something.

A variation of the example from the docs for `NLLLoss`:

```
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 1 x 3
# input is a perfectly matching one-hot for category 0
input = torch.tensor([[1, 0, 0]], dtype=torch.float)
# We want category 0, so we should be right on target
target = torch.tensor([0])
output = loss(m(input), target)
output
```

Result: `tensor(0.5514)`
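
If I’m doing the math right, that 0.5514 is just the negative log of the softmax probability of the target class, computed by hand:

```
import math

# softmax of [1, 0, 0] for class 0: e^1 / (e^1 + e^0 + e^0)
p0 = math.exp(1) / (math.exp(1) + 2)
print(p0)             # 0.5761...
print(-math.log(p0))  # 0.5514..., matching the loss above
```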

`nn.NLLLoss` expects the inputs to be log probabilities.
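
As far as I can tell, for a single sample `NLLLoss` then just picks out the negation of the input at the target index (reusing `m`, `loss`, `input`, and `target` from above):

```
# For one sample, NLLLoss seems to reduce to -input[0, target[0]]
print(m(input)[0, target[0]])  # tensor(-0.5514)
print(loss(m(input), target))  # tensor(0.5514), the negation of the above
```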

Let’s use this trick to undo the log:

```
m(input).exp()
```

Result: `tensor([[0.5761, 0.2119, 0.2119]])`

The above is exactly what we’d get by applying `Softmax` (without the `Log`) directly, which is good, but these don’t seem to be the probabilities that would give us a zero loss.
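
For what it’s worth, here’s that check (same `input` as above):

```
sm = nn.Softmax(dim=1)
# Same numbers as m(input).exp() above
print(sm(input))  # tensor([[0.5761, 0.2119, 0.2119]])
```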

Let’s try log probabilities directly:

```
# Feed exact log probabilities (the log of a true one-hot) instead
lp = torch.tensor([[1.0, 0, 0]]).log()
print(lp)
loss(lp, target)
```

The result is what we’d expect: a loss of zero:

```
tensor([[0., -inf, -inf]])
tensor(0.)
```

The above is a simplified version of the MNIST example.

The effect of this behavior is that we get a nonzero loss even for correct classifications, which seems to cause the weights to grow slowly without bound.
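
To illustrate what I think is happening: scaling the one-hot logits up mimics the weights growing, and the loss shrinks toward zero but never reaches it for finite logits (since softmax never outputs an exact 1):

```
for scale in [1.0, 10.0, 100.0]:
    scaled = torch.tensor([[1.0, 0, 0]]) * scale
    print(scale, loss(m(scaled), target).item())
# 1.0   -> 0.5514...
# 10.0  -> ~9.1e-05
# 100.0 -> 0.0 (underflows in float32)
```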

What am I doing wrong? Thanks!