In my case I am passing inputs (logits) and targets that both have the same shape (N, C).
In the example below the cross entropy loss should be 0, but it isn't:
torch.nn.CrossEntropyLoss()(torch.tensor([[1,0,0,0,0.0]]), torch.tensor([[1,0,0,0,0.0]]))
# tensor(0.9048)
Any possible reason?
This is expected, since logits are unbounded in [-Inf, +Inf]. Pass a large positive value and the loss should converge to zero:
torch.nn.CrossEntropyLoss()(torch.tensor([[100.,0,0,0,0.0]]), torch.tensor([[1,0,0,0,0.0]]))
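For example, comparing a few magnitudes (same tensors as above, just a quick illustrative loop) shows the loss approaching zero as the target-class logit grows:

import torch

criterion = torch.nn.CrossEntropyLoss()
target = torch.tensor([[1., 0., 0., 0., 0.]])  # probability target, shape (N, C)

# the larger the logit for the target class, the closer the loss gets to 0
for value in [1., 10., 100.]:
    logits = torch.tensor([[value, 0., 0., 0., 0.]])
    print(value, criterion(logits, target).item())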
So how should I modify my model or training pipeline so it works correctly and I get a 0 loss as expected?
Thank you. It would also be great if you could explain why this happens, since as per the formula in the docs the value should be 0 even for the values I used in the example.
No, based on the formula in the docs the output is expected, since logits are used for x:
import torch

# CrossEntropyLoss applies log_softmax to the logits internally
torch.nn.CrossEntropyLoss()(torch.tensor([[1.,0,0,0,0.0]]), torch.tensor([[1,0,0,0,0.0]]))
# tensor(0.9048)

# same value computed manually: -log(softmax(a)[0, target_class])
a = torch.tensor([[1.,0,0,0,0.0]])
b = torch.tensor([[1,0,0,0,0.0]])
-1.0 * torch.log(torch.exp(a[0, b.argmax()]) / torch.exp(a).sum())
# tensor(0.9048)
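Written out with log_softmax (a sketch, assuming a PyTorch version that accepts probability targets, i.e. >= 1.10), the per-sample loss with probability targets is -(target * log_softmax(logits)).sum(), which reproduces the same number:

import torch
import torch.nn.functional as F

logits = torch.tensor([[1., 0., 0., 0., 0.]])
target = torch.tensor([[1., 0., 0., 0., 0.]])  # class probabilities, not indices

# with probability targets: loss_n = -sum_c target[n, c] * log_softmax(logits)[n, c]
manual = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
builtin = torch.nn.CrossEntropyLoss()(logits, target)
print(manual, builtin)  # both ~0.9048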