One side note: nn.NLLLoss should be used with nn.LogSoftmax, not nn.Softmax directly.
So basically raw logits passed to nn.CrossEntropyLoss, or an nn.LogSoftmax output passed to nn.NLLLoss, yield identical losses:
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
criterion1 = nn.CrossEntropyLoss()  # expects raw logits
criterion2 = nn.NLLLoss()           # expects log-probabilities

x = torch.randn(1, 5)  # raw logits for 1 sample, 5 classes
y = torch.empty(1, dtype=torch.long).random_(5)  # random target class

loss1 = criterion1(x, y)
loss2 = criterion2(m(x), y)
print(loss1)
print(loss2)
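To make the equivalence explicit, here is a small self-contained sketch (batch size and seed are arbitrary choices for the demo) that also computes the loss by hand: both criteria reduce to the negative log-probability of the target class, averaged over the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, just for reproducibility

x = torch.randn(4, 5)          # raw logits: batch of 4 samples, 5 classes
y = torch.randint(0, 5, (4,))  # random target classes

ce = nn.CrossEntropyLoss()(x, y)                    # logits in
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(x), y)      # log-probs in

# manual version: pick the log-probability of each target class and average
manual = -(x.log_softmax(dim=1)[torch.arange(4), y]).mean()

print(torch.allclose(ce, nll), torch.allclose(ce, manual))  # True True
```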