NLLLoss vs CrossEntropyLoss

I’m comparing the results of NLLLoss and CrossEntropyLoss and I don’t understand why NLLLoss returns a negative loss while CrossEntropyLoss returns a positive one for the same inputs.

import torch
import torch.nn as nn

label = torch.tensor([3, 0, 1, 1, 4])
output = torch.tensor([[0.5073, 0.4838, 0.5053, 0.4839, 0.5183],
        [0.5072, 0.4849, 0.4933, 0.4809, 0.5148],
        [0.5020, 0.4836, 0.5021, 0.4829, 0.5162],
        [0.5023, 0.4801, 0.4994, 0.4805, 0.5174],
        [0.5024, 0.4899, 0.4932, 0.4835, 0.5148]])

criterion = nn.NLLLoss()
loss = criterion(output, label)
print(loss)  # tensor(-0.4939)

criterion = nn.CrossEntropyLoss()
loss = criterion(output, label)
print(loss)  # tensor(1.6128)

CrossEntropyLoss applies LogSoftmax to the output before passing it to NLLLoss. NLLLoss expects log-probabilities (which are always ≤ 0) as its input, so feeding it the raw, positive outputs directly is what produces the negative loss value.
This snippet shows how to get equal results:

# LogSoftmax followed by NLLLoss ...
nll_loss = nn.NLLLoss()
log_softmax = nn.LogSoftmax(dim=1)
print(nll_loss(log_softmax(output), label))  # tensor(1.6128)

# ... matches CrossEntropyLoss on the raw output
cross_entropy_loss = nn.CrossEntropyLoss()
print(cross_entropy_loss(output, label))  # tensor(1.6128)
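To see where the negative value in the original post comes from: nn.NLLLoss (with the default mean reduction) simply picks the entry at each target index and negates the average, without applying any log or softmax. A minimal check, reusing the output and label tensors from above:

# nn.NLLLoss with reduction='mean' is just the negated average of the
# entries selected by the target indices -- no log or softmax involved.
selected = output[torch.arange(len(label)), label]  # values at the target indices
manual_nll = -selected.mean()
print(manual_nll)  # tensor(-0.4939), matching nn.NLLLoss()(output, label)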

Hello,
Is there any difference in running time or accuracy between using CrossEntropyLoss and LogSoftmax + NLLLoss (on CPU or GPU)?
Which option is considered more conventional or recommended?
Thanks

nn.CrossEntropyLoss uses F.log_softmax and F.nll_loss internally, so there wouldn’t be any difference in speed or results if you call these two ops explicitly yourself.
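As a quick sanity check, the same composition can be expressed with the functional API and compared directly (a minimal sketch with arbitrary example tensors, not the internal implementation itself):

import torch
import torch.nn.functional as F

logits = torch.randn(5, 4)           # arbitrary example logits
targets = torch.randint(0, 4, (5,))  # arbitrary example targets

a = F.cross_entropy(logits, targets)
b = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(a, b))  # True -- both compute the same loss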