Confusing results with cross-entropy loss

niko_e · April 16, 2021, 8:58am

Hello everyone,
I’m working on a dataset for semantic segmentation. I’m doing some experiments with cross-entropy loss and got some confusing results. I transformed my ground-truth image into an out-like tensor with the shape:
out = [n, num_class, w, h].

Then I generate my target tensor with this out-tensor:
target = torch.argmax(out, dim=1)

and get a tensor with the shape [n, w, h]. Finally, I tried to calculate the cross-entropy loss:

criterion = nn.CrossEntropyLoss()
loss = criterion(out, target)

and I got a loss of 2.2. Shouldn’t the loss be 0? And what if I implement this loss in my segmentation network?

Many thanks in advance!

KFrank · April 16, 2021, 2:06pm

Hi Nikolas!

Without knowing the values in your out tensor, it’s hard to know what
the loss should be.

However, please note that the input passed into CrossEntropyLoss
(your out – the predictions made by your model) is expected to be
logits – that is, raw-score predictions that run from -inf to inf.

An out with values of 0.0 and 1.0, read as logits, corresponds to a
middling-weak prediction, and you need a highly-certain prediction to
get a loss of 0.0.

Here is a three-class example (for a single prediction – not an image)
that shows a loss of 0.0 for a “highly-certain” prediction expressed
as logits as expected by CrossEntropyLoss:

>>> torch.__version__
'1.7.1'
>>> out = torch.tensor ([[0.0, 1.0, 0.0]])
>>> out_logits = torch.tensor ([[-1.e6, 1.e6, -1.e6]])
>>> torch.softmax (out_logits, 1)
tensor([[0., 1., 0.]])
>>> target = torch.argmax (out, dim = 1)
>>> target
tensor([1])
>>> torch.nn.CrossEntropyLoss() (out, target)
tensor(0.5514)
>>> torch.nn.CrossEntropyLoss() (out_logits, target)
tensor(0.)
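
The 0.5514 in the session above can be checked by hand. The following is a minimal sketch in plain Python (no PyTorch), assuming the usual definition loss = -log(softmax(logits)[target]); the helper name cross_entropy_from_logits is made up for illustration and is not a PyTorch function.

```python
import math

def cross_entropy_from_logits(logits, target_index):
    """Cross entropy for one sample: -log(softmax(logits)[target_index])."""
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]

# One-hot values read as logits: softmax gives class 1 only e / (e + 2), about 0.576.
loss_one_hot = cross_entropy_from_logits([0.0, 1.0, 0.0], 1)
print(round(loss_one_hot, 4))  # 0.5514

# Closed form for a one-hot vector with C classes: log(1 + (C - 1) / e), here C = 3.
closed_form = math.log(1 + (3 - 1) / math.e)

# A very large gap between the logits drives the loss to 0.
loss_certain = cross_entropy_from_logits([-1.0e6, 1.0e6, -1.0e6], 1)
print(loss_certain)  # 0.0
```

The closed form also shows that the loss for a one-hot out grows with the number of classes, so a value well above 0 is expected when one-hot values are passed in as logits.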

Best.

K. Frank

niko_e · April 19, 2021, 6:50am

Thank you for your answer!
My mistake was treating the output as probabilities, as the mathematical definition of cross entropy requires. I forgot, however, that PyTorch treats them as raw scores (logits) that don’t need to sum to 1, and that it converts them to probabilities internally using the softmax function.
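
For the segmentation case, here is a rough sketch of the point made in this thread, with shapes laid out as [n, num_class, h, w]. The sizes, the seed, and the random logits standing in for a model's output are assumptions for illustration only. It shows that a one-hot tensor passed as the input gives a nonzero loss, and that CrossEntropyLoss agrees with log_softmax followed by nll_loss, so a network should pass its raw outputs without applying softmax first.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, num_class, h, w = 2, 5, 4, 4  # hypothetical sizes for illustration

# Class-index ground truth of shape [n, h, w], the target format CrossEntropyLoss expects.
target = torch.randint(0, num_class, (n, h, w))

# One-hot encoding of the ground truth, shape [n, num_class, h, w].
one_hot = F.one_hot(target, num_class).permute(0, 3, 1, 2).float()

# argmax over the class dimension recovers the class indices.
assert torch.equal(torch.argmax(one_hot, dim=1), target)

criterion = torch.nn.CrossEntropyLoss()

# One-hot values read as logits: the loss is log(1 + (num_class - 1) / e), not 0.
loss_one_hot = criterion(one_hot, target)

# CrossEntropyLoss matches log_softmax over the class dimension followed by nll_loss.
logits = torch.randn(n, num_class, h, w)  # stand-in for raw model outputs
loss_ce = criterion(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(loss_one_hot.item(), loss_ce.item(), loss_manual.item())
```

If a model already outputs probabilities, one option is to take their log and use nll_loss instead, though working with logits directly is generally the more numerically stable choice.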