classification problem in pytorch with loss function CrossEntropyLoss returns negative output in prediction

I am trying to train on and predict the SVHN dataset (VGG architecture). I get very high validation/test accuracy by just taking the largest output class. However, the output values are large positive and negative numbers. Are they supposed to be parsed as exp(output)/sum(exp(output)) to convert them to probabilities? Thank you!

If you are using nn.CrossEntropyLoss, you should pass the logits directly to this loss function, since internally nn.NLLLoss and F.log_softmax will be used.
Your proposed softmax should not be applied before this loss function, but can of course be used for debugging purposes etc. to see the probabilities.
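
For example (a minimal sketch with arbitrary shapes):

import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

# raw, unnormalized model outputs (logits): 4 samples, 10 classes
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))

# pass the logits directly; log_softmax is applied internally
loss = criterion(logits, targets)

# only for debugging/inspection: convert the logits to probabilities
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))  # each row sums to 1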

I’m also getting a negative nn.CrossEntropyLoss (every couple of epochs). I am evaluating the loss on the output of a ResNet model (i.e. the last layer is nn.Linear). What other reasons could there be for a negative loss?

It looks like in my case the issue was a corruption of the dataset, such that for some samples the target category was -1. Interestingly, when I try to replicate this by calling nn.CrossEntropyLoss directly on tensors on the CPU, I get an error message if one of the targets is -1. When I run this on the GPU, I appear to get a negative loss rather than an error message.
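
A cheap sanity check on the target tensor catches this kind of corruption before it reaches the loss (a sketch; num_classes is a placeholder for the model's number of output classes):

import torch

def check_targets(targets: torch.Tensor, num_classes: int) -> None:
    # fail fast on corrupted labels instead of silently computing
    # a wrong (possibly negative) loss on the GPU
    assert targets.min().item() >= 0, f"negative target: {targets.min().item()}"
    assert targets.max().item() < num_classes, f"target {targets.max().item()} out of range"

e.g. call check_targets(targets, num_classes=10) right before the loss computation in the training loop.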

Hello Muster!

I can reproduce this on pytorch 0.3.0 (except I don’t get a negative loss).

If I have a target class label that is out of range, I get an error on the CPU, but not on the GPU.

I’m a little surprised that the GPU lets this error pass, although perhaps it forgoes checking for it in the interest of efficiency.

@ptrblck Do you think this is an issue or by design?

Here is a test script:

import torch
torch.__version__

# logits for two samples and three classes, plus in-range targets
predc = torch.autograd.Variable (torch.FloatTensor ([[0.5, -0.1, -0.2],[-0.1, -0.1, 0.4]]))
targc = torch.autograd.Variable (torch.LongTensor ([0, 2]))

# the same tensors on the gpu
predg = predc.cuda()
targg = targc.cuda()

# gpu and cpu losses agree while the targets are valid
torch.nn.functional.cross_entropy (predg, targg)
torch.nn.functional.cross_entropy (predc, targc)

# set one target out of range
targg[0] = -1
targc[0] = -1

# the gpu silently returns a (wrong) loss; the cpu raises an assertion error
torch.nn.functional.cross_entropy (predg, targg)
torch.nn.functional.cross_entropy (predc, targc)

Here is the output:

>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> predc = torch.autograd.Variable (torch.FloatTensor ([[0.5, -0.1, -0.2],[-0.1, -0.1, 0.4]]))
>>> targc = torch.autograd.Variable (torch.LongTensor ([0, 2]))
>>>
>>> predg = predc.cuda()
>>> targg = targc.cuda()
>>>
>>> torch.nn.functional.cross_entropy (predg, targg)
Variable containing:
 0.7550
[torch.cuda.FloatTensor of size 1 (GPU 0)]

>>> torch.nn.functional.cross_entropy (predc, targc)
Variable containing:
 0.7550
[torch.FloatTensor of size 1]

>>>
>>> targg[0] = -1
>>> targc[0] = -1
>>>
>>> torch.nn.functional.cross_entropy (predg, targg)
Variable containing:
 0.3972
[torch.cuda.FloatTensor of size 1 (GPU 0)]

>>> torch.nn.functional.cross_entropy (predc, targc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\LisaBrown\Documents\admin\programs\Miniconda3\lib\site-packages\torch\nn\functional.py", line 1140, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)
  File "C:\Users\LisaBrown\Documents\admin\programs\Miniconda3\lib\site-packages\torch\nn\functional.py", line 1049, in nll_loss
    return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at d:\pytorch\pytorch\torch\lib\thnn\generic/ClassNLLCriterion.c:87

Best.

K. Frank

This shouldn’t be by design and I get a proper error message in the latest nightly binary:

ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target -1 is out of bounds.

using your code.

We had a bug in the 1.5.0 release (which should be fixed in 1.5.1) that ignored certain assert statements in CUDA code and thus didn’t raise proper exceptions.
In other words: if you were passing a target >= nb_classes, you would most likely get an illegal memory access error instead of a proper error message.
However, passing an index of -1 would probably just use “reverse indexing”, and while this is technically a valid operation, the result is of course wrong and this shouldn’t work.
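
A minimal sketch of what that reverse indexing amounts to (computing the NLL manually with plain negative indexing; not what the CUDA kernel literally does):

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.5, -0.1, -0.2], [-0.1, -0.1, 0.4]])
log_probs = F.log_softmax(logits, dim=1)

# target -1 is out of range for 3 classes, but negative indexing
# silently wraps around to the last class (index 2)
target = torch.tensor([-1, 2])
manual_nll = -log_probs[torch.arange(target.numel()), target].mean()
print(manual_nll)  # a plausible-looking but wrong loss, no error raised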

0.3.0 on the other hand might have had the same issue and I don’t know when the assert statements were added to nll_loss.

@wmpauli Could this be the case for your issue? If you are using 1.5.0, I highly recommend updating to 1.5.1 or the nightly binary.

Thanks for your responses. I am indeed using:

- torch==1.5.0
- torchvision==0.6.0

Glad I had a sadly small batch size, so that I caught these negative values. :wink: