Assertion `t >= 0 && t < n_classes` failed error

When I ran CUDA_LAUNCH_BLOCKING=1 python train.py, I got the following error:

/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [342,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [343,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [344,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [345,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [346,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [347,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "train.py", line 238, in <module>
    main()
  File "train.py", line 126, in main
    train(net, optimizer)
  File "train.py", line 197, in train
    loss1 = criterion_CE(out, torch.squeeze(labels).long())
  File "/home/public/software/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/public/software/anaconda3/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 947, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/public/software/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2422, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/public/software/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2220, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:134

Does this mean I should use nn.BCELoss rather than nn.CrossEntropyLoss here? However, I found that only CrossEntropyLoss supports ignore_index for classification, while nn.BCELoss doesn't.

Here’s my loss,

        criterion_CE = nn.CrossEntropyLoss(ignore_index=-1).cuda()
        loss = criterion_CE(out, torch.squeeze(labels).long())

The target tensor for nn.CrossEntropyLoss is expected to contain class indices in the range [0, nb_classes-1], which seems to fail in your script.
Check its values via print(target.min(), target.max()) and make sure they are valid.
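As a concrete illustration of that check (the tensor values here are made up), any index outside the valid range is what trips the device-side assert:

```python
import torch

n_classes = 5
# Hypothetical segmentation target; -1 marks pixels to ignore.
target = torch.tensor([[0, 1, 4],
                       [2, -1, 3]])

# nn.CrossEntropyLoss expects class indices in [0, n_classes - 1],
# apart from the value passed as ignore_index.
print(target.min(), target.max())  # tensor(-1) tensor(4)

# Any value >= n_classes (or a negative value other than ignore_index)
# triggers the device-side assert on the GPU.
assert target.max().item() < n_classes
```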

After running print(target.min(), target.max()), I got target.min() == 0 rather than the -1 I expected. I had set the label pixel value to -1 following your advice here: Got RuntimeError: Boolean value of Tensor with more than one value is ambiguous during training - #2 by ptrblck , but it seems this did not work as I expected. Could you please give me any clues? Many thanks.

Now I see the problem. This line of code leads to the strange behavior:

        labels1 = functional.interpolate(labels, size=24, mode='bilinear')

        print("### labels.long().min()", labels.long().min())
        print("### labels.min()", labels.min())

        print("### labels1.long().min()", labels1.long().min())
        print("### labels1.min()", labels1.min())

I got,

### labels.long().min() tensor(-1, device='cuda:2')
### labels.min() tensor(-1., device='cuda:2')
### labels1.long().min() tensor(0, device='cuda:2')
### labels1.min() tensor(0., device='cuda:2')

What’s wrong with functional.interpolate here? Thanks.

You are interpolating values using the bilinear approach and rounding afterwards, which might change the values. I’m not familiar with your use case, but as previously described, the expected target values are in [0, nb_classes-1] unless you use ignore_index for a specific index value.
Your current code crashes because the target values are not in this range.
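The effect can be reproduced on a tiny made-up label map: bilinear interpolation averages neighbouring pixels, so an isolated -1 is blended away, while 'nearest' copies exact pixel values and keeps it:

```python
import torch
import torch.nn.functional as F

# Hypothetical 1 x 1 x 4 x 4 float label map with one ignored (-1) pixel.
labels = torch.zeros(1, 1, 4, 4)
labels[0, 0, 0, 0] = -1.0

# Bilinear downsampling to 2x2 averages each 2x2 block, so the -1 pixel
# is blended with its 0-valued neighbours: (-1 + 0 + 0 + 0) / 4 = -0.25.
bilinear = F.interpolate(labels, size=2, mode='bilinear', align_corners=False)
print(bilinear.min())         # tensor(-0.2500)
print(bilinear.long().min())  # tensor(0)  -> the ignore index is gone

# 'nearest' copies an existing pixel value, so the exact -1 survives.
nearest = F.interpolate(labels, size=2, mode='nearest')
print(nearest.long().min())   # tensor(-1)
```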

Thanks for your reply. Sorry to bother you again. For the last question, I had already set ignore_index to -1:

        criterion_CE = nn.CrossEntropyLoss(ignore_index=-1).cuda()

        labels[(0.3 <= labels) & (labels <= 0.7)] = -1

        loss = criterion_CE(out, torch.squeeze(labels).long())

How could I fix this?

I’m not sure what you are trying to fix.
If you are manually setting some target indices to -1, which is invalid in the default setup, you have to use ignore_index=-1, since otherwise the expected error will be raised.
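Putting the pieces of this thread together, here is a minimal sketch of the intended setup (the 0.3/0.7 thresholds follow the snippets above; the shapes, seed, and two-class assumption are made up): mark uncertain pixels as -1, and if the labels must be resized, use mode='nearest' so the -1 markers survive and ignore_index=-1 can take effect:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_classes = 2

# Hypothetical soft labels in [0, 1]; shape N x 1 x H x W.
labels = torch.rand(1, 1, 8, 8)

# Mark uncertain pixels with the ignore value.
labels[(0.3 <= labels) & (labels <= 0.7)] = -1.0

# 'nearest' copies existing pixel values instead of blending them,
# so the -1 markers are preserved through the resize.
labels = F.interpolate(labels, size=4, mode='nearest')

# Binarize the remaining values and drop the channel dim: N x H x W.
target = torch.squeeze(labels, 1).round().long()

out = torch.randn(1, n_classes, 4, 4)  # logits: N x C x H x W
criterion_CE = nn.CrossEntropyLoss(ignore_index=-1)

loss = criterion_CE(out, target)  # pixels labelled -1 are excluded
print(loss.item())
```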