Device-side assert triggered with SpatialClassNLLCriterion.cu

I ran the code with CUDA_LAUNCH_BLOCKING=1 and got the traceback below.

The PyTorch version is 1.0.1.

Traceback (most recent call last):
  File "tr.py", line 35, in train
    loss = self.loss(output, target)
  File "lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "lib/python3.6/site-packages/torch/nn/functional.py", line 1792, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128

Is the code running fine on the CPU? The error message raised on the CPU might give some more information.
The current error might point to target indices outside of the valid range [0, nb_classes-1].
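
If you want a clearer message for the same failure, you could rerun just the loss computation on the CPU; here is a minimal sketch, assuming output and target are the tensors you pass to the criterion:

# Sketch: rerun the failing loss call on the CPU, where an invalid class
# index raises a readable Python error instead of a device-side assert.
import torch.nn.functional as F

output_cpu = output.detach().cpu()
target_cpu = target.detach().cpu()
loss_cpu = F.cross_entropy(output_cpu, target_cpu)  # should raise if target is out of range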

Yes, the CPU run is fine, and I will try your suggestion soon.

/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [5,0,0], thread: [232,0,0] Assertion `t >= 0 && t < n_classes` failed.

This test shows another error

This error still points to an invalid target index.
It’s a bit strange that your code is running fine on the CPU.
However, try to add an assert statement and check that each target batch only contains values in [0, nb_classes-1].
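
For example, something like this right before the loss call (just a sketch; nb_classes stands for your number of classes and is an assumption here):

# Sketch: validate each target batch before computing the loss.
# nb_classes is assumed to hold the number of classes of your model.
assert target.min().item() >= 0 and target.max().item() < nb_classes, \
    'target values outside [0, {}]: min={}, max={}'.format(
        nb_classes - 1, target.min().item(), target.max().item())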

The error appears only after some period of training, and the interval between errors is different each time.

If you are shuffling the data, it's normal that the erroneous batch shows up at different iterations.
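
To catch the bad batch whenever it shows up, you could also run the check inside the training loop and print some information before raising. A rough sketch, assuming a DataLoader called train_loader that yields (data, target) and a known nb_classes:

import torch

# Sketch: flag the offending batch before the loss call so it can be inspected,
# regardless of the iteration at which shuffling places it.
for i, (data, target) in enumerate(train_loader):
    invalid = (target < 0) | (target >= nb_classes)
    if invalid.any():
        print('iteration {}: invalid target values {}'.format(
            i, torch.unique(target[invalid]).tolist()))
        raise RuntimeError('target index out of range')
    # ... rest of the training step (forward pass, loss, backward, optimizer step)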

Thanks, I will confirm it again.

After removing the loss.backward() line, I got this error:

/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [5,0,0], thread: [179,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "tr.py", line 42, in train
    self.writer.add_scalar('loss', loss.item())
RuntimeError: CUDA error: device-side assert triggered

The code is running on the GPU.

If I use CUDA_LAUNCH_BLOCKING=1, it gives:

/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [350,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=128 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "tr.py", line 36, in train
    loss = self.loss(output, target)
  File "python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "python3.6/site-packages/torch/nn/functional.py", line 1792, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128

Any help?