Is there anybody happen this error?

mshmoon · May 3, 2018, 7:46am

/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [1017,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0 main(parser.parse_args())
,0,0], thread: [1018,0,0] Assertion t >= 0 && t < n_classes failed.
File “main1.py”, line 181, in main
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [1019,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [1020,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [1021,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [1022,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [1023,0,0] Assertion t >= 0 && t < n_classes failed.
train(args, model)
File “main1.py”, line 115, in train
epoch_loss.append(loss.data[0])
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.c:36

ptrblck · May 3, 2018, 12:00pm

Could you check the range of your target? It seems there is a mismatch between your model output and the target.
For C classes your target should be in the range [0, C-1].

mshmoon · May 4, 2018, 12:46pm

Yes , yesteday I found it. Thank for you

mshmoon · May 4, 2018, 12:48pm

But The report error place is : epoch_loss.append(loss.data[0])
There is mistery. Why not is loss=criterion(output,target)

ptrblck · May 4, 2018, 1:29pm

Since CUDA calls are executed asynchronously, you should run your code with

CUDA_LAUNCH_BLOCKING=1 python script.py

This makes sure the right line of code will throw the error message.

mshmoon · May 5, 2018, 11:00am

Thank you for you help

XiaoAHeng · July 16, 2019, 12:23pm

There is only one target to segmentate in my task. I set the channel as 2 for model output,and for 2 classes my ground truth is in the range[0,1].However, there is still an same error.Could you give me a hint?Thanks in advance!

Time:2019-07-16 20:02:15.251895, | Epoch:0, step:0, batch_size:1, loss = 0.49627
Time:2019-07-16 20:02:15.361290, | Epoch:0, step:1, batch_size:1, loss = 0.47159
Time:2019-07-16 20:02:15.434024, | Epoch:0, step:2, batch_size:1, loss = 0.45822
Time:2019-07-16 20:02:15.508932, | Epoch:0, step:3, batch_size:1, loss = 0.43967
Time:2019-07-16 20:02:15.578840, | Epoch:0, step:4, batch_size:1, loss = 0.41975
Time:2019-07-16 20:02:15.653170, | Epoch:0, step:5, batch_size:1, loss = 0.41971
Time:2019-07-16 20:02:15.722897, | Epoch:0, step:6, batch_size:1, loss = 0.40062
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [900,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [901,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [902,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [132,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [133,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [134,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [388,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [389,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [390,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [391,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [643,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [644,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [645,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [646,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [647,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [375,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [118,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [119,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [120,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [121,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [630,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [631,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [632,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [633,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [886,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [887,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [888,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [889,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "/data/ch/StructSeg 2019/code1/main.py", line 87, in <module>
    main(config)
  File "/data/ch/StructSeg 2019/code1/main.py", line 48, in main
    solver.train()
  File "/data/ch/StructSeg 2019/code1/solver.py", line 150, in train
    epoch_loss += loss.item()
RuntimeError: CUDA error: device-side assert triggered

ptrblck · July 16, 2019, 1:10pm

Could you post the shapes of the model’s output and the target?

This should work, if you provide these values as discrete values in a LongTensor.

XiaoAHeng · July 16, 2019, 1:30pm

[quote=“XiaoAHeng, post:9, topic:17416, full:true”]
ground truth: torch.Size([1, 256, 256]); predicted truth: torch.Size([1, 2, 256, 256])
I have tarnsformed the ground truth into LongTensor by GT = GT.to(self.device, dtype=torch.long). I really don’t know what went wrong

ptrblck · July 16, 2019, 1:30pm

The shapes are alright.
If the target only contains values 0 and 1, it should work:

criterion = nn.CrossEntropyLoss()
output = torch.randn(1, 2, 256, 256)
target = torch.randint(0, 2, (1, 256, 256))
loss = criterion(output ,target)

XiaoAHeng · July 16, 2019, 1:34pm

All right then ,Thank you anyway!

ptrblck · July 16, 2019, 1:36pm

Add a print or assert statement in the training loop and check the min and max values to be >=0 and <=1.

XiaoAHeng · July 16, 2019, 2:00pm

I have done it ,but it doesnt work:joy:

XiaoAHeng · July 16, 2019, 2:42pm

I used cv2.resize() function to rescale ground truth size , the raw GT szie is 512*512.After rescale processing,my ground truth is not in the range[0,1].When I delete the cv2.resize() function ,the error disappeared.I wonder why this is case:joy:

ptrblck · July 16, 2019, 3:02pm

Most likely since cv2.resize uses some interpolation method (default is linear interpolation if I’m not mistaken) other than “nearest neighbors”.
Try to pass cv2.INTER_NEAREST as the method to this function.

XiaoAHeng · July 16, 2019, 3:18pm

Yes, cv2.INTER_NEAREST has work.You are very nice!Thank you once again!

Mudassar_Awan · August 23, 2021, 3:34pm

Hello, I trained on the following classes [0,1,…15] in step 1. Every thing works ok. In step 2 I am training classes [0,16,17…20] on a separate classifier. The issue is let(n_classes) is 6 where as the range is unto 20. I am getting this error : t should be between 0 and n_classes. I can’t change labels (0,16,17,…20 ). Suggest me a solution

ptrblck · August 24, 2021, 3:45am

If class indices [16, 20] are representing different classes than [0, 15], your classifier should output logits for 21 classes ([0, 20]). Otherwise, if the class indices are representing the same class but are for some reason shifted in their value, you should remap them.

Ning_Albert · December 6, 2021, 8:50am

Hello, I meet the same problem. When I augment my training set from 226 into 998, the error occured below.

RuntimeError: CUDA error: device-side assert triggered
/opt/conda/conda-bld/pytorch_1614378073850/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1614378073850/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1614378073850/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1614378073850/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1614378073850/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1614378073850/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1614378073850/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.

It seems like the error occured when the training set beyond 605. I use nn.CSE(output, label) to compute the loss. Output is in shape of (16, 512) and label is in shape of (16,)

ptrblck · December 6, 2021, 9:05am

I don’t know what 226 and 998 represent, but the error message points to an invalid index in nn.NLLLoss or nn.CrossEntropyLoss (which will call the former).
Make sure your model output is given as [batch_size, nb_classes] and the target as [batch_size] containing values in [0, nb_classes=1] for a multi-class classification use case.