CUDA assertion error when calculating loss values

I am trying to get the pretrained Faster R-CNN model to work, but I keep running into this error.

/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [1,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
.
.
.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "faster_r_cnn.py", line 778, in <module>
    main(epochs, num_folds, patience, gpu, desc, model_path, results_path, path_to_fold_files, img_dir)
  File "faster_r_cnn.py", line 705, in main
    loss = train_epoch(model, optimizer, train_data_loader, device, epoch, cur_fold, print_freq=1)
  File "faster_r_cnn.py", line 429, in train_epoch
    loss_dict = model(images, targets)
  File "/home/s2399059/.conda/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/s2399059/.conda/envs/pytorch_env/lib/python3.8/site-packages/torchvision/models/detection/generalized_rcnn.py", line 97, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/s2399059/.conda/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/s2399059/.conda/envs/pytorch_env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 364, in forward
    loss_objectness, loss_rpn_box_reg = self.compute_loss(
  File "/home/s2399059/.conda/envs/pytorch_env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 296, in compute_loss
    sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
  File "/home/s2399059/.conda/envs/pytorch_env/lib/python3.8/site-packages/torchvision/models/detection/_utils.py", line 45, in __call__
    positive = torch.where(matched_idxs_per_image >= 1)[0]
RuntimeError: CUDA error: device-side assert triggered

I can’t figure out why this error is occurring. Any thoughts?

This almost always means that the shape of your model output (should be batch x num categories) does not match the value range of your targets (should be 0 to num categories - 1).
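A minimal sketch of that range check, in plain Python so it runs without a GPU. The helper name `find_out_of_range` and the example values are made up for illustration; it accepts either a plain list or anything with a `.tolist()` method, such as a tensor.

```python
def find_out_of_range(targets, num_categories):
    """Return (position, value) pairs for targets outside [0, num_categories - 1]."""
    # Accept a tensor/array (anything with .tolist()) or a plain list.
    values = targets.tolist() if hasattr(targets, "tolist") else list(targets)
    return [(i, int(v)) for i, v in enumerate(values)
            if not 0 <= int(v) < num_categories]


# Hypothetical example: model output of shape (batch, 7), so valid targets are 0..6.
bad = find_out_of_range([0, 3, 6, 7, -1], num_categories=7)
print(bad)  # [(3, 7), (4, -1)]
```

An empty result means every target is a valid index. Running a single batch on the CPU is another common way to narrow this down, since an out-of-range index there usually raises an ordinary Python exception with a more specific message.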

Best regards

Thomas

Hi Thomas, thank you for your reply.

I don’t quite understand what you mean. Are you talking about the loss dictionary that is returned by the model() call? What do you mean by batch x num categories? How do I check the shape of the output, given that the call is aborted before it returns?

The finetuning tutorial mentions that we should use num_categories + 1 (0 reserved for background) as the number of classes for the model. I have 6 different classes, so I initialize my model with 7 classes. As class 0 is reserved, I have numbered my classes from 1 to 6.
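For a setup like this (7 classes, labels 1 to 6), a rough sanity check on each target dict before it goes into the model might look like the sketch below. The helper name `check_detection_target` and the example values are hypothetical. The box checks assume the `[x1, y1, x2, y2]` format, and whether malformed boxes play any role in this particular error is only a guess, not something confirmed in this thread.

```python
def check_detection_target(target, num_classes):
    """Collect problems in one target dict with 'boxes' ([x1, y1, x2, y2]) and 'labels'."""
    def as_list(x):
        # Tensors expose .tolist(); plain lists pass through unchanged.
        return x.tolist() if hasattr(x, "tolist") else list(x)

    boxes, labels = as_list(target["boxes"]), as_list(target["labels"])
    problems = []
    if len(boxes) != len(labels):
        problems.append(f"{len(boxes)} boxes but {len(labels)} labels")
    for i, label in enumerate(labels):
        # Label 0 is reserved for background, so object labels start at 1.
        if not 1 <= label < num_classes:
            problems.append(f"label {label} at index {i} outside 1..{num_classes - 1}")
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if any(v != v for v in (x1, y1, x2, y2)):  # NaN is not equal to itself
            problems.append(f"box {i} contains NaN")
        elif x2 <= x1 or y2 <= y1:
            problems.append(f"box {i} has non-positive width or height")
    return problems


# Invented example: the second annotation has a bad label and a zero-width box.
target = {"boxes": [[10, 20, 50, 80], [30, 30, 30, 60]], "labels": [6, 7]}
for problem in check_detection_target(target, num_classes=7):
    print(problem)
```

Looping this over the whole dataset once, before training, would show whether any annotation falls outside the 1 to 6 range.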

Regards,

Kenneth

Right, as per @ptrblck's advice, you could try upgrading: