Epoch: [0] [ 0/34] eta: 0:00:19 lr: 0.001000 loss: 9.5855 (9.5855) loss_classifier: 0.7124 (0.7124) loss_box_reg: 0.0743 (0.0743) loss_keypoint: 8.0775 (8.0775) loss_objectness: 0.6929 (0.6929) loss_rpn_box_reg: 0.0285 (0.0285) time: 0.5787 data: 0.0540 max mem: 3215
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [24,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
[... the same assertion repeated for threads [25,0,0] through [31,0,0] ...]
Traceback (most recent call last):
  File "traine.py", line 155, in <module>
    train_one_epoch(model, optimizer, data_loader_train, device, epoch, print_freq=1000)
  File "/home/targetdir/keypoint_rcnn_training_pytorch-main/engine.py", line 31, in train_one_epoch
    loss_dict = model(images, targets)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/detection/generalized_rcnn.py", line 97, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 364, in forward
    loss_objectness, loss_rpn_box_reg = self.compute_loss(
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 296, in compute_loss
    sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/detection/_utils.py", line 60, in __call__
    neg_idx_per_image = negative[perm2]
RuntimeError: CUDA error: device-side assert triggered
Hello everyone,

I am getting the above error for single-class object detection and segmentation, i.e. where there is only one class of object to detect and train on. The error occurs when I train in Kubeflow notebooks. I have already tried os.environ['CUDA_LAUNCH_BLOCKING'] = "1" and !export CUDA_LAUNCH_BLOCKING=1, but the error persists.

Can anyone help me with this, please?
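For reference, this kind of device-side index assert often comes from the targets rather than the model, so here is a minimal sanity check I could run over my dataset before training. It is only a sketch under two assumptions: the targets are COCO-style dicts with "labels", "boxes", and "keypoints" keys (as torchvision's detection models expect, where label 0 is reserved for background, so a single-class dataset should use label 1), and the num_keypoints value is hypothetical and should be replaced with the model's own setting. Plain Python lists stand in for tensors to keep the check self-contained.

```python
def check_target(target, num_classes=2, num_keypoints=17):
    """Return a list of problems found in one target dict.

    num_classes counts the background class, so a single-class
    detector has num_classes=2 and valid labels are 1..num_classes-1.
    num_keypoints=17 is a placeholder -- use your model's value.
    """
    problems = []
    # Labels outside [1, num_classes-1] cause out-of-bounds indexing
    # inside the loss computation (0 is background, never a target label).
    for label in target["labels"]:
        if not (1 <= label < num_classes):
            problems.append(f"label {label} outside 1..{num_classes - 1}")
    # Degenerate boxes (zero or negative width/height) also break training.
    for box in target["boxes"]:
        x0, y0, x1, y1 = box
        if x1 <= x0 or y1 <= y0:
            problems.append(f"degenerate box {box}")
    # Each instance must carry exactly num_keypoints (x, y, visibility) triples.
    for kps in target.get("keypoints", []):
        if len(kps) != num_keypoints:
            problems.append(f"expected {num_keypoints} keypoints, got {len(kps)}")
    return problems

# Example: a single-class target that wrongly uses label 0.
bad = {"labels": [0], "boxes": [[10, 10, 50, 50]], "keypoints": [[[0, 0, 0]] * 17]}
print(check_target(bad))  # -> ['label 0 outside 1..1']
```

Looping this over every sample in the dataset (on CPU, before any CUDA call) should pinpoint a bad annotation without needing CUDA_LAUNCH_BLOCKING at all.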