PyTorch 1.8.0 fasterrcnn_resnet50_fpn error

My environment is:

  • OS: Ubuntu 18.04
  • GPU: RTX 3090
  • CUDA: 11.2
  • PyTorch: 1.8.0 stable, built with CUDA 11.1

and the code is:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

box_model = fasterrcnn_resnet50_fpn(pretrained=True, progress=False).cuda()
xs = torch.rand(2, 3, 1080, 1920, dtype=torch.float32).cuda()
ys = [
  {
    "labels": torch.tensor([1], dtype=torch.int64).cuda(),
    "boxes": torch.tensor([[956.0000, 316.3117, 1134.0000, 838.8275]], 
                          dtype=torch.float32).cuda(),
  },
  {
    "labels": torch.tensor([1], dtype=torch.int64).cuda(),
    "boxes": torch.tensor([[956.0000, 316.3117, 1134.0000, 838.8275]], 
                          dtype=torch.float32).cuda(),
  },
]

box_model(xs, ys)

It raises an error like this:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-7f582a050256> in <module>
----> 1 box_model(xs, ys)

~/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     95         if isinstance(features, torch.Tensor):
     96             features = OrderedDict([('0', features)])
---> 97         proposals, proposal_losses = self.rpn(images, features, targets)
     98         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
     99         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

~/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/models/detection/rpn.py in forward(self, images, features, targets)
    363             regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
    364             loss_objectness, loss_rpn_box_reg = self.compute_loss(
--> 365                 objectness, pred_bbox_deltas, labels, regression_targets)
    366             losses = {
    367                 "loss_objectness": loss_objectness,

~/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/models/detection/rpn.py in compute_loss(self, objectness, pred_bbox_deltas, labels, regression_targets)
    294         """
    295 
--> 296         sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
    297         sampled_pos_inds = torch.where(torch.cat(sampled_pos_inds, dim=0))[0]
    298         sampled_neg_inds = torch.where(torch.cat(sampled_neg_inds, dim=0))[0]

~/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/models/detection/_utils.py in __call__(self, matched_idxs)
     55             # randomly select positive and negative examples
     56             perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
---> 57             perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
     58 
     59             pos_idx_per_image = positive[perm1]

RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal

When I run this code on the CPU, it works fine.
After that, I reinstalled PyTorch 1.7.1 stable with CUDA 11.0, and it works fine too.


cc @ptrblck do you know where this could be coming from?

Haven’t seen this error yet, but let me try to reproduce it on a 3090.

EDIT: I was able to reproduce it with the 1.8.0+CUDA11.1 conda binaries and will debug it further.
It’s not failing in a source build, so my first guess is to look into CUB/Thrust.
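
In the meantime, a quick environment printout like the following (just a sketch, nothing torchvision-specific) makes it easier to compare the failing and working setups:

import torch
import torchvision

# Print the exact binary/toolkit combination in use.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA used to build the binaries:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPU:", torch.cuda.get_device_name(0))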


Did you manage to solve the problem? I’m facing the same issue now.

~/torch_env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py in forward(self, images, features, targets)
    362             labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
    363             regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
--> 364             loss_objectness, loss_rpn_box_reg = self.compute_loss(
    365                 objectness, pred_bbox_deltas, labels, regression_targets)
    366             losses = {

~/torch_env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py in compute_loss(self, objectness, pred_bbox_deltas, labels, regression_targets)
    294         """
    295 
--> 296         sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
    297         sampled_pos_inds = torch.where(torch.cat(sampled_pos_inds, dim=0))[0]
    298         sampled_neg_inds = torch.where(torch.cat(sampled_neg_inds, dim=0))[0]

~/torch_env/lib/python3.8/site-packages/torchvision/models/detection/_utils.py in __call__(self, matched_idxs)
     55             # randomly select positive and negative examples
     56             perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
---> 57             perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
     58 
     59             pos_idx_per_image = positive[perm1]

RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal

OS: Ubuntu 20.04
GPU: RTX 3080
package: pytorch-1.8.0 with CUDA 11.1

I’m just using PyTorch 1.7.1 for now.

Thank you

I got the same issue as @Kitsunetic. What should I do?

It should be fixed already in the nightly conda binary and pip wheel.
Could you update and check it, please?

CC @Kitsunetic @namirinz

I have the same issue with

print(torch.__version__) 
1.8.0+cu111

I have an RTX 3090, and torch 1.8 with CUDA 11.1 is the only combination compatible with Detectron2. Any idea when it will be fixed, @ptrblck?
Thank you

I’m having the same issue with pytorch 1.8.1 and cuda 11.1.
The error is different though:

~/anaconda3/envs/pytorch-1.8.1/lib/python3.8/site-packages/torchvision/models/detection/_utils.py in __call__(self, matched_idxs)
     43         neg_idx = []
     44         for matched_idxs_per_image in matched_idxs:
---> 45             positive = torch.where(matched_idxs_per_image >= 1)[0]
     46             negative = torch.where(matched_idxs_per_image == 0)[0]
     47 

RuntimeError: CUDA error: device-side assert triggered

It works with:

  • pytorch 1.8.1 + cuda 10.2
  • pytorch 1.7.1 + cuda 11.0

Seems like cuda 11.1 is the problem here.
Sorry I couldn’t help more.


The radix_sort issue is already fixed in the nightly release and in PyTorch 1.8.1, so you would have to update to one of these versions.

@MCvin your error seems to be different.
Could you rerun the code via CUDA_LAUNCH_BLOCKING=1 python setup.py args and post the complete stack trace (or create a new topic with your error and this information)?
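
If it’s easier, e.g. from a notebook, the env variable can also be set inside the script (a minimal sketch; it has to happen before the first CUDA call creates a context, otherwise it may be ignored):

import os

# Set this before importing torch / before any CUDA work, otherwise it may be ignored.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch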

After updating to 1.8.1 I’m not getting the radix_sort error anymore, but I’m getting the same error as @MCvin. I’m using the stable version with CUDA 11.1 on Ubuntu 18.04. Everything works when run on CPU.

  File "~/anaconda3/envs/neo/lib/python3.9/site-packages/pl_bolts/models/detection/faster_rcnn/faster_rcnn_module.py", line 112, in training_step
    loss_dict = self.model(images, targets)
  File "~/anaconda3/envs/neo/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "~/anaconda3/envs/neo/lib/python3.9/site-packages/torchvision/models/detection/generalized_rcnn.py", line 97, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "~/anaconda3/envs/neo/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "~/anaconda3/envs/neo/lib/python3.9/site-packages/torchvision/models/detection/rpn.py", line 364, in forward
    loss_objectness, loss_rpn_box_reg = self.compute_loss(
  File "~/anaconda3/envs/neo/lib/python3.9/site-packages/torchvision/models/detection/rpn.py", line 296, in compute_loss
    sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
  File "~/anaconda3/envs/neo/lib/python3.9/site-packages/torchvision/models/detection/_utils.py", line 46, in __call__
    positive = torch.where(matched_idxs_per_image >= 1)[0]
RuntimeError: CUDA error: device-side assert triggered

The error follows a long series of messages like:

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

I guess the problem is with the torch.where operation in torchvision/models/detection/_utils.py. I’ve been able to run my program by moving the variable matched_idxs_per_image to the CPU in the __call__ function of the BalancedPositiveNegativeSampler class, like this:

for matched_idxs_per_image in matched_idxs:
    # workaround: do the torch.where on the CPU to avoid the device-side assert
    matched_idxs_per_image = matched_idxs_per_image.cpu()
    positive = torch.where(matched_idxs_per_image >= 1)[0]
    negative = torch.where(matched_idxs_per_image == 0)[0]

from line 44.

Hope that helps.
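
If you’d rather not edit the installed torchvision sources, the same idea can probably be applied as a monkey-patch before building the model (an untested sketch of the equivalent change):

from torchvision.models.detection import _utils as det_utils

_orig_call = det_utils.BalancedPositiveNegativeSampler.__call__

def _sample_on_cpu(self, matched_idxs):
    # Route the sampler's inputs through the CPU, mirroring the edit above.
    return _orig_call(self, [m.cpu() for m in matched_idxs])

det_utils.BalancedPositiveNegativeSampler.__call__ = _sample_on_cpu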

detectron2 is also seeing an increasing number of CUDA error reports on CUDA>=11.1 + pytorch 1.8.x + RTX30xx: RuntimeError: CUDA error: device-side assert triggered · Issue #2837 · facebookresearch/detectron2 · GitHub

Root cause seems to be still randperm: CUDA error: device-side assert triggered(torch1.8.1+cuda11.1) · Issue #55027 · pytorch/pytorch · GitHub

I’m facing the same issue in MinkowskiEngine with PyTorch 1.8.X + CUDA 11.X: Cuda 11.1 - Coordinate manager · Issue #330 · NVIDIA/MinkowskiEngine (github.com)

Same here, seems to be related to randperm()

Reproducible code:

>>> import torch
>>> device = torch.device("cuda:0")
>>> torch.randperm(29999, device=device)
tensor([13324, 19251, 23333,  ..., 18540, 14502, 26766], device='cuda:0')
>>> torch.randperm(30000, device=device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal

pytorch version: 1.8.0+cu111
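
For anyone checking their own install, a small self-contained test along these lines (just a sketch) should either raise the same error or confirm the fix, since the failure only shows up above a certain size:

import torch

# 29999 worked but 30000 failed above, so test a size at/above that threshold.
n = 30000
perm = torch.randperm(n, device="cuda:0")
# A valid permutation of 0..n-1 sorts back to arange(n).
assert torch.equal(perm.sort().values, torch.arange(n, device=perm.device))
print(torch.__version__, torch.version.cuda, "randperm OK")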

Could you update PyTorch to 1.8.1 or the nightly as described in my previous post, please?

I tested 1.8.1 and the nightly with the same environment.
On 1.8.1 the error still happened, but the nightly works fine.
Thank you

The same problem happened when installing the nightly version via the pip wheel (1.9.0.dev20210428+cu111, Python 3.8.8):

  File "/opt/conda/envs/fpn/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 363, in forward
    loss_objectness, loss_rpn_box_reg = self.compute_loss(
  File "/opt/conda/envs/fpn/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 295, in compute_loss
    sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
  File "/opt/conda/envs/fpn/lib/python3.8/site-packages/torchvision/models/detection/_utils.py", line 45, in __call__
    positive = torch.where(matched_idxs_per_image >= 1)[0]
RuntimeError: CUDA error: device-side assert triggered
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [0,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

With the same code, the problem was solved by PyTorch 1.7.1 + CUDA 11.0, so I think this may be a CUDA 11.1 problem.

Did you check the indices which create the error?
If so, what are the min and max values of the indices, and what is the shape of the indexed tensor?
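
Something like this (a rough sketch, assuming the variable names from _utils.py; it’s easiest to collect the values in a CPU run, since after a device-side assert the CUDA tensors usually can’t be read anymore) would print that information:

# Hypothetical helper: call it on matched_idxs_per_image from _utils.py
# (e.g. in a CPU run) to get the numbers asked for above.
def describe_indices(idx):
    print("shape:", tuple(idx.shape))
    print("dtype/device:", idx.dtype, idx.device)
    print("min/max:", idx.min().item(), idx.max().item())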

I had the same error message as @MCvin with PyTorch 1.8.1 while running Faster R-CNN code. I tried a few things to reproduce it, and it seems to work fine for small values of n. I tried this while running the FRCNN code through VS Code debugging, yet I was not able to reproduce it standalone the way @oo_o did.

[screenshot: pytorch_issue]

It was fixed by updating to the nightly build ‘1.9.0.dev20210502’.

OS: Ubuntu 20.04
GPU: RTX 3090
Pytorch 1.8.1 / Cuda 11.1