Labels out of range torchvision.models.detection.fasterrcnn_resnet50_fpn

The torchvision model zoo docs for Faster R-CNN say the following:

During training, the model expects both the input tensors, as well as a targets dictionary, containing:

  • boxes ( Tensor[N, 4] ): the ground-truth boxes in [x0, y0, x1, y1] format, with values between 0 and H and 0 and W

These are my sample inputs, printed as H, W, [x0, y0, x1, y1]:

124 898 tensor([21.,  0., 88., 46.], device='cuda:0')
124 898 tensor([45., 51., 89., 98.], device='cuda:0')
124 898 tensor([ 50., 132.,  76., 183.], device='cuda:0')
124 898 tensor([ 63., 227.,  63., 349.], device='cuda:0')
124 898 tensor([ 63., 795.,  63., 870.], device='cuda:0')
124 898 tensor([  0., 229.,  50., 261.], device='cuda:0')
124 898 tensor([ 11., 269.,  49., 308.], device='cuda:0')
124 898 tensor([ 16., 314.,  50., 349.], device='cuda:0')
124 898 tensor([ 72., 259., 122., 280.], device='cuda:0')
124 898 tensor([ 72., 290., 123., 324.], device='cuda:0')
124 898 tensor([ 43., 389.,  89., 419.], device='cuda:0')
124 898 tensor([ 43., 426.,  89., 470.], device='cuda:0')
124 898 tensor([ 55., 531.,  71., 584.], device='cuda:0')
124 898 tensor([ 45., 619.,  89., 666.], device='cuda:0')
124 898 tensor([ 50., 700.,  76., 751.], device='cuda:0')
124 898 tensor([  0., 815.,  50., 847.], device='cuda:0')
124 898 tensor([ 72., 798., 123., 827.], device='cuda:0')
124 898 tensor([ 72., 835., 123., 867.], device='cuda:0')
124 898 tensor([ 78., 887.,  89., 897.], device='cuda:0')

Still, I get this error:

/opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.

File "/anaconda2/envs/py/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 34, in fastrcnn_loss
sampled_pos_inds_subset = torch.nonzero(labels > 0).squeeze(1)

RuntimeError: copy_if failed to synchronize: device-side assert triggered

Am I missing something? If it is not the actual target values that are out of range, then what else could cause this error?

I have a similar issue. Did you ever solve this?

The docs have an error: H and W are swapped.
Follow this tutorial: TorchVision Object Detection Finetuning Tutorial (PyTorch Tutorials 2.1.1+cu121 documentation)

  • boxes (FloatTensor[N, 4]) : the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and 0 to H

Basically, check that every bbox annotation in your data satisfies:
0 <= x0, x1 <= W
x0 < x1
0 <= y0, y1 <= H
y0 < y1
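
The constraints above can be checked with a small helper before training. This is only a sketch: `find_invalid_boxes` is a hypothetical name, the image size and boxes in the example are made up for illustration, and it assumes the boxes are plain lists in [x0, y0, x1, y1] order (for a tensor, something like `boxes.tolist()` should give that).

```python
def find_invalid_boxes(boxes, width, height):
    """Return (index, reason) pairs for boxes that break the constraints.

    boxes  : iterable of [x0, y0, x1, y1] (assumed order)
    width  : image width W
    height : image height H
    """
    problems = []
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # x coordinates must lie in [0, W]
        if not (0 <= x0 <= width and 0 <= x1 <= width):
            problems.append((i, "x out of range"))
        # y coordinates must lie in [0, H]
        if not (0 <= y0 <= height and 0 <= y1 <= height):
            problems.append((i, "y out of range"))
        # degenerate boxes: x0 == x1 or y0 == y1 (or flipped corners)
        if x0 >= x1:
            problems.append((i, "zero or negative width"))
        if y0 >= y1:
            problems.append((i, "zero or negative height"))
    return problems


# Illustrative values only (not taken from a real dataset).
example_boxes = [
    [10.0, 5.0, 40.0, 20.0],    # valid
    [30.0, 10.0, 30.0, 40.0],   # x0 == x1, zero width
    [90.0, 10.0, 120.0, 30.0],  # x1 exceeds W
]
for index, reason in find_invalid_boxes(example_boxes, width=100, height=50):
    print(index, reason)
```

Running this over every target dict in the dataset, with each image's own W and H, should point to the annotations that need to be fixed or dropped.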

My error was that some boxes had x0 = x1 or y0 = y1,
meaning the bbox has 0 height or 0 width like