Dimension out of range (expected to be in range of [-1, 0], but got 1) in resize_boxes

I am training an object detection model on my dataset using Faster R-CNN with a ResNet-50 backbone and pre-trained weights. I am primarily following the tutorial at the following link; the only difference is that instead of segmentation, I am performing object detection.

However, a few iterations after training starts, I get the following error:

Epoch: [0]  [   0/4181]  eta: 20:35:42  lr: 0.000010  loss: 3.4489 (3.4489)  loss_classifier: 2.0232 (2.0232)  loss_box_reg: 0.0210 (0.0210)  loss_objectness: 1.0478 (1.0478)  loss_rpn_box_reg: 0.3568 (0.3568)  time: 17.7333  data: 6.1786  max mem: 2559
Traceback (most recent call last):
  File "train.py", line 156, in <module>
    train_one_epoch(model, optimizer, data_loader_train, device, epoch, print_freq=1)
  File "/media/charan/Data/Charan_Data/FLIR_RTFNet/custom/engine.py", line 36, in train_one_epoch
    loss_dict = model(images, targets)
  File "/home/charan/anaconda3/envs/flir_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/charan/anaconda3/envs/flir_env/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 47, in forward
    images, targets = self.transform(images, targets)
  File "/home/charan/anaconda3/envs/flir_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/charan/anaconda3/envs/flir_env/lib/python3.7/site-packages/torchvision/models/detection/transform.py", line 41, in forward
    image, target = self.resize(image, target)
  File "/home/charan/anaconda3/envs/flir_env/lib/python3.7/site-packages/torchvision/models/detection/transform.py", line 76, in resize
    bbox = resize_boxes(bbox, (h, w), image.shape[-2:])
  File "/home/charan/anaconda3/envs/flir_env/lib/python3.7/site-packages/torchvision/models/detection/transform.py", line 137, in resize_boxes
    xmin, ymin, xmax, ymax = boxes.unbind(1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

My code to load the data is as follows:


import os
import xml.etree.ElementTree as ET

import numpy as np
import torch
from PIL import Image


class CustomDataset(torch.utils.data.Dataset):

    def __init__(self, root_dir, transform=None):
        self.root = root_dir
        self.rgb_imgs = list(sorted(os.listdir(os.path.join(root_dir, "rgb/"))))
        self.annotations = list(sorted(os.listdir(os.path.join(root_dir, "annotations/"))))

        self._classes = ('__background__',  # always index 0
                         'car','person','bicycle','dog','other')

        self._class_to_ind = {'car': '3', 'person': '1', 'bicycle': '2', 'dog': '18', 'other': '91'}
        self.rtf_net = RTFNet(6)  # RTFNet is a custom module defined elsewhere in this project

    def __len__(self):
        return len(self.rgb_imgs)

    def __getitem__(self, idx):
        self.num_classes = 6
        
        img_rgb_path = os.path.join(self.root, "rgb/", self.rgb_imgs[idx])

        img = Image.open(img_rgb_path)
        img = np.array(img)
        img = img.transpose((2, 0, 1))
        img = torch.from_numpy(img)

        filename = os.path.join(self.root,'annotations',self.annotations[idx])
        tree = ET.parse(filename)
        objs = tree.findall('object')

        num_objs = len(objs)
        labels = np.zeros((num_objs), dtype=np.float32)
        seg_areas = np.zeros((num_objs), dtype=np.float32)
        
        boxes = []
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text)
            y2 = float(bbox.find('ymax').text)

            cls = self._class_to_ind[obj.find('name').text.lower().strip()]
            boxes.append([x1, y1, x2, y2])
            labels[ix] = cls
            seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
        
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        seg_areas = torch.as_tensor(seg_areas, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.float32)
    
        target = {'boxes': boxes,
                  'labels': labels,
                  'seg_areas': seg_areas}

        return img, target

The bounding boxes are in PASCAL VOC format (xmin, ymin, xmax, ymax).
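For reference, torchvision's detection models expect each per-image target to contain 'boxes' as a FloatTensor of shape [N, 4] in (xmin, ymin, xmax, ymax) order and 'labels' as an Int64Tensor of shape [N]. A minimal well-formed target (illustrative values only) looks like this:

import torch

# A well-formed target for an image with two annotated objects
target = {'boxes': torch.tensor([[10., 20., 110., 220.],
                                 [30., 40., 130., 240.]]),    # FloatTensor, shape [2, 4]
          'labels': torch.tensor([1, 3], dtype=torch.int64)}  # Int64Tensor, shape [2]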
My training code is as follows:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

import utils  # torchvision detection reference utilities (provides collate_fn)
from engine import train_one_epoch, evaluate

num_classes = 6

model = fasterrcnn_resnet50_fpn(pretrained=True)

in_features = model.roi_heads.box_predictor.cls_score.in_features

model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)


dataset_train = CustomDataset('images/train/')
dataset_val = CustomDataset('images/val/')

print('Loading data')

data_loader_train = torch.utils.data.DataLoader(
    dataset_train, batch_size=2, shuffle=True, collate_fn=utils.collate_fn)

data_loader_test = torch.utils.data.DataLoader(
    dataset_val, batch_size=2, shuffle=False, collate_fn=utils.collate_fn)

print('Done loading')

# device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
device = torch.device('cuda')
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)

num_epochs = 10


for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader_train, device, epoch, print_freq=1)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)

The train_one_epoch function is at the following link.

Can someone please help me out?

boxes is expected to have the shape [batch_size, 4] so that boxes.unbind(1) returns the min and max values for both dimensions.
However, in your code it seems that boxes has only a single dimension, so you might need to unsqueeze the batch dimension.
Also, which batch size are you currently using?
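For example, a 1-D boxes tensor, such as the empty tensor you would get from an image with no annotated objects, reproduces exactly this error (a minimal sketch, not your actual data):

import torch

boxes = torch.as_tensor([], dtype=torch.float32)  # shape [0], i.e. 1-D, not [0, 4]
xmin, ymin, xmax, ymax = boxes.unbind(1)
# IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)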

I tried different batch sizes of 1, 2, and 4; all of them give the same error at some point. Is the training batch size the same as the batch size that was set while loading the data?

Hey @ptrblck, my boxes tensor is of shape [batch_size, 4] where batch_size is 1, but the error still persists. Do you know a workaround for this?

That shouldn’t be the case for a tensor of shape [1, 4], as seen in this small code snippet:

boxes = torch.randint(0, 10, (1, 4))
xmin, ymin, xmax, ymax = boxes.unbind(1)

Could you double-check the shape of boxes, or post your code so that we can have a look?
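One way to check (a minimal sketch, assuming the CustomDataset shown above): scan every sample and report any boxes tensor that is not of shape [N, 4]; images with zero annotated objects would show up here as 1-D empty tensors:

dataset = CustomDataset('images/train/')
for i in range(len(dataset)):
    _, target = dataset[i]
    boxes = target['boxes']
    if boxes.dim() != 2 or boxes.size(-1) != 4:
        print(i, boxes.shape)  # flag malformed samples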