Target 2 is out of bounds | object detection finetuning tutorial FastRCNNPredictor

Hello all,

Greetings! I would like to use only the FastRCNNPredictor part of the object detection finetuning tutorial. In __getitem__ I changed the labels to labels = torch.as_tensor(obj_ids, dtype=torch.int64), as suggested in Mask-RCNN tutorial one-line change suggestion · Issue #960 · pytorch/tutorials · GitHub

When I first ran the code, I got the following error at train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10):

CUDA error: device-side assert triggered

… which I worked around by simply changing the device from cuda to cpu.
Now I get the following error at the same line of code:

IndexError: Target 2 is out of bounds.

The changes I have made to the tutorial code so far are: the labels change mentioned above, removing the mask part from get_model_instance_segmentation, and setting batch_size=1 and num_workers=0 everywhere.

The traceback is as follows:

File "/../", line **, in <module>
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
  File "/../", line 31, in train_one_epoch
    loss_dict = model(images, targets)
  File "/../torch/nn/modules/", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/../torchvision/models/detection/", line 99, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/../torch/nn/modules/", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/../torchvision/models/detection/", line 759, in forward
    loss_classifier, loss_box_reg = fastrcnn_loss(class_logits, box_regression, labels, regression_targets)
  File "/../torchvision/models/detection/", line 32, in fastrcnn_loss
    classification_loss = F.cross_entropy(class_logits, labels)
  File "/../torch/nn/", line 2996, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target 2 is out of bounds.

Has anyone else run into the same problem while running the tutorial? It would be very helpful if someone could point me to a fix. Thank you in advance…

Could you describe these changes a bit more, please?
Both errors (the assert on the GPU and the indexing error on the CPU) are caused by a target tensor that contains out-of-bounds class indices.
nn.CrossEntropyLoss expects a model output in the shape [batch_size, nb_classes, *] and a target in the shape [batch_size, *] containing class indices in the range [0, nb_classes-1]. Since your target contains the class index 2, your model would have to output logits for at least 3 classes, while it seems to output fewer.
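
As a minimal sketch of that constraint: the batch size, the two-class setup, and the target values below are made up for illustration and are not taken from your model.

```python
import torch
import torch.nn.functional as F

nb_classes = 2                       # illustrative: background + one object class
logits = torch.randn(5, nb_classes)  # shape [batch_size, nb_classes]

# Valid target: every class index lies in [0, nb_classes - 1].
ok_target = torch.tensor([0, 1, 1, 0, 1])
loss = F.cross_entropy(logits, ok_target)

# Invalid target: class index 2 would need a third logit.
bad_target = torch.tensor([0, 1, 2, 0, 1])
try:
    F.cross_entropy(logits, bad_target)
    raised = False
except IndexError as err:
    # On the CPU this surfaces as an IndexError like the one in your traceback.
    raised = True
    print(err)
```

On the GPU the same mistake tends to show up as the less readable device-side assert, which is why running once on the CPU is a handy way to get the clearer message.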

Hello, thank you for the quick reply.
The changes I made are as follows:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
#from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def get_model_instance_segmentation(num_classes):
    # load an object detection model pre-trained on COCO
    #model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    #in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    #hidden_layer = 256
    # and replace the mask predictor with a new one
    #model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes)
    return model

So far I have tried to make only the changes that seemed relevant. It would be great if someone could point out where in the code the error might come from. Thank you in advance…

Your code works fine if the previously mentioned class index ranges are used:

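import torch

# Assumes model was built with get_model_instance_segmentation(num_classes=3)
# For training
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
# make sure x2 >= x1 and y2 >= y1
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
labels = torch.randint(0, 3, (4, 11))  # class indices in [0, 2]
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)  # collect one target dict per image
output = model(images, targets)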

Did you check the target class indices and compare them to the logit output shape of your model?
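
One way to run that check is sketched below. find_bad_labels is a hypothetical helper, not part of torchvision, and the example targets are made-up stand-ins for what your dataset returns.

```python
def find_bad_labels(targets, num_logits):
    """Return (target_index, label) pairs whose label is outside [0, num_logits - 1]."""
    bad = []
    for i, target in enumerate(targets):
        labels = target["labels"]
        # Tensors expose .tolist(); plain sequences are used as they are.
        values = labels.tolist() if hasattr(labels, "tolist") else list(labels)
        for label in values:
            if not 0 <= label < num_logits:
                bad.append((i, label))
    return bad


# Stand-in targets; in practice these would come from the dataset's __getitem__.
example_targets = [{"labels": [1]}, {"labels": [1, 2]}]
print(find_bad_labels(example_targets, num_logits=2))  # → [(1, 2)]
```

For the model in this thread, the number of logits should be readable as model.roi_heads.box_predictor.cls_score.out_features, since cls_score is a linear layer, and the targets would come from iterating over your dataset. If the labels are built from obj_ids and those ids run 1, 2, ... per object (an assumption about your data), a label of 2 appears as soon as an image holds two objects.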