Making a prediction after object detection fine-tuning

After fine-tuning Faster R-CNN, I get the following error when I try to make a prediction:
Exception has occurred: RuntimeError
cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

Here is the code:
images = [img]
model = get_model_instance_segmentation(2)

pred = model(images)  # <== the error is raised here
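
For reference, the message itself can be reproduced without any model. This is only a minimal sketch, assuming that a tensor with zero elements (for example an empty set of boxes) reaches a reshape with an inferred -1 dimension; it does not show where inside the detection model this happens.

```python
import torch

# A tensor with zero elements, e.g. an empty set of boxes (assumed shape).
empty = torch.empty(0, 4)

try:
    # With 0 elements and a leading size of 0, the -1 dimension cannot be inferred.
    empty.reshape(0, -1)
    message = None
except RuntimeError as err:
    message = str(err)

print(message)
```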


Are you passing empty images to the model?
Could you check the shape of each image inside the images list?
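
A quick way to run that check is sketched below. The helper name `describe_images` is made up for illustration, and the second tensor is a deliberately empty dummy.

```python
import torch

def describe_images(images):
    """Return (index, shape, is_empty) for every tensor in the list."""
    report = []
    for idx, img in enumerate(images):
        report.append((idx, tuple(img.shape), img.numel() == 0))
    return report

# Dummy inputs: one regular image and one tensor with zero elements.
images = [torch.rand(3, 64, 48), torch.empty(3, 0, 48)]
report = describe_images(images)
for idx, shape, is_empty in report:
    print(idx, shape, "EMPTY" if is_empty else "ok")
```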

Thank you for your reply. I pass just one image at a time; here is an example of the image shape:

torch.Size([3, 3456, 4608])

I need to add a note: this only happens when I run the model on the CPU, not on CUDA.
With CUDA, I get an empty list of boxes instead.

That’s a weird issue.
Are you also seeing the same error being raised using this code snippet:

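import torch
import torchvision
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
model.eval()  # inference mode; in training mode the model also expects targets
inputs = [torch.randn([3, 3456, 4608])]
out = model(inputs)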

I’m getting an out list of dicts, which contain empty predictions, which is expected for an untrained model and random inputs.

This snippet works fine on both CPU and CUDA, even with the actual image instead of the random input.
I thought that the model loading step was wrong, but after loading the model and printing it I got the correct architecture:

  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d()
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d()
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d()
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d()
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d()

… etc
I am completely lost!

Is the only difference between your and my code the model definition now?
If so, could you post it here, so that we could have a look?

While the modules might be equal (shown by print(model)), the forward call might still be different and raise this issue.
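
As a toy illustration of that point (the `Toy` class and its `scale` attribute are invented for this sketch and have nothing to do with torchvision's detection models): two modules can print identically and still compute different outputs, because print(model) only lists the registered submodules.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self, scale):
        super().__init__()
        self.fc = nn.Linear(2, 2)
        self.scale = scale  # plain attribute, not shown by print(model)

    def forward(self, x):
        return self.fc(x) * self.scale

a, b = Toy(scale=1.0), Toy(scale=2.0)

# Give both modules the same fixed parameters so only forward() differs.
nn.init.ones_(a.fc.weight)
nn.init.zeros_(a.fc.bias)
b.load_state_dict(a.state_dict())

x = torch.ones(1, 2)
same_repr = repr(a) == repr(b)
same_output = torch.equal(a(x), b(x))
print(same_repr, same_output)
```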

For the model definition, I followed the function from the tutorial:

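import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def get_model_instance_segmentation(num_classes):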
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model

After training the model, I save it using: torch.save(model, 'faster_rcnn_'+str(len(train_dataset))+'.pt')

I need to add another note: when the model is trained on less data or for fewer epochs, it might work. I tried saving and using the model after each epoch. In the first epochs it works and outputs a prediction (with bad accuracy, of course), but after increasing the number of epochs or the size of the dataset it raises the same error.

Could you try to save and load the state_dict() instead of the model directly?
I’m unsure why training the model longer would yield this error, as it seems that the input data changed somehow.
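
A minimal sketch of that suggestion follows. A small nn.Linear stands in for the detection model and an in-memory buffer stands in for the .pt file; both are assumptions made to keep the example self-contained. With the real model you would recreate it via get_model_instance_segmentation(2) before calling load_state_dict.

```python
import io
import torch
import torch.nn as nn

def build_model():
    # Stand-in for get_model_instance_segmentation(2) (assumption for this sketch).
    return nn.Linear(4, 2)

trained = build_model()

# Save only the parameters and buffers, not the pickled model object.
buffer = io.BytesIO()  # a file path such as 'model_state.pt' works the same way
torch.save(trained.state_dict(), buffer)

# Recreate the architecture from code, then load the saved weights into it.
buffer.seek(0)
restored = build_model()
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()  # switch to inference mode before predicting

x = torch.ones(1, 4)
with torch.no_grad():
    match = torch.equal(trained(x), restored(x))
print(match)
```

Loading into a freshly constructed model this way means the forward pass always comes from the current code rather than from whatever was pickled at save time.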