Reduce the number of classes in Faster R-CNN

Hello everyone.

I am using Faster R-CNN for object detection.
Since I only need it to detect vehicles, I am currently just filtering out the labels of non-vehicle objects. However, I would like the network itself to output scores and bounding boxes for vehicles only.

Mainly, I need to change the number of output features of model.roi_heads.box_predictor.bbox_pred. Currently, it is a linear layer with out_features=364, i.e. 4 outputs for each of the 91 classes of the COCO dataset. However, I would also like to exploit the features extracted at this stage to predict other values.

I’ll try to explain better with an example:

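  1. Let’s say I am only interested in predicting the ‘car’ class:
     car_label = 3
  2. I extract the 4 rows of the weight matrix (and the matching 4 bias entries) that predict the car bounding box, since I want to reuse the pre-trained Faster R-CNN:
     my_weights = model.roi_heads.box_predictor.bbox_pred.weight.data[4*car_label:4*(car_label+1), :].clone()
     my_bias = model.roi_heads.box_predictor.bbox_pred.bias.data[4*car_label:4*(car_label+1)].clone()
  3. I want the bounding box predictor to regress another output (5 outputs instead of 4):
     from torch import nn
     model.roi_heads.box_predictor.bbox_pred = nn.Linear(in_features=1024, out_features=5, bias=True)
  4. But I still want to get the bounding box by reusing the pre-trained weights for the first 4 outputs:
     model.roi_heads.box_predictor.bbox_pred.weight.data[:4, :] = my_weights
     model.roi_heads.box_predictor.bbox_pred.bias.data[:4] = my_bias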

This obviously doesn’t work:

RuntimeError: The expanded size of the tensor (1) must match the existing size (2) at non-singleton dimension 1.  Target sizes: [10000, 1].  Tensor sizes: [10000, 2]

But even with only 4 output features (out_features=4), I still get an error, which seems to come from the step that removes low-scoring boxes from the output:

/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/ in postprocess_detections(self, class_logits, box_regression, proposals, image_shapes)
    502             # remove low scoring boxes
    503             inds = torch.nonzero(scores > self.score_thresh).squeeze(1)
--> 504             boxes, scores, labels = boxes[inds], scores[inds], labels[inds]
    506             # remove empty boxes

IndexError: index is out of bounds for dimension with size 0
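As far as I can tell from torchvision's RoIHeads.postprocess_detections (details may vary between versions), the number of classes is taken from the classifier output, and the background column 0 is then dropped from both the scores and the decoded boxes. With a 4-output bbox_pred, the decoded boxes have a single "class" column, so nothing is left after dropping column 0, while the scores still have 90 columns. The snippet mimics those steps on plain tensors with made-up sizes. The 5-output case appears to fail earlier, in the box decoder, which slices the deltas with a stride of 4, so dx gets two columns while dy, dw and dh get one; that seems consistent with the [10000, 1] vs [10000, 2] mismatch above. In both cases, bbox_pred.out_features apparently has to equal 4 × cls_score.out_features.

```python
import torch

# Made-up sizes: 10 proposals, 91 COCO classes still in cls_score,
# but bbox_pred replaced by a 4-output layer (one box per proposal).
num_proposals, num_classes = 10, 91
scores = torch.softmax(torch.randn(num_proposals, num_classes), dim=-1)
boxes = torch.rand(num_proposals, 1, 4)  # decoded boxes: 4 outputs -> 1 "class"

# Drop column 0 (background) from both tensors, as postprocessing does ...
boxes = boxes[:, 1:].reshape(-1, 4)  # shape (0, 4): no boxes left
scores = scores[:, 1:].reshape(-1)   # shape (900,)

# ... then index `boxes` with positions computed from `scores`.
inds = torch.nonzero(scores > 0.0).squeeze(1)
print(boxes.shape, scores.shape)  # torch.Size([0, 4]) torch.Size([900])

raised = False
try:
    boxes[inds]
except IndexError as err:
    raised = True
    print("IndexError:", err)
```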

Does anyone have an idea how to do something like this, or how to disable the removal of low-scoring boxes (I actually need only one box)? I am also open to ideas on how to add another linear layer that takes the 1024 in_features as input and produces a readable output, although I would prefer to delete this last large matrix multiplication, since I don’t need it (I tried to delete it following these steps, but got errors on the input of a conv layer).

Thank you.