Reduce the number of classes in Faster R-CNN

Luppo · January 2, 2020, 2:14pm

Hello everyone.

I am using Faster R-CNN for object detection.
Since I only need it to detect vehicles, I am currently just filtering out the labels of non-vehicle objects. However, I would like the network itself to output scores and bounding boxes for vehicles only.
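For reference, the label filtering mentioned above can be sketched roughly as follows. This is only an illustration: it assumes the usual torchvision output format (a dict with `boxes`, `labels` and `scores`), the vehicle ids other than car (3) are my assumption about the COCO category ids, and `keep_vehicles` is a hypothetical helper name.

```python
import torch

# Assumed COCO category ids for road vehicles: car, motorcycle, bus, truck
VEHICLE_LABELS = torch.tensor([3, 4, 6, 8])

def keep_vehicles(detection):
    """Keep only the entries of a detection dict whose label is a vehicle."""
    # Compare every label against every vehicle id, then reduce per detection
    mask = (detection["labels"].unsqueeze(1) == VEHICLE_LABELS).any(dim=1)
    return {key: value[mask] for key, value in detection.items()}

# Hand-made example in the output format of torchvision detection models
detection = {
    "boxes": torch.tensor([[0., 0., 10., 10.],
                           [5., 5., 20., 20.],
                           [1., 1., 4., 4.]]),
    "labels": torch.tensor([1, 3, 8]),   # person, car, truck
    "scores": torch.tensor([0.9, 0.8, 0.4]),
}
vehicles = keep_vehicles(detection)
```

This only post-processes the output, so the network still computes scores and boxes for all 91 classes.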

Mainly, I need to change the number of output features of model.roi_heads.box_predictor.bbox_pred. Currently, it is a linear layer with out_features=364, i.e. 4 outputs for each of the 91 classes of the COCO dataset. In addition, I would like to reuse the features extracted up to this stage to predict other values.

I’ll try to explain better with an example:

Let’s say I am only interested in predicting the ‘car’ class:

car_label = 3

I extract the 4 rows of the weight matrix (and the matching 4 bias entries) that predict the car bounding box, since I want to reuse the pre-trained Faster R-CNN:

my_weights = model.roi_heads.box_predictor.bbox_pred.weight.data[4*car_label:4*(car_label+1), :]
my_bias = model.roi_heads.box_predictor.bbox_pred.bias.data[4*car_label:4*(car_label+1)]
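The slice bounds above rely on bbox_pred storing 4 consecutive rows per class. A small sketch of that arithmetic, assuming 91 classes and using a hypothetical helper name `bbox_rows`:

```python
NUM_CLASSES = 91      # classes in the pre-trained model, as stated above
COORDS_PER_BOX = 4    # one regression output per box coordinate

def bbox_rows(label, coords=COORDS_PER_BOX):
    """Return the (start, stop) row range of bbox_pred that belongs to `label`."""
    return coords * label, coords * (label + 1)

car_label = 3
start, stop = bbox_rows(car_label)          # rows used for the car class
total_rows = NUM_CLASSES * COORDS_PER_BOX   # out_features of bbox_pred
```

With car_label = 3 this selects rows 12 to 15 out of the 364.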

I want the bounding box predictor to regress one additional output (5 outputs instead of 4):

model.roi_heads.box_predictor.bbox_pred = nn.Linear(in_features=1024, out_features=5, bias=True)

But I still want to get the bounding box from the pre-trained weights:

model.roi_heads.box_predictor.bbox_pred.weight.data[:4,:] = my_weights
model.roi_heads.box_predictor.bbox_pred.bias.data[:4] = my_bias
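The weight transfer on its own can be checked in isolation. Below is a minimal sketch that assumes a stand-in linear layer with the same shape as the pre-trained bbox_pred (1024 → 364) instead of the real model; `old_pred` and `new_pred` are illustrative names. It only verifies the copy, not the rest of the detection pipeline.

```python
import torch
import torch.nn as nn

car_label = 3

# Stand-in for model.roi_heads.box_predictor.bbox_pred (assumed shape: 1024 -> 91 * 4)
old_pred = nn.Linear(in_features=1024, out_features=364, bias=True)

# New head: 4 box outputs for the car class plus 1 extra regressed value
new_pred = nn.Linear(in_features=1024, out_features=5, bias=True)

rows = slice(4 * car_label, 4 * (car_label + 1))
with torch.no_grad():
    # Copy in place without recording the operation in autograd
    new_pred.weight[:4].copy_(old_pred.weight[rows])
    new_pred.bias[:4].copy_(old_pred.bias[rows])

# The first four outputs of the new head should match the car outputs of the old one
x = torch.randn(2, 1024)
old_out = old_pred(x)[:, rows]
new_out = new_pred(x)[:, :4]
```

The fifth output row keeps its random initialization and would still need to be trained.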

This obviously doesn’t work:

RuntimeError: The expanded size of the tensor (1) must match the existing size (2) at non-singleton dimension 1.  Target sizes: [10000, 1].  Tensor sizes: [10000, 2]

Even if I set only 4 output features (out_features=4), I still get an error, this time in the step that removes low-scoring boxes from the output:

/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/roi_heads.py in postprocess_detections(self, class_logits, box_regression, proposals, image_shapes)
    502             # remove low scoring boxes
    503             inds = torch.nonzero(scores > self.score_thresh).squeeze(1)
--> 504             boxes, scores, labels = boxes[inds], scores[inds], labels[inds]
    505 
    506             # remove empty boxes

IndexError: index is out of bounds for dimension with size 0

Any idea how to do something like this, or how to disable the removal of low-scoring boxes (since I actually need only one box)? I am also open to ideas on how to add another linear layer that takes the 1024 in_features as input and provides a readable output. That said, I would prefer to remove this last large matrix multiplication since I don’t need it (I tried to remove it following these steps, but I got errors on the input of a conv layer).

Thank you.