How to reduce the number of classes in Faster R-CNN

ayni-af · February 4, 2021, 9:36pm

Hi,

I am new to the field of object detection. I would be grateful if you could help me reduce the number of object classes detected by a pre-trained model that was trained on the COCO dataset. I only want to detect “person” and “dog”.

I am using the fasterrcnn_resnet50_fpn model:

# load model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

I am not sure how I should modify the model so that it only detects class 1 (person) and class 18 (dog). I do not want to train the model on new data.

I will appreciate it if you can help me with this problem.
Thank you

Dwight_Foster · February 4, 2021, 10:17pm

In the call you can just set num_classes to whatever you want. The docs here show how to do this, as well as other things.

JamesDickens · February 5, 2021, 7:40am

Keep in mind that if you want to use the coco-pretrained backbone and rpn, and then train with a new ROI head (for different class structure), you can simply take the backbone and rpn, and use those to initialize a new faster r-cnn module by passing in the backbone and rpn as arguments to faster-rcnn in torchvision.

ayni-af · February 5, 2021, 8:39am

Hi @Dwight_Foster ,

Thank you for your reply. But how can I specify that I am only interested in class 1 (person) and class 18 (dog)? I do not want to train the fasterrcnn_resnet50_fpn model again. I want the model to detect only my desired classes and ignore the other 88 classes.

I tried to set num_classes to 3 but got an error.
My code:

import cv2
import torch
import torchvision
from PIL import Image

import detect_utils  # the poster's own helper module (not shown in this thread)

coco_names = ['__background__', 'person', 'dog']

# This call raises the RuntimeError shown below
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, num_classes=len(coco_names))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.eval().to(device)

path = 'Downloads/input2/test.jpg'
image = Image.open(path)

boxes, classes, labels = detect_utils.predict(image, model, device, 0.8)
image = detect_utils.draw_boxes(boxes, classes, labels, image)
#cv2.imshow('Image', image)
save_name = f"{path.split('/')[-1].split('.')[0]}"
cv2.imwrite(f"outputs/{save_name}.jpg", image)

The error that I get:

RuntimeError: Error(s) in loading state_dict for FasterRCNN:
	size mismatch for roi_heads.box_predictor.cls_score.weight: copying a param with shape torch.Size([91, 1024]) from checkpoint, the shape in current model is torch.Size([3, 1024]).
	size mismatch for roi_heads.box_predictor.cls_score.bias: copying a param with shape torch.Size([91]) from checkpoint, the shape in current model is torch.Size([3]).
	size mismatch for roi_heads.box_predictor.bbox_pred.weight: copying a param with shape torch.Size([364, 1024]) from checkpoint, the shape in current model is torch.Size([12, 1024]).
	size mismatch for roi_heads.box_predictor.bbox_pred.bias: copying a param with shape torch.Size([364]) from checkpoint, the shape in current model is torch.Size([12]).
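
For anyone reading this later, the shapes in the error follow directly from the class count: cls_score has one output per class, and bbox_pred has four box coordinates per class, so 91 classes give 91 and 364 outputs while 3 classes give 3 and 12. A minimal sketch of that arithmetic (the helper name `box_predictor_shapes` is made up for illustration, and 1024 is the input feature size taken from the error message):

```python
def box_predictor_shapes(num_classes, in_features=1024):
    """Parameter shapes of the two linear layers named in the error message."""
    return {
        "cls_score.weight": (num_classes, in_features),
        "cls_score.bias": (num_classes,),
        # four box coordinates are regressed per class
        "bbox_pred.weight": (num_classes * 4, in_features),
        "bbox_pred.bias": (num_classes * 4,),
    }


checkpoint = box_predictor_shapes(91)  # what the COCO checkpoint stores
requested = box_predictor_shapes(3)    # what num_classes=3 builds

for name in checkpoint:
    print(name, checkpoint[name], "vs", requested[name])
```

The checkpoint tensors cannot be copied into the smaller layers, which is why loading fails.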

ayni-af · February 5, 2021, 8:42am

Hi @JamesDickens ,

Thank you for your answer. I do not want to train the model on a new dataset. I just want a model that detects my desired classes, which are available in the COCO dataset:

coco_names = ['__background__', 'person', 'dog']

and ignores the other classes.
I tried to set num_classes to 3 when loading the model, but it did not work.

Dwight_Foster · February 5, 2021, 2:20pm

OK, the error occurs because you cannot load the pretrained weights and set num_classes to a different value: the weight shapes no longer match. Or at least it didn’t work for me either. You can try this, however:

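from torch import nn
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# Replaces only the classification layer; the new nn.Linear is randomly initialized
model.roi_heads.box_predictor.cls_score = nn.Linear(1024, len(coco_names))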

That should work.
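
A possible variation, if the goal is to avoid retraining: instead of a fresh nn.Linear, build the smaller layers from the pretrained rows for the classes you keep. This is only a sketch under assumptions. The helper `slice_box_predictor` is hypothetical, it assumes each class owns 4 consecutive rows of bbox_pred (consistent with the 364 = 91 × 4 in the error above), and the stand-in layers below are random rather than the real COCO weights so the snippet runs without a download.

```python
import torch
from torch import nn


def slice_box_predictor(cls_score, bbox_pred, keep):
    """Return smaller (cls_score, bbox_pred) layers reusing rows for `keep`."""
    keep_idx = torch.tensor(keep, dtype=torch.long)
    # class c is assumed to own rows 4c .. 4c+3 of bbox_pred
    box_idx = (keep_idx[:, None] * 4 + torch.arange(4)).reshape(-1)

    new_cls = nn.Linear(cls_score.in_features, len(keep))
    new_box = nn.Linear(bbox_pred.in_features, 4 * len(keep))
    with torch.no_grad():
        new_cls.weight.copy_(cls_score.weight[keep_idx])
        new_cls.bias.copy_(cls_score.bias[keep_idx])
        new_box.weight.copy_(bbox_pred.weight[box_idx])
        new_box.bias.copy_(bbox_pred.bias[box_idx])
    return new_cls, new_box


# Stand-ins with the same shapes as the pretrained layers in the error message
old_cls = nn.Linear(1024, 91)
old_box = nn.Linear(1024, 364)

# 0 = background, 1 = person, 18 = dog
new_cls, new_box = slice_box_predictor(old_cls, old_box, keep=[0, 1, 18])
print(new_cls.weight.shape, new_box.weight.shape)
```

With the real model you would pass model.roi_heads.box_predictor.cls_score and model.roi_heads.box_predictor.bbox_pred, then assign the returned layers back to those attributes. Two caveats: the returned labels would be positions in `keep` (1 for person, 2 for dog) rather than COCO ids, and the scores will differ from the full model because the softmax now runs over 3 logits instead of 91. I have not checked how much that affects detection quality.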

duddal · July 14, 2021, 8:50am

@Dwight_Foster Hi, I know it’s been some time since this thread was active.
But I tried your method and I have a question:

model.roi_heads.box_predictor.cls_score = nn.Linear(1024, len(coco_names)). Here we are just telling the model to predict 3 classes, but how does the model know that the classes should be ‘background’, ‘person’, and ‘dog’?

Dwight_Foster · July 14, 2021, 4:06pm

When you initialize the model, it does not know which class is which. The whole point of training is for the model to learn which classes are which.
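
For the original question (only person and dog, no retraining), one option that avoids touching the model at all is to keep the full 91-class pretrained model and simply drop every prediction whose label is not 1 or 18. A minimal sketch, assuming the usual torchvision detection output of one dict per image with "boxes", "labels" and "scores" tensors; the helper name `keep_labels` is made up, the 0.8 threshold is the one used earlier in the thread, and the prediction below is hand-written so the snippet runs without the model.

```python
import torch


def keep_labels(prediction, wanted, score_threshold=0.8):
    """Keep only detections whose label is in `wanted` and score is high enough."""
    labels = prediction["labels"]
    mask = torch.zeros_like(labels, dtype=torch.bool)
    for label in wanted:
        mask |= labels == label
    mask &= prediction["scores"] >= score_threshold
    # apply the same boolean mask to boxes, labels and scores
    return {key: value[mask] for key, value in prediction.items()}


# Hand-written stand-in for one element of the model's output list
prediction = {
    "boxes": torch.tensor([[0.0, 0.0, 10.0, 10.0],
                           [5.0, 5.0, 20.0, 20.0],
                           [1.0, 1.0, 4.0, 4.0],
                           [2.0, 2.0, 8.0, 8.0]]),
    "labels": torch.tensor([1, 18, 3, 1]),  # person, dog, car, person
    "scores": torch.tensor([0.95, 0.90, 0.99, 0.50]),
}

filtered = keep_labels(prediction, wanted=[1, 18])
print(filtered["labels"])
```

With the real model this would be applied to each element of the list returned by model(images) in eval mode. The model still computes all 91 classes internally, so this saves no compute; it only hides the classes you do not care about.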