Change backbone in MaskRCNN

I have a Mask R-CNN with a ResNet50 backbone. It works fine, except that it is very slow and very big:
it runs out of GPU memory as soon as I set the batch_size to more than 2(!).

Since I do not have 1000 different classes to detect and classify (like ImageNet), but only 50, I was wondering whether a smaller backbone would be a better fit!?
So I want to test different backbones.

In the pedestrian detection tutorial I found a part that explains how to exchange the backbone:
the sample does the modification for Faster R-CNN (a doc error?), but it seems to work fine for Mask R-CNN.
But doesn’t it need a mask_roi_pooler too? If so, how do I add one?

I tried the same adaptations for AlexNet and VGG16, but I failed.
Could anyone help me to get AlexNet or VGG in there?


Why not just replace the ResNet50 backbone with a ResNet18? Also, try PyTorch mixed precision; it reduces memory usage a lot, possibly by half.

Yes, I tried it and resnet18 works fine :slight_smile:
And I will try mixed precision if I can find a how-to…

I would still love to experiment with very small or even self-made backbones.
It’s not only that my use case has 50 instead of 1000 classes; it’s also that I have only very few samples.
So it is essential that I reduce the number of weights/parameters to a minimum, to get a fast-learning model.

I have tried the 50 classes without detection (just as cut-out images) and could train them with AlexNet or VGG16, with 15 samples per class (and a lot of image augmentation), to 80–85% accuracy. Now I would like to do the same with Mask R-CNN around it, to detect the objects first.

I had the impression that the backbone could easily be exchanged, but the adaptation seems to be very specific and not obvious. That is why I need some help here.

There is an official example for PyTorch’s AMP.
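The pattern in that example boils down to a gradient scaler plus an autocast context. A minimal sketch, with a toy model standing in for the Mask R-CNN (whose real training step would sum a loss dict instead):

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# GradScaler is a no-op when disabled, so this also runs on CPU
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

x = torch.randn(4, 10)
y = torch.randn(4, 2)

optimizer.zero_grad()
# forward pass runs in reduced precision inside autocast
with torch.autocast(device_type="cuda" if torch.cuda.is_available() else "cpu"):
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()   # scale the loss to avoid fp16 underflow
scaler.step(optimizer)          # unscales gradients, then optimizer.step()
scaler.update()
```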

If I were in your place, I would stick with the larger model. It has already learned a lot of patterns, and even though it takes more time per epoch, it only needs a few epochs to give good results. I would strongly suggest you first try freezing all blocks except the last and fine-tuning on your dataset. That’s the best way forward, and it should also be much faster to train.

Irrespective of the number of classes, the model should learn a ton of features and should be able to generalize. I would say only a small portion of the last layers focuses on class-level patterns.

I hope this helps.

AMP helped a lot here!
It reduced memory usage and increased performance.
And since I can now use batch_size 8 it really speeds up everything!
Thanks a lot for this advice! :slight_smile:

Still, if anyone has ever succeeded in getting AlexNet or VGG to run inside Mask R-CNN, please show me how.
Even if it does not perform better, it would help me convince my boss that this is not a solution.

Thanks for the update, great to know.