Mask R-CNN hyperparameters

Before I start, thank you to the authors of torchvision and the mask_rcnn tutorial.

I adapted my dataset according to the TorchVision Object Detection Finetuning Tutorial (PyTorch Tutorials 2.1.1+cu121 documentation) and fine-tuned using the pre-trained model.

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")  # pretrained=True is deprecated since torchvision 0.13

Results are ok (better than I expected) but not great.

I was hoping someone with experience with Faster/Mask RCNN could point me in a proven direction for maximizing the performance of this library on datasets that look different from COCO.

Q1/4: Does anyone know how sensitive mask_rcnn is to the AnchorGenerator sizes and aspect ratios, and to the bounding box sizes in the data?

About 5% of my aspect ratios are outside of 0.5x to 2x.

aspect ratio h/w <      0.0625   : count        0
aspect ratio h/w <      0.125    : count        47
aspect ratio h/w <      0.25     : count        8195
aspect ratio h/w <      0.5      : count        80538
aspect ratio h/w <      1.0      : count        88215
aspect ratio h/w <      2.0      : count        37482
aspect ratio h/w <      4.0      : count        1617
aspect ratio h/w <      8.0      : count        129
aspect ratio h/w <      16.0     : count        4

Also, a fair number of my bounding box sizes are outside of (32, 64, 128, 256, 512).

bounding box size       0        to     10       pixels : width count   863
bounding box size       10       to     20       pixels : width count   7880
bounding box size       20       to     30       pixels : width count   8822
bounding box size       30       to     40       pixels : width count   8500
bounding box size       40       to     50       pixels : width count   7270
bounding box size       50       to     60       pixels : width count   10864
bounding box size       60       to     70       pixels : width count   8869
bounding box size       70       to     80       pixels : width count   9206
bounding box size       80       to     90       pixels : width count   9182
bounding box size       90       to     100      pixels : width count   8421
bounding box size       100      to     110      pixels : width count   7880
bounding box size       110      to     120      pixels : width count   7421
bounding box size       120      to     130      pixels : width count   7071
bounding box size       130      to     140      pixels : width count   6535
bounding box size       140      to     150      pixels : width count   6445
bounding box size       150      to     160      pixels : width count   6280
bounding box size       160      to     170      pixels : width count   5882
bounding box size       170      to     180      pixels : width count   5515
bounding box size       180      to     190      pixels : width count   5360
bounding box size       190      to     200      pixels : width count   5053
bounding box size       200      to     210      pixels : width count   4630
bounding box size       210      to     220      pixels : width count   4326
bounding box size       220      to     230      pixels : width count   4115
bounding box size       230      to     240      pixels : width count   4055
bounding box size       240      to     250      pixels : width count   3863
bounding box size       250      to     260      pixels : width count   3598
bounding box size       260      to     270      pixels : width count   3413
bounding box size       270      to     280      pixels : width count   2993
bounding box size       280      to     290      pixels : width count   2747
bounding box size       290      to     300      pixels : width count   2397
bounding box size       300      to     310      pixels : width count   2350
bounding box size       310      to     320      pixels : width count   2208
bounding box size       320      to     330      pixels : width count   2088
bounding box size       330      to     340      pixels : width count   1894
bounding box size       340      to     350      pixels : width count   1795
bounding box size       350      to     360      pixels : width count   1571
bounding box size       360      to     370      pixels : width count   1473
bounding box size       370      to     380      pixels : width count   1367
bounding box size       380      to     390      pixels : width count   1327
bounding box size       390      to     400      pixels : width count   1258
bounding box size       400      to     410      pixels : width count   1184
bounding box size       410      to     420      pixels : width count   1073
bounding box size       420      to     430      pixels : width count   1059
bounding box size       430      to     440      pixels : width count   975
bounding box size       440      to     450      pixels : width count   883
bounding box size       450      to     460      pixels : width count   911
bounding box size       460      to     470      pixels : width count   845
bounding box size       470      to     480      pixels : width count   791
bounding box size       480      to     490      pixels : width count   727
bounding box size       490      to     500      pixels : width count   685
bounding box size       500      to     510      pixels : width count   687
bounding box size       510      to     520      pixels : width count   667
bounding box size       520      to     530      pixels : width count   626
bounding box size       530      to     540      pixels : width count   613
bounding box size       540      to     550      pixels : width count   571
bounding box size       550      to     560      pixels : width count   599
bounding box size       560      to     570      pixels : width count   582
bounding box size       570      to     580      pixels : width count   501
bounding box size       580      to     590      pixels : width count   473
bounding box size       590      to     600      pixels : width count   423
bounding box size       600      to     610      pixels : width count   416
bounding box size       610      to     620      pixels : width count   372
bounding box size       620      to     630      pixels : width count   317
bounding box size       630      to     640      pixels : width count   320
bounding box size       640      to     650      pixels : width count   305
bounding box size       650      to     660      pixels : width count   274
bounding box size       660      to     670      pixels : width count   271
bounding box size       670      to     680      pixels : width count   240
bounding box size       680      to     690      pixels : width count   208
bounding box size       690      to     700      pixels : width count   170
bounding box size       700      to     710      pixels : width count   167
bounding box size       710      to     720      pixels : width count   144
bounding box size       720      to     730      pixels : width count   137
bounding box size       730      to     740      pixels : width count   96
bounding box size       740      to     750      pixels : width count   120
bounding box size       750      to     760      pixels : width count   94
bounding box size       760      to     770      pixels : width count   93
bounding box size       770      to     780      pixels : width count   77
bounding box size       780      to     790      pixels : width count   78
bounding box size       790      to     800      pixels : width count   79
bounding box size       800      to     810      pixels : width count   61
bounding box size       810      to     820      pixels : width count   38
bounding box size       820      to     830      pixels : width count   49
bounding box size       830      to     840      pixels : width count   51
bounding box size       840      to     850      pixels : width count   44
bounding box size       850      to     860      pixels : width count   31
bounding box size       860      to     870      pixels : width count   42
bounding box size       870      to     880      pixels : width count   31
bounding box size       880      to     890      pixels : width count   28
bounding box size       890      to     900      pixels : width count   22
bounding box size       900      to     910      pixels : width count   19
bounding box size       910      to     920      pixels : width count   20
bounding box size       920      to     930      pixels : width count   15
bounding box size       930      to     940      pixels : width count   11
bounding box size       940      to     950      pixels : width count   9
bounding box size       950      to     960      pixels : width count   13
bounding box size       960      to     970      pixels : width count   8
bounding box size       970      to     980      pixels : width count   7
bounding box size       980      to     990      pixels : width count   13
bounding box size       990      to     1000     pixels : width count   7

Hence I am wondering how sensitive mask/faster_rcnn is to these parameters.
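One way to get a feel for this sensitivity is to check how well a box shape can overlap the best-fitting anchor shape. The sketch below is only an approximation under stated assumptions: it mirrors how torchvision's AnchorGenerator turns a size and an h/w ratio into a width and height, but it compares shapes with a shared center (the best case, since real anchors sit on a stride grid) and ignores rounding. The 400 x 50 box is a hypothetical example, not taken from the dataset above, and box sizes should be measured after the model's internal resize. The 0.7 threshold is torchvision's default rpn_fg_iou_thresh; the RPN also keeps the highest-IoU anchor for each ground-truth box as a positive, so a low score here does not mean the box is never trained on.

```python
import math

DEFAULT_SIZES = (32, 64, 128, 256, 512)
DEFAULT_RATIOS = (0.5, 1.0, 2.0)
WIDE_RATIOS = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0)  # hypothetical extended set


def anchor_shapes(sizes, aspect_ratios):
    """Return (width, height) for every size/ratio pair, with ratio = h / w."""
    shapes = []
    for size in sizes:
        for ratio in aspect_ratios:
            # Area stays size**2 while the ratio stretches the shape.
            h = size * math.sqrt(ratio)
            w = size / math.sqrt(ratio)
            shapes.append((w, h))
    return shapes


def centered_iou(box_a, box_b):
    """IoU of two (w, h) boxes that share the same center."""
    inter = min(box_a[0], box_b[0]) * min(box_a[1], box_b[1])
    union = box_a[0] * box_a[1] + box_b[0] * box_b[1] - inter
    return inter / union


def best_anchor_iou(box, sizes, aspect_ratios):
    """Best-case IoU between a (w, h) box and any anchor shape."""
    return max(centered_iou(box, a) for a in anchor_shapes(sizes, aspect_ratios))


if __name__ == "__main__":
    box = (400.0, 50.0)  # hypothetical wide box, h/w = 0.125
    print("default ratios:", best_anchor_iou(box, DEFAULT_SIZES, DEFAULT_RATIOS))
    print("wide ratios:   ", best_anchor_iou(box, DEFAULT_SIZES, WIDE_RATIOS))
```

Running the same check over the real box list (after resizing) would show what fraction of boxes cannot reach the foreground threshold with a given anchor set, which is probably a more useful number than the raw histograms.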

Q2/4: Is there any way of reusing more than just the pretrained backbone when I create a custom AnchorGenerator for MaskRCNN?

The fasterrcnn_resnet50_fpn model appears to have been trained for 26 epochs on the COCO dataset. I suspect retraining on this dataset will take a long time on 2 GPUs, especially with expanded aspect ratios and sizes.

I also suspect that COCO does not have masks at these extended sizes and aspect ratios, so would I need to supplement COCO with my dataset?

Hence my question about finding a way to leverage the existing pretraining.

Q3/4: My custom dataset (and hence my fine-tuning) does not jitter the scale, only the location of the crop. How sensitive is mask_rcnn to the absence of scale jittering?

Q4/4: The default for class MaskRCNN, min_size=800 and max_size=1333, is a lot smaller than my images.

Am I correct in assuming that the backbone will do fine with bigger images, and that it is just the faster_rcnn parts that need to be trained from scratch on these bigger images?

My images are 4x to 6x larger in height and width. I suspect the code

transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

in the initializer of FasterRCNN (super to MaskRCNN) will make the bounding box size distribution even more problematic. Any opinions on this?
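To make the effect on box sizes concrete, the resize factor can be worked out by hand. The sketch below reproduces the scale rule used by the default resize (shorter side to min_size, capped so the longer side does not exceed max_size) in plain Python; the function names and the 4000 x 5000 image are hypothetical, chosen only to match the "4x to 6x larger" description.

```python
def rcnn_resize_scale(height, width, min_size=800, max_size=1333):
    """Scale factor of the default resize: the shorter side goes to min_size
    unless that would push the longer side past max_size."""
    return min(min_size / min(height, width), max_size / max(height, width))


def resized_box_width(box_width, height, width, **kwargs):
    """Width of a box, in pixels, after the image has been resized."""
    return box_width * rcnn_resize_scale(height, width, **kwargs)


if __name__ == "__main__":
    h, w = 4000, 5000  # hypothetical image size
    for box_width in (10, 50, 200, 1000):
        print(box_width, "->", resized_box_width(box_width, h, w))
    # Raising the limits keeps boxes at their native size.
    print(resized_box_width(50, h, w, min_size=4000, max_size=5000))
```

Under these assumptions every box shrinks by the same factor, so the whole width histogram shifts toward the smallest anchors. Passing larger min_size and max_size values to the model avoids the shrink at the cost of memory and speed, and cropping the images into tiles is another option.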

(TorchVision Object Detection Finetuning Tutorial — PyTorch Tutorials 2.1.1+cu121 documentation)
(Models and pre-trained weights — Torchvision 0.16 documentation)

Hi John,

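I am currently researching the same issues that you have. I can only comment on Q4/4: by default, Mask R-CNN resizes all your images so that the shorter side is 800 px, unless that would push the longer side past 1333 px, in which case the longer side is scaled to 1333 px instead. This happens during both training and inference, so it doesn't really matter what resolution your images have.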

If you find an answer to the other three questions, I am also curious to hear it.

I was rather surprised how robust Mask-RCNN is. I did not tune anything and it worked very well, so I skipped the long tuning process.