Before I start, thank you to the authors of torchvision and the mask_rcnn tutorial.
I adapted my dataset according to the tutorial at [TorchVision Object Detection Finetuning Tutorial — PyTorch Tutorials 2.1.1+cu121 documentation] and finetuned using the pre-trained model.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
Results are OK (better than I expected) but not great.
I was hoping someone with experience with Faster/Mask R-CNN could point me in a proven direction for getting the most out of this library on datasets that look different from COCO.
Q1/4: Does anyone know how sensitive mask_rcnn is to AnchorGenerator sizes and aspect ratios and bounding box size?
About 5% of my aspect ratios are outside of 0.5x to 2x.
aspect ratio h/w < 0.0625 : count 0
aspect ratio h/w < 0.125 : count 47
aspect ratio h/w < 0.25 : count 8195
aspect ratio h/w < 0.5 : count 80538
aspect ratio h/w < 1.0 : count 88215
aspect ratio h/w < 2.0 : count 37482
aspect ratio h/w < 4.0 : count 1617
aspect ratio h/w < 8.0 : count 129
aspect ratio h/w < 16.0 : count 4
Also, a fair number of my bounding box sizes are outside of (32, 64, 128, 256, 512).
bounding box size 0 to 10 pixels : width count 863
bounding box size 10 to 20 pixels : width count 7880
bounding box size 20 to 30 pixels : width count 8822
bounding box size 30 to 40 pixels : width count 8500
bounding box size 40 to 50 pixels : width count 7270
bounding box size 50 to 60 pixels : width count 10864
bounding box size 60 to 70 pixels : width count 8869
bounding box size 70 to 80 pixels : width count 9206
bounding box size 80 to 90 pixels : width count 9182
bounding box size 90 to 100 pixels : width count 8421
bounding box size 100 to 110 pixels : width count 7880
bounding box size 110 to 120 pixels : width count 7421
bounding box size 120 to 130 pixels : width count 7071
bounding box size 130 to 140 pixels : width count 6535
bounding box size 140 to 150 pixels : width count 6445
bounding box size 150 to 160 pixels : width count 6280
bounding box size 160 to 170 pixels : width count 5882
bounding box size 170 to 180 pixels : width count 5515
bounding box size 180 to 190 pixels : width count 5360
bounding box size 190 to 200 pixels : width count 5053
bounding box size 200 to 210 pixels : width count 4630
bounding box size 210 to 220 pixels : width count 4326
bounding box size 220 to 230 pixels : width count 4115
bounding box size 230 to 240 pixels : width count 4055
bounding box size 240 to 250 pixels : width count 3863
bounding box size 250 to 260 pixels : width count 3598
bounding box size 260 to 270 pixels : width count 3413
bounding box size 270 to 280 pixels : width count 2993
bounding box size 280 to 290 pixels : width count 2747
bounding box size 290 to 300 pixels : width count 2397
bounding box size 300 to 310 pixels : width count 2350
bounding box size 310 to 320 pixels : width count 2208
bounding box size 320 to 330 pixels : width count 2088
bounding box size 330 to 340 pixels : width count 1894
bounding box size 340 to 350 pixels : width count 1795
bounding box size 350 to 360 pixels : width count 1571
bounding box size 360 to 370 pixels : width count 1473
bounding box size 370 to 380 pixels : width count 1367
bounding box size 380 to 390 pixels : width count 1327
bounding box size 390 to 400 pixels : width count 1258
bounding box size 400 to 410 pixels : width count 1184
bounding box size 410 to 420 pixels : width count 1073
bounding box size 420 to 430 pixels : width count 1059
bounding box size 430 to 440 pixels : width count 975
bounding box size 440 to 450 pixels : width count 883
bounding box size 450 to 460 pixels : width count 911
bounding box size 460 to 470 pixels : width count 845
bounding box size 470 to 480 pixels : width count 791
bounding box size 480 to 490 pixels : width count 727
bounding box size 490 to 500 pixels : width count 685
bounding box size 500 to 510 pixels : width count 687
bounding box size 510 to 520 pixels : width count 667
bounding box size 520 to 530 pixels : width count 626
bounding box size 530 to 540 pixels : width count 613
bounding box size 540 to 550 pixels : width count 571
bounding box size 550 to 560 pixels : width count 599
bounding box size 560 to 570 pixels : width count 582
bounding box size 570 to 580 pixels : width count 501
bounding box size 580 to 590 pixels : width count 473
bounding box size 590 to 600 pixels : width count 423
bounding box size 600 to 610 pixels : width count 416
bounding box size 610 to 620 pixels : width count 372
bounding box size 620 to 630 pixels : width count 317
bounding box size 630 to 640 pixels : width count 320
bounding box size 640 to 650 pixels : width count 305
bounding box size 650 to 660 pixels : width count 274
bounding box size 660 to 670 pixels : width count 271
bounding box size 670 to 680 pixels : width count 240
bounding box size 680 to 690 pixels : width count 208
bounding box size 690 to 700 pixels : width count 170
bounding box size 700 to 710 pixels : width count 167
bounding box size 710 to 720 pixels : width count 144
bounding box size 720 to 730 pixels : width count 137
bounding box size 730 to 740 pixels : width count 96
bounding box size 740 to 750 pixels : width count 120
bounding box size 750 to 760 pixels : width count 94
bounding box size 760 to 770 pixels : width count 93
bounding box size 770 to 780 pixels : width count 77
bounding box size 780 to 790 pixels : width count 78
bounding box size 790 to 800 pixels : width count 79
bounding box size 800 to 810 pixels : width count 61
bounding box size 810 to 820 pixels : width count 38
bounding box size 820 to 830 pixels : width count 49
bounding box size 830 to 840 pixels : width count 51
bounding box size 840 to 850 pixels : width count 44
bounding box size 850 to 860 pixels : width count 31
bounding box size 860 to 870 pixels : width count 42
bounding box size 870 to 880 pixels : width count 31
bounding box size 880 to 890 pixels : width count 28
bounding box size 890 to 900 pixels : width count 22
bounding box size 900 to 910 pixels : width count 19
bounding box size 910 to 920 pixels : width count 20
bounding box size 920 to 930 pixels : width count 15
bounding box size 930 to 940 pixels : width count 11
bounding box size 940 to 950 pixels : width count 9
bounding box size 950 to 960 pixels : width count 13
bounding box size 960 to 970 pixels : width count 8
bounding box size 970 to 980 pixels : width count 7
bounding box size 980 to 990 pixels : width count 13
bounding box size 990 to 1000 pixels : width count 7
Hence I am wondering how sensitive Mask/Faster R-CNN is to these parameters.
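To put a rough number on the aspect-ratio part of this question, here is a minimal sketch. It assumes an idealized case that is not in the original post: the anchor and the ground-truth box share the same center and the same area, and differ only in h/w. Under that assumption the IoU depends only on the two ratios. The function name best_anchor_iou is made up for illustration, and (0.5, 1.0, 2.0) are the default ratios as I understand them.

```python
import math


def best_anchor_iou(box_ratio, anchor_ratios=(0.5, 1.0, 2.0)):
    """Best IoU between a box and concentric, equal-area anchors.

    All ratios are height / width. For two concentric rectangles of equal
    area, intersection / area equals sqrt(min_ratio / max_ratio).
    """
    best = 0.0
    for anchor_ratio in anchor_ratios:
        lo = min(box_ratio, anchor_ratio)
        hi = max(box_ratio, anchor_ratio)
        overlap = math.sqrt(lo / hi)          # intersection / area
        best = max(best, overlap / (2.0 - overlap))  # IoU = I / (2A - I)
    return best


# Compare the assumed default ratios with a hypothetical extended set.
for ratio in (0.125, 0.25, 0.5, 1.0, 4.0, 8.0):
    default_iou = best_anchor_iou(ratio)
    extended_iou = best_anchor_iou(ratio, (0.25, 0.5, 1.0, 2.0, 4.0))
    print(f"h/w={ratio:<6} default={default_iou:.2f} extended={extended_iou:.2f}")
```

In this idealized setting a box with h/w = 0.125 reaches an IoU of only about 0.33 against the (0.5, 1.0, 2.0) anchors and about 0.55 once 0.25 is added. Real matching also depends on anchor stride, box size, and the RPN IoU thresholds, so treat this as an indication of the trend rather than a prediction of accuracy.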
Q 2/4: Is there any way of re-using more than just the pretrained backbone when I create a custom AnchorGenerator for MaskRCNN?
The fasterrcnn_resnet50_fpn weights appear to have been trained for 26 epochs on the COCO dataset. I suspect retraining on that dataset would take a long time on 2 GPUs, especially with expanded aspect ratios and sizes.
I also suspect that COCO does not have masks at these extended sizes and aspect ratios, so would I need to supplement COCO with my own dataset?
Hence my question: is there a way to leverage the existing pretraining?
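One possible way to keep most of the pretrained weights is to copy every tensor whose name and shape still match and leave the rest at their fresh initialization. The sketch below shows that idea on toy modules rather than on Mask R-CNN itself. The helper name load_matching_weights is hypothetical, and the assumption (which I have not verified against every torchvision version) is that changing the number of anchors per location only changes the shapes of the RPN head's output layers.

```python
import torch
from torch import nn


def load_matching_weights(model, pretrained_state):
    """Copy pretrained tensors whose name and shape match the model.

    Returns the sorted names of model entries that were NOT loaded and
    therefore keep their fresh initialization.
    """
    own_state = model.state_dict()
    compatible = {
        name: tensor
        for name, tensor in pretrained_state.items()
        if name in own_state and own_state[name].shape == tensor.shape
    }
    # strict=False lets the model keep entries missing from `compatible`.
    model.load_state_dict(compatible, strict=False)
    return sorted(set(own_state) - set(compatible))


# Toy stand-ins: the last layer changes size, the way an RPN head would
# if the number of anchors per location changed.
pretrained = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 3))
custom = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 5))
skipped = load_matching_weights(custom, pretrained.state_dict())
print("left at fresh initialization:", skipped)
```

For the real model, the same helper would presumably be called with a MaskRCNN built around the custom AnchorGenerator and the state_dict of the COCO-pretrained model; that step is not run here. If only the anchor sizes change while the number of ratios per level stays the same, the shapes should all match and nothing would be skipped.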
Q 3/4: My custom dataset (and hence my fine-tuning) does not jitter the scale, only the location of the crop. How sensitive is mask_rcnn to the lack of scale jittering?
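For reference, this is a minimal sketch of what scale jittering would mean for the annotations: one random scale factor per sample, applied to the image size and to every box. The function name jitter_scale, the (0.5, 2.0) range, and the (x1, y1, x2, y2) box layout are assumptions for illustration, and resampling the image pixels and masks is left out.

```python
import math
import random


def jitter_scale(image_size, boxes, scale_range=(0.5, 2.0), rng=random):
    """Pick one random scale and apply it to an image size and its boxes.

    image_size is (height, width); boxes are (x1, y1, x2, y2) in pixels.
    The scale is sampled uniformly in log space, so shrinking by a factor
    is as likely as growing by the same factor.
    """
    low, high = scale_range
    scale = math.exp(rng.uniform(math.log(low), math.log(high)))
    height, width = image_size
    new_size = (round(height * scale), round(width * scale))
    new_boxes = [tuple(coord * scale for coord in box) for box in boxes]
    return scale, new_size, new_boxes


random.seed(0)  # fixed seed only to make the demo repeatable
scale, new_size, new_boxes = jitter_scale((1000, 2000), [(10, 20, 110, 220)])
print(scale, new_size, new_boxes)
```

Uniform scaling leaves every h/w ratio unchanged, so it widens the box size distribution seen in training without touching the aspect ratio distribution. torchvision's transforms.v2 module appears to ship a ScaleJitter transform that also handles the image and masks, which would likely be the better choice in practice.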
Q 4/4: The MaskRCNN defaults of min_size=800, max_size=1333 are a lot smaller than my images.
Am I correct in assuming the backbone will do fine with bigger images, and that it is just the faster_rcnn parts that need to be trained from scratch on these bigger images?
My images are 4x to 6x larger in height and width. I suspect the code
transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)
in the initializer of FasterRCNN (super to MaskRCNN) will make the bounding box size distribution even more problematic. Any opinions on this?
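To make the concern concrete, here is a small sketch of the resize factor as I understand GeneralizedRCNNTransform to compute it: the short side is scaled to min_size unless that would push the long side past max_size. The function name rcnn_resize_scale and the two image sizes (4x and 6x the 800 by 1333 defaults) are hypothetical.

```python
def rcnn_resize_scale(height, width, min_size=800, max_size=1333):
    """Scale factor that brings the short side to min_size, capped so the
    long side does not exceed max_size."""
    short_side = min(height, width)
    long_side = max(height, width)
    return min(min_size / short_side, max_size / long_side)


# Hypothetical image sizes, 4x and 6x the default limits.
for height, width in ((3200, 5332), (4800, 7998)):
    scale = rcnn_resize_scale(height, width)
    print(f"{height}x{width}: scale={scale:.3f}, "
          f"a 50 px wide box becomes {50 * scale:.1f} px wide")
```

Under these assumptions a box that is 50 pixels wide in the original image ends up roughly 8 to 13 pixels wide after the transform, well below the smallest default anchor size of 32. min_size and max_size are constructor arguments of FasterRCNN, so passing larger values when building the model, or cropping the images before they reach the model, may be worth trying.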
(TorchVision Object Detection Finetuning Tutorial — PyTorch Tutorials 2.1.1+cu121 documentation)
(Models and pre-trained weights — Torchvision 0.16 documentation)