Hello,
I am using the PyTorch (torchvision) implementation of Mask R-CNN, following the object detection finetuning tutorial. I am trying to finetune it to perform instance segmentation on images of nanoparticles (256x256x1). There are only two classes: background and nanoparticle.
The model is performing horrendously: validation mAP is around 0.1 for ‘bbox’ and around 0.06 for ‘segm’. Right now I’m trying to pinpoint why it performs so badly. After running the evaluate method (from their GitHub), I noticed that the “masks” output always contains 100 masks, even though some of my images have over 4000 masks. Why is this value of 100 hard-coded, and could it be the culprit behind the bad performance?
Below is how I create the model. I configured the min_size and max_size parameters, because my GPU (RTX 3060) was running out of memory quickly while performing calculations with the default values (800,1333).
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Load the COCO-pretrained model; the reduced min_size/max_size keep GPU memory use down
model = torchvision.models.detection.maskrcnn_resnet50_fpn_v2(
    weights='MaskRCNN_ResNet50_FPN_V2_Weights.COCO_V1',
    min_size=(256,),
    max_size=256,
    trainable_backbone_layers=2,
)
# Replace the box predictor with a new one for 2 classes (background + nanoparticle)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
# Replace the mask predictor with a new one for the same 2 classes
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                   hidden_layer, 2)
device = 'cuda'
I am not knowledgeable about Mask R-CNN, but my gut reaction is that
it is not well suited to detecting that many objects in an image.
If your objects don’t normally touch / overlap, I would suggest using
semantic segmentation (e.g., U-Net) followed by some post-processing
(e.g., connected components) to identify the instances.
Also, depending on the details of your use case, you might look into
using something like StarDist.
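To illustrate the post-processing step suggested above, here is a dependency-free sketch of connected-component labeling on a binary mask, such as a thresholded U-Net output. The function name `label_instances` and the choice of 4-connectivity are assumptions for illustration. In practice a library routine such as `scipy.ndimage.label` does the same job much faster.

```python
from collections import deque


def label_instances(mask):
    """Label 4-connected foreground regions of a binary mask (list of rows).

    Returns (labels, count), where labels has the same shape as mask,
    0 marks background and 1..count mark the individual instances.
    """
    if not mask:
        return [], 0
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or labels[r][c]:
                continue  # background, or already assigned to an instance
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4-neighbourhood.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

This only separates particles that do not touch. Touching particles end up merged into one instance, which is the case where something like StarDist or a watershed step would be needed.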
Accumulating evaluation results...
DONE (t=0.64s).
Accumulating evaluation results...
DONE (t=0.62s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.105
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.148
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.134
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.105
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.012
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.108
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.108
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.062
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.137
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.042
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.062
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.009
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.069
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.069
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
I even evaluated the training set and got results nearly identical to those on the validation data. Does anyone know what the problem might be? It’s as if it stops learning at 0.105 mAP for IoU 0.50:0.95.
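One possible reading of the plateau: every line of the output above is computed with maxDets=100 or fewer, and COCO-style evaluation only scores the top maxDets detections per image. An image with thousands of particles therefore cannot reach high recall no matter how well the model is trained. A rough sketch of that ceiling follows. The function name `recall_ceiling` and the example counts are hypothetical, and it assumes a single foreground class with every kept detection matching a distinct ground-truth object.

```python
def recall_ceiling(gt_counts, max_dets=100):
    """Best achievable recall when only max_dets detections per image are scored.

    gt_counts: number of ground-truth objects in each image.
    """
    total = sum(gt_counts)
    if total == 0:
        return 0.0
    # At most max_dets ground-truth objects can be matched in any one image.
    matchable = sum(min(n, max_dets) for n in gt_counts)
    return matchable / total
```

For a single image with 4000 particles, `recall_ceiling([4000])` gives 0.025, while images with at most 100 particles are unaffected. Identical scores on the training and validation sets would be consistent with such a cap, though this alone does not prove it is the only problem.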