Mask-rcnn training - all AP and Recall scores in "IoU Metric: segm" remain 0

I am trying to train torchvision’s pre-trained Mask R-CNN model on a custom dataset prepared in COCO format.

I am using the train_one_epoch and evaluate functions from torch/vision/detection/engine for training and evaluation, respectively.

The loss_mask value is decreasing, as can be seen in this training log:

Epoch: [5]  [ 0/20]  eta: 0:00:54  lr: 0.005000  loss: 0.5001 (0.5001)  loss_classifier: 0.2200 (0.2200)  loss_box_reg: 0.2616 (0.2616)  loss_mask: 0.0014 (0.0014)  loss_objectness: 0.0051 (0.0051)  loss_rpn_box_reg: 0.0120 (0.0120)  time: 2.7308  data: 1.2866  max mem: 9887
Epoch: [5]  [10/20]  eta: 0:00:26  lr: 0.005000  loss: 0.4734 (0.4982)  loss_classifier: 0.2055 (0.2208)  loss_box_reg: 0.2515 (0.2595)  loss_mask: 0.0012 (0.0013)  loss_objectness: 0.0038 (0.0054)  loss_rpn_box_reg: 0.0094 (0.0113)  time: 2.6218  data: 1.1780  max mem: 9887
Epoch: [5]  [19/20]  eta: 0:00:02  lr: 0.005000  loss: 0.5162 (0.5406)  loss_classifier: 0.2200 (0.2384)  loss_box_reg: 0.2616 (0.2820)  loss_mask: 0.0014 (0.0013)  loss_objectness: 0.0051 (0.0062)  loss_rpn_box_reg: 0.0120 (0.0127)  time: 2.6099  data: 1.1755  max mem: 9887

But the evaluate output shows that every score under "IoU metric: segm" stays at zero, while the bbox scores look reasonable:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.653
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.843
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.723
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.788
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.325
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.701
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.738
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.739
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.832
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.456
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

The segm metrics don’t improve even after training for 500 epochs.

Also, when I visualize the masks the model outputs after training for 100 or 500 epochs, they show only a couple of dots here and there.

Can you confirm that your target["masks"] looks like this:

dtype:  torch.uint8 
shape: torch.Size([2, 480, 320])
min: tensor(0, dtype=torch.uint8) 
max: tensor(1, dtype=torch.uint8)

Here 2 is the number of objects (one mask per instance, since this is instance segmentation), and 480 and 320 are the height and width of the input image, respectively.

If, for some reason, the dimensions of the masks tensor are permuted (for example [480, 320, 2] instead of [2, 480, 320]), this behavior arises.
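
A quick way to verify this is to run a small check on one sample from your dataset. The following is only a minimal sketch: check_masks is a hypothetical helper (not part of torchvision), and it assumes the image is a [C, H, W] tensor and that masks should be a binary [N, H, W] uint8 tensor as in the example above. The tensors here are synthetic, so substitute a real image and target["masks"] from your own dataset.

```python
import torch


def check_masks(masks: torch.Tensor, image: torch.Tensor) -> list:
    """Hypothetical helper: list the problems found in a target["masks"] tensor.

    Assumes `image` has shape [C, H, W] and `masks` should have shape [N, H, W].
    """
    problems = []
    if masks.dtype != torch.uint8:
        problems.append(f"dtype is {masks.dtype}, expected torch.uint8")
    if masks.dim() != 3:
        problems.append(f"expected shape [N, H, W], got {tuple(masks.shape)}")
    elif tuple(masks.shape[-2:]) != tuple(image.shape[-2:]):
        # The last two mask dimensions must match the image height and width.
        problems.append(
            f"mask H, W is {tuple(masks.shape[-2:])}, "
            f"image H, W is {tuple(image.shape[-2:])}"
        )
    if masks.numel() > 0 and masks.float().max().item() > 1:
        problems.append("values exceed 1; masks should contain only 0 and 1")
    return problems


# Synthetic stand-ins for one dataset sample (assumed sizes from the post).
image = torch.zeros(3, 480, 320)
good = torch.zeros(2, 480, 320, dtype=torch.uint8)
good[0, 100:200, 50:150] = 1
good[1, 300:400, 100:250] = 1

# Simulate the permuted layout [H, W, N] that can cause the problem.
bad = good.permute(1, 2, 0)

print(check_masks(good, image))
print(check_masks(bad, image))

# If the layout really is [H, W, N], moving the last axis to the front fixes it.
fixed = bad.permute(2, 0, 1).contiguous()
print(check_masks(fixed, image))
```

If the check reports a shape mismatch, the place to fix it is the dataset's __getitem__, before the target is returned. Which permutation to apply depends on how your masks were built, so the permute(2, 0, 1) above is only correct for the [H, W, N] case.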