Unable to get Masks with Mask R-CNN Model during inference

Hi Everyone,
I am working on a project involving Mask R-CNN for video frame analysis. The model is trained on a custom dataset with annotations exported from CVAT. Bounding-box detection works fine, but the masks do not show up when I visualise them, and I am not sure whether the model is generating masks correctly.

Relevant Code:

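import random

import cv2
import numpy as np
from sklearn import preprocessing  # assumed: preprocessing.normalize is scikit-learn's

def postprocess_detection(frames, outputs, threshold=0.5):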
    frames = frames.cpu().numpy().transpose(1, 2, 0)

    boxes = outputs['boxes'].cpu().detach().numpy()
    labels = outputs['labels'].cpu().detach().numpy()
    scores = outputs['scores'].cpu().detach().numpy()
    masks = outputs['masks'].cpu().detach().numpy()

    indices = scores >= threshold
    boxes = boxes[indices]
    labels = labels[indices]
    scores = scores[indices]
    masks = masks[indices]

    mask_canvas = np.zeros_like(frames, dtype=np.uint8)

    for i, mask in enumerate(masks):
        # Drop the channel axis: (1, H, W) -> (H, W)
        mask = mask[0, :, :]

        # Normalize the mask values before thresholding
        mask = preprocessing.normalize(mask)

        # Apply threshold to binarize mask
        mask = (mask > 0.5).astype(np.uint8)

        # Paint this instance onto the canvas in a random colour
        color = [random.randint(0, 255) for _ in range(3)]
        for c in range(3):
            mask_canvas[:, :, c] = np.where(mask == 1, color[c], mask_canvas[:, :, c])

    # Blend the coloured canvas over the frame
    result_image = cv2.addWeighted(frames, 1, mask_canvas.astype(np.float32), 0.5, 0)

    return result_image, mask_canvas, boxes, labels, scores
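For comparison, here is a minimal sketch of the overlay step that thresholds the mask values directly, with no extra normalization, and blends in uint8. It rests on assumptions not confirmed above: that the model is torchvision-style, so `outputs['masks']` has shape (N, 1, H, W) with per-pixel probabilities in [0, 1], and that the frame is either float in [0, 1] or uint8. The name `overlay_masks` and its parameters are hypothetical, and the blend is done in NumPy rather than with cv2.addWeighted.

```python
import numpy as np

def overlay_masks(frame, masks, mask_threshold=0.5, alpha=0.5, seed=0):
    """Blend binarized instance masks onto a frame using NumPy only.

    frame: (H, W, 3) float array in [0, 1] or uint8 array in [0, 255] (assumed).
    masks: (N, 1, H, W) float array of probabilities in [0, 1] (assumed).
    """
    # Bring the frame to uint8 so it shares a value range with the colours.
    if frame.dtype != np.uint8:
        frame = (np.clip(frame, 0.0, 1.0) * 255).astype(np.uint8)

    rng = np.random.default_rng(seed)
    canvas = np.zeros_like(frame)
    covered = np.zeros(frame.shape[:2], dtype=bool)

    for mask in masks:
        # Threshold the probabilities directly; no extra normalization step.
        binary = mask[0] > mask_threshold
        color = rng.integers(0, 256, size=3, dtype=np.uint8)
        canvas[binary] = color
        covered |= binary

    # Alpha-blend only where a mask is present; other pixels stay unchanged.
    blended = frame.astype(np.float32)
    blended[covered] = (1 - alpha) * blended[covered] + alpha * canvas[covered]
    return blended.astype(np.uint8), canvas
```

Inside postprocess_detection, this could stand in for the loop and the blend, for example `result_image, mask_canvas = overlay_masks(frames, masks)`, called after the score filtering.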

Issue:

  • Bounding boxes are drawn correctly.
  • Masks are not visible, and I am not sure whether they are being generated correctly.
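One way to separate "the model produces no masks" from "the masks are lost during visualisation" is to inspect the raw mask values before any post-processing. The sketch below is a hypothetical helper (`summarize_masks` is not part of any library) that assumes the masks and scores have already been converted to NumPy arrays, with masks of shape (N, 1, H, W).

```python
import numpy as np

def summarize_masks(masks, scores, score_threshold=0.5, mask_threshold=0.5):
    """Report shape, dtype and per-instance value ranges of the raw masks."""
    # Keep only the instances that pass the detection score threshold.
    kept = masks[scores >= score_threshold]
    rows = []
    for i, m in enumerate(kept):
        m2d = m[0]  # drop the channel axis: (1, H, W) -> (H, W)
        rows.append({
            "instance": i,
            "min": float(m2d.min()),
            "max": float(m2d.max()),
            # Number of pixels that would survive binarization.
            "pixels_above": int((m2d > mask_threshold).sum()),
        })
    return {"shape": kept.shape, "dtype": str(kept.dtype), "instances": rows}
```

The helper applies the score filter itself, so it should be given the unfiltered `masks` and `scores` arrays. If "max" is close to 1 and "pixels_above" is well above zero for the kept instances, the model is likely producing usable masks and the problem sits in the later normalization, thresholding or blending steps.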

Additional Details:

  • The masks are thresholded and resized before blending.
  • The output shape and type of the image array are checked and seem to be correct.
  • The device used for inference is CUDA.
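On the thresholding step: if `preprocessing.normalize` is scikit-learn's function (an assumption), its defaults scale each row of the 2-D mask to unit L2 norm. That can push every value below the 0.5 threshold even where the model was confident. The sketch below reproduces row-wise L2 normalization in plain NumPy on a synthetic mask; `l2_normalize_rows` is a hypothetical stand-in for illustration, not the library function itself.

```python
import numpy as np

def l2_normalize_rows(x):
    """Divide each row by its Euclidean norm; all-zero rows are left as zeros."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero on empty rows
    return x / norms

# Synthetic soft mask: a confident object region, 4 rows by 6 columns.
mask = np.zeros((8, 8), dtype=np.float32)
mask[2:6, 1:7] = 0.95

normalized = l2_normalize_rows(mask)

print((mask > 0.5).sum())        # 24 pixels pass the threshold before normalization
print(normalized.max())          # about 0.408, i.e. 1 / sqrt(6)
print((normalized > 0.5).sum())  # 0 pixels pass after normalization
```

In this example, any row with four or more equally confident foreground pixels ends up at or below 0.5 after normalization, so the binarized mask comes out empty and nothing is painted onto the canvas.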