Hi Everyone,
I am working on a project that uses Mask R-CNN for video frame analysis. The model is trained on a custom dataset with annotations from CVAT. Bounding-box detection works fine, but I am having trouble visualising the masks, and I am not sure whether the model is generating them correctly.
Relevant Code:
def postprocess_detection(frames, outputs, threshold=0.5):
    frames = frames.cpu().numpy().transpose(1, 2, 0)
    boxes = outputs['boxes'].cpu().detach().numpy()
    labels = outputs['labels'].cpu().detach().numpy()
    scores = outputs['scores'].cpu().detach().numpy()
    masks = outputs['masks'].cpu().detach().numpy()
    indices = scores >= threshold
    boxes = boxes[indices]
    labels = labels[indices]
    scores = scores[indices]
    masks = masks[indices]
    mask_canvas = np.zeros_like(frames, dtype=np.uint8)
    for i, mask in enumerate(masks):
        mask = mask[0, :, :]
        mask = preprocessing.normalize(mask)
        # Apply threshold to binarize mask
        mask = (mask > 0.5).astype(np.uint8)
        color = [random.randint(0, 255) for _ in range(3)]
        for c in range(3):
            mask_canvas[:, :, c] = np.where(mask == 1, color[c], mask_canvas[:, :, c])
    result_image = cv2.addWeighted(frames, 1, mask_canvas.astype(np.float32), 0.5, 0)
    return result_image, mask_canvas, boxes, labels, scores
Issue:
- Bounding boxes are drawn correctly.
- Masks are not visible, and I am not sure whether they are being generated correctly.
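One thing that may be worth checking is the normalisation step that runs before the threshold. If `preprocessing` is `sklearn.preprocessing` (an assumption, since the import is not shown), `normalize` scales each row to unit L2 norm by default, which would push most mask values well below 0.5. Below is a minimal NumPy-only sketch of that effect on a synthetic mask; the helper `l2_normalize_rows` and the test mask are hypothetical and only mimic the assumed behaviour.

```python
import numpy as np

def l2_normalize_rows(mask):
    """Scale each row to unit L2 norm (mimics the assumed default of sklearn's normalize)."""
    norms = np.linalg.norm(mask, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return mask / norms

# Synthetic soft mask: a 20x20 square with probability 0.9 on a 64x64 frame.
soft_mask = np.zeros((64, 64), dtype=np.float32)
soft_mask[10:30, 10:30] = 0.9

# Threshold the raw probabilities directly.
direct = (soft_mask > 0.5).astype(np.uint8)
# Threshold after row-wise normalisation, as in the function above.
normalized = (l2_normalize_rows(soft_mask) > 0.5).astype(np.uint8)

print("pixels kept without normalisation:", int(direct.sum()))
print("pixels kept after row normalisation:", int(normalized.sum()))
```

If the two counts differ a lot on real model output as well, thresholding the raw mask probabilities without the normalisation step might be the simpler thing to try.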
Additional Details:
- The masks are thresholded and resized before blending.
- The output shape and type of the image array are checked and seem to be correct.
- The device used for inference is CUDA.
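A second thing that could be checked is the value range at the blending step: the frame comes from a float tensor while the mask canvas holds colours in 0-255. The sketch below shows one way to blend with matching ranges, assuming the frame is an HxWx3 float array in [0, 1] (an assumption about the input, not something confirmed above). The helper `overlay_masks` is hypothetical and uses NumPy only rather than `cv2.addWeighted`.

```python
import numpy as np

def overlay_masks(frame_float, binary_masks, colors, alpha=0.5):
    """Blend coloured masks onto a frame (hypothetical helper, NumPy only)."""
    # Assumption: frame_float is HxWx3 with values in [0, 1].
    frame_u8 = (np.clip(frame_float, 0.0, 1.0) * 255).astype(np.uint8)
    out = frame_u8.astype(np.float32)
    for mask, color in zip(binary_masks, colors):
        region = mask.astype(bool)
        # Blend only inside the mask so the rest of the frame is untouched.
        out[region] = (1 - alpha) * out[region] + alpha * np.array(color, dtype=np.float32)
    return out.astype(np.uint8)

# Tiny synthetic example: grey 4x4 frame with a 2x2 mask in the centre.
frame = np.full((4, 4, 3), 0.5, dtype=np.float32)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
blended = overlay_masks(frame, [mask], [(255, 0, 0)])
print(blended[0, 0], blended[1, 1])
```

Pixels outside the mask keep the original grey value, while pixels inside are shifted towards the mask colour, so an empty result here would point back at the mask itself rather than at the blending.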