How to use a Mask R-CNN model to extract features of bounding boxes?

I’d like to use the torchvision.models.detection.MaskRCNN model to detect objects and extract the features corresponding to each detected object's bounding box.

Does anyone have instructions or ideas on how to do this properly?

I think you can use the bounding box coordinates to select the corresponding area of the feature map.
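A minimal sketch of that cropping idea. It assumes a single feature map of shape (C, H, W) that is downsampled from the input image by a fixed `stride` (16 is just an illustrative value; an FPN backbone actually has several levels with different strides) and boxes in (x1, y1, x2, y2) image-pixel format, as torchvision detection models return them. `crop_box_features` is a hypothetical helper name.

```python
import torch


def crop_box_features(feature_map: torch.Tensor, box: torch.Tensor, stride: int) -> torch.Tensor:
    """Slice the region of a (C, H, W) feature map covered by an image-space box.

    Hypothetical helper; assumes one feature map with a fixed downsampling stride.
    """
    _, h, w = feature_map.shape
    # Map image pixels to feature-map cells: floor the start, ceil the end,
    # and clamp so every crop stays inside the map and is at least 1x1.
    x1 = int(torch.floor(box[0] / stride).clamp(0, w - 1))
    y1 = int(torch.floor(box[1] / stride).clamp(0, h - 1))
    x2 = int(torch.ceil(box[2] / stride).clamp(x1 + 1, w))
    y2 = int(torch.ceil(box[3] / stride).clamp(y1 + 1, h))
    return feature_map[:, y1:y2, x1:x2]


# e.g. a 256-channel map for an 800x1024 image at stride 16
feature_map = torch.rand(256, 50, 64)
boxes = torch.tensor([[32.0, 48.0, 160.0, 240.0], [400.0, 100.0, 464.0, 132.0]])
crops = [crop_box_features(feature_map, b, stride=16) for b in boxes]
print([tuple(c.shape) for c in crops])  # [(256, 12, 8), (256, 3, 4)]
```

The two crops come out with different spatial sizes, so they cannot be stacked into one tensor as-is.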

But if I use the bounding box coordinates to select the corresponding area of the feature map, I get features of different sizes for different boxes. What I want is features with the same dimensions for every box.

I think you need to find a way to handle that, maybe with max pooling, average pooling, summation or something like that. I’m not sure whether resizing all of them to a fixed size would work.
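A sketch of those pooling options in plain PyTorch, assuming (C, h, w) crops like the ones obtained by slicing a feature map. Adaptive max or average pooling maps a crop of any size onto a fixed grid, which is the idea behind RoI Pooling; Mask R-CNN itself uses the related RoIAlign (`torchvision.ops.roi_align`), which samples the box with bilinear interpolation instead of rounding its coordinates to feature-map cells. `pool_to_fixed_size` is a hypothetical helper name, and the 7x7 grid is just an illustrative choice.

```python
import torch
import torch.nn.functional as F


def pool_to_fixed_size(crop: torch.Tensor, output_size=(7, 7), mode: str = "max") -> torch.Tensor:
    """Pool a (C, h, w) crop of any spatial size to (C, *output_size).

    Hypothetical helper; adaptive pooling also works when h or w is smaller
    than the output size (bins then overlap).
    """
    if mode == "max":
        return F.adaptive_max_pool2d(crop, output_size)
    if mode == "avg":
        return F.adaptive_avg_pool2d(crop, output_size)
    raise ValueError(f"unknown mode: {mode!r}")


# Two crops with different spatial sizes from a 256-channel feature map.
crops = [torch.rand(256, 12, 8), torch.rand(256, 3, 4)]

# Fixed 7x7 grid per box: stackable, then flattened to one vector per box.
pooled = torch.stack([pool_to_fixed_size(c) for c in crops])  # (2, 256, 7, 7)
vectors = pooled.flatten(start_dim=1)                           # (2, 12544)

# Global max / mean / sum collapse each crop to a single 256-d vector.
global_max = torch.stack([c.amax(dim=(1, 2)) for c in crops])   # (2, 256)
global_avg = torch.stack([c.mean(dim=(1, 2)) for c in crops])   # (2, 256)
global_sum = torch.stack([c.sum(dim=(1, 2)) for c in crops])    # (2, 256)
```

A fixed grid keeps some spatial layout inside the box, while global pooling discards it. Summation makes the feature magnitude grow with box area, which the mean avoids.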