Calculating the spatial_scale for ROIAlign

I want to use torchvision.ops.RoIAlign to extract features from a custom CNN model, and I need help determining the correct spatial_scale argument.

Specifically, I want to apply RoIAlign to a layer whose feature map has spatial dimensions of 24 x 24. I already have my bounding-box coordinates; however, they are defined at the scale of the original image, which is 1080 x 1920.

Additionally, I am not sure whether this is relevant, but while training my CNN I resized the images to 224 x 224. The bounding-box coordinates, however, come from a separate object-detection model, so there is no weight sharing between the two models.

Now I am unsure about spatial_scale, because the height and width ratios between the scale at which the bounding boxes/RoIs were extracted and the feature map are not equal, while spatial_scale is a single scalar.
The following code demonstrates what I think I should do, but I am not confident it is correct:

feature_map_size = 24  # spatial size of the target feature map (24 x 24)
original_frame_height = 1080
original_frame_width = 1920
spatial_scale_height = feature_map_size / original_frame_height  # ~0.0222
spatial_scale_width = feature_map_size / original_frame_width    # 0.0125
spatial_scale = min(spatial_scale_height, spatial_scale_width)
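For context, here is a minimal sketch of the alternative I have been considering: instead of picking a single spatial_scale, rescale each box coordinate by its own per-axis ratio into 24 x 24 feature-map space myself, and then (I assume) call roi_align with spatial_scale=1.0. The box values below are made up for illustration:

```python
def box_to_feature_coords(box, feat_size=24, img_w=1920, img_h=1080):
    """Scale an (x1, y1, x2, y2) box from original-image pixels to
    feature-map units, using a separate scale factor per axis."""
    sx = feat_size / img_w   # horizontal scale: 24 / 1920 = 0.0125
    sy = feat_size / img_h   # vertical scale:   24 / 1080 ~ 0.0222
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# Hypothetical box defined on the original 1080 x 1920 frame
print(box_to_feature_coords((480.0, 270.0, 1440.0, 810.0)))
# approximately (6.0, 6.0, 18.0, 18.0) in feature-map coordinates
```

Since the boxes would then already live in feature-map coordinates, no further scaling inside RoIAlign should be needed, but I would appreciate confirmation that this per-axis approach is sound.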

Any help/input would be much appreciated. Thanks in advance!