In Faster R-CNN, the RoIs are reshaped to (batch, 256, 7, 7) and then pass through 2 shared convs for bbox and cls prediction. This results in a tensor of shape (batch, 1024) before the head outputs the bbox pred (batch, 4) and the cls pred.
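To make the shapes above concrete, here is a minimal PyTorch sketch of the flow as I understand it. It is not the actual head from any library. It assumes the two shared layers act as fully connected layers on the flattened RoI features, which is one way to arrive at (batch, 1024). The class name SketchBoxHead and num_classes=81 are placeholders made up for illustration.

```python
import torch
from torch import nn


class SketchBoxHead(nn.Module):
    """Hypothetical stand-in for the box head described above (not a library class)."""

    def __init__(self, in_channels=256, pooled_size=7, hidden_dim=1024, num_classes=81):
        super().__init__()
        # 256 * 7 * 7 = 12544 values per RoI after flattening
        flat_dim = in_channels * pooled_size * pooled_size
        # Assumption: the two shared layers are fully connected layers
        self.shared = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.bbox_pred = nn.Linear(hidden_dim, 4)           # (batch, 4)
        self.cls_pred = nn.Linear(hidden_dim, num_classes)  # num_classes is a placeholder

    def forward(self, roi_feats):
        shared = self.shared(roi_feats)  # (batch, 1024)
        return shared, self.bbox_pred(shared), self.cls_pred(shared)


head = SketchBoxHead()
roi_feats = torch.randn(8, 256, 7, 7)  # 8 pooled RoIs
shared, bbox, cls = head(roi_feats)
print(shared.shape, bbox.shape, cls.shape)
```

Note that after the flatten step the 7x7 spatial layout is gone, so the (batch, 1024) tensor no longer has positions that could be indexed by box coordinates.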
Is there a way to transform the features from (batch, 256, 7, 7) to something like (batch, w_pred, h_pred), so that I can get the feature values inside the bounding box? Here w_pred and h_pred are the width and height of the predicted bounding boxes.
Or is there an easier way to get the feature values from the box predictions?