Hi there, apologies if this is a weird question, but I’m not very experienced and haven’t had much luck getting an answer.
I need to make a Faster R-CNN with a ResNet-101 backbone and no FPN (I plan to deal with scales another way), but I’m not entirely sure where I should take the feature maps from. I was thinking of using torchvision’s Faster R-CNN implementation.
I thought of taking them from “conv4” or “conv5”, but I noticed examples that I believe used the result of the adaptive pooling. I imagined this would make the model more flexible in terms of input image size, but I don’t know if that would leave enough information for the RPN.
I’ve also seen examples written in other frameworks that seem to concatenate the outputs of conv4 and 5 somehow.
And if you want to change the inner layers, you’ll probably need to rewrite the RCNN class in torchvision.
About flexible input image sizes: no need to worry, because there are already transform layers at the beginning and at the end of the Faster R-CNN model. All you need to do is set min_size and max_size.
Again, have a look at this link:
Sorry for my English, hope I could help you somehow.