Using feature extraction layers from pre-trained FRCNN

Hi all,
I have trained a Faster R-CNN using torchvision.models.detection.fasterrcnn_resnet50_fpn and now I want to use its feature extraction layers for something else.
To do so I first printed frcnn.modules() and saw that the model has 4 major components:

  0. GeneralizedRCNNTransform
  1. BackboneWithFPN
  2. RPN
  3. ROI Heads

I first tried to use the first and second modules together, but ran into an error. This is what I tried:

feature_net = nn.Sequential(*list(frcnn.modules())[:2])
a = torch.rand(4,3,256,256)
out = feature_net(a)

and got the following error:

TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple

After looking at the GeneralizedRCNNTransform module, I noticed it just holds some metadata about the model: image mean, image std, etc.

So I tried the same, but with module (1) only (BackboneWithFPN):

feature_net = nn.Sequential(list(frcnn.modules())[1])

Passing a then worked fine, but trying to print out.shape raised the following error:

AttributeError: 'collections.OrderedDict' object has no attribute 'shape'

out isn’t a tensor as expected.
How can this be solved?

Thank you in advance :slight_smile:

The output of the unmodified model is an OrderedDict containing the boxes, labels, and scores.
Could you check your current output for these keys?

Hi Patrick!
I forgot to mention this: I already did that check, and the keys of the output are ‘0’, ‘1’, ‘2’, ‘pool’.
Maybe this is related to nested Sequential modules?

Thank you for helping!

It might be related to the wrapping in an nn.Sequential module.
nn.Sequential modules are meant for quite simple models, and I assume the current one might not work out of the box with this approach.
If you want to use some activations, I would recommend using forward hooks as described here and processing these activations further.

I just want to use one specific child of the model (the one responsible for feature extraction). I wrote nn.modules by mistake; I actually used nn.children and took the second child in the list.

BTW, this model will be used for inference only, without any training process.
What is the right way to do so?


Which child module do you want to use?


model.backbone would return BackboneWithFPN. Do you want to use this module as a standalone model?

Yup :slight_smile:
Shouldn’t I wrap it in an nn.Sequential module?

You don’t necessarily need to wrap it in an nn.Sequential module, as it is already a working module.
However, since the FeaturePyramidNetwork is used internally, you will get the OrderedDict as the output as seen in this line of code.
This means your previous code should work for this submodule.

Now it makes sense! Each value in the dict represents a feature map at a different resolution!


BTW @ptrblck, another thing: let’s say I want to save this OrderedDict to an h5py file. How can I do so? Is there a built-in function for that?

I’m not sure what the easiest way to store the output in an HDF5 file would be.
However, you could just use, 'name.pth') to save it.
Would that work, or do you need to use h5py?

I want it to be compressed and fast, since I will use a DataLoader to read these files.
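A possible sketch with h5py (the file name and layout are just illustrative): write each feature map as its own compressed dataset, keyed by the FPN level name.

```python
import h5py
import torch

# Dummy stand-ins for the FPN output; real code would use backbone(x)
features = {
    "0": torch.rand(1, 256, 64, 64),
    "pool": torch.rand(1, 256, 4, 4),
}

with h5py.File("features.h5", "w") as f:
    for name, tensor in features.items():
        # gzip is lossless; tune compression_opts to trade speed for size
        f.create_dataset(name, data=tensor.cpu().numpy(), compression="gzip")

# Reading back, e.g. inside a Dataset's __getitem__:
with h5py.File("features.h5", "r") as f:
    restored = {name: torch.from_numpy(f[name][()]) for name in f}
```

If several DataLoader workers read the same file, open the file lazily inside each worker (e.g. in __getitem__) rather than sharing one handle across processes.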

I want to extract the model up to the FPN and not the layers after that (pooling and other things). I am doing the following:

all_children = list(model.backbone.children())      # [body, fpn]
second_child = list(all_children[1].children())[0]  # first child of the FPN
new_children = [all_children[0]] + [second_child]

My model is defined as follows:

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained = pretrained)
num_classes = len(set(cls2idx.values())) + 1
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

When I pass images to my model, I get an error saying forward() takes one positional argument but two were given, even though I am only calling model(images). What am I doing wrong?

How do I get the feature maps so that I can use them for another application?