Change ResNet50's number of output filters

Hello,

I am trying to use torchvision’s pretrained ResNet50 model to extract features. The key point is that I want to change the number of output filters.

If we take a look at torchvision.models.resnet50(), we can see that the last part of the network has the following layers:

    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)

I want to get rid of the FC layer and change the number of output filters from 2048 to 512, so I apply the following code:

self.encoder.layer4[2].conv3 = nn.Conv2d(512,512, kernel_size=(1,1), stride=(1,1), bias=False)
self.encoder.layer4[2].bn3   = nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.encoder.fc = nn.Identity()

It might be worth noting that the resnet50 model has 4 main blocks, named “layerX”, and layer4 is the last one.

I thought that by having these changes, the output of the network would be a tensor of [bs, 512] but I obtain the following error message:

RuntimeError: The size of tensor a (512) must match the size of tensor b (2048) at non-singleton dimension 1

Judging by this previous post, my approach seems correct, but something is clearly missing. I also tried re-initializing the avgpool layer (although it has no parameters of its own), but the error remains.

Any suggestions on what to do are highly appreciated.

The error is raised in the skip connection:

    out += identity

RuntimeError: The size of tensor a (512) must match the size of tensor b (2048) at non-singleton dimension 1

since you are changing the block’s output layers while the identity (skip) branch still carries the original 2048 channels.

Oh, of course, it is a Residual Network after all. It seems that I will have to build the network by myself then.

Option B is to simply replace the final FC layer with:

self.encoder.fc = nn.Linear(2048, 512)

Thank you very much for your help.