Change ResNet50's number of output filters

Hello,

I am trying to use torchvision’s pretrained ResNet50 model to extract features. The key point is that I want to change the number of output filters.

If we take a look at torchvision.models.resnet50(), we can see that the last part of the network has the following layers:

    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)

I want to get rid of the FC layer and change the number of output filters from 2048 to 512, so I apply the following code:

self.encoder.layer4[2].conv3 = nn.Conv2d(512,512, kernel_size=(1,1), stride=(1,1), bias=False)
self.encoder.layer4[2].bn3   = nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.encoder.fc = nn.Identity()

It might be worth noting that the resnet50 model has 4 main blocks, named “layerX”, and layer4 is the last one.

I thought that by having these changes, the output of the network would be a tensor of [bs, 512] but I obtain the following error message:

RuntimeError: The size of tensor a (512) must match the size of tensor b (2048) at non-singleton dimension 1

Judging by this previous post, my approach seems correct, but something is clearly missing. I also tried re-initializing the avgpool layer (although it has no parameters of its own), but the error remains.

Any suggestions on what to do are highly appreciated.

The error is raised in the skip connection:

    out += identity

RuntimeError: The size of tensor a (512) must match the size of tensor b (2048) at non-singleton dimension 1

since you are changing the block’s output layers while the identity (skip) branch still carries the original 2048 channels.

Oh, of course, it is a Residual Network after all. It seems that I will have to build the network by myself then.

Option B is to simply replace the final FC layer with:

self.encoder.fc = nn.Linear(2048, 512)

Thank you very much for your help.