Hello,
I am trying to use torchvision's pretrained ResNet50 model to extract features. The key point is that I want to change the number of output filters.
If we take a look at torchvision.models.resnet50(),
we can see that the last part of the network has the following layers:
  (2): Bottleneck(
    (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
  )
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=1000, bias=True)
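(For reference, the listing above is just the tail end of the printed model, obtained roughly like this:)

import torchvision

model = torchvision.models.resnet50(pretrained=True)  # pretrained ImageNet weights
print(model.layer4[2])   # last Bottleneck of the last stage, shown above
print(model.avgpool)
print(model.fc)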
I want to get rid of the FC layer and change the number of output filters from 2048 to 512, so I apply the following code:
self.encoder.layer4[2].conv3 = nn.Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
self.encoder.layer4[2].bn3 = nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
self.encoder.fc = nn.Identity()
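These lines live inside my module's __init__; the encoder itself is created beforehand, roughly like this (the attribute name self.encoder is of course my own):

self.encoder = torchvision.models.resnet50(pretrained=True)  # backbone whose layers are replaced above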
It might be worth noting that the resnet50 model has four main stages, named layer1 through layer4, and layer4 is the last one.
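(You can see these stages by listing the model's top-level children, e.g.:)

print([name for name, _ in model.named_children()])
# ['conv1', 'bn1', 'relu', 'maxpool', 'layer1', 'layer2', 'layer3', 'layer4', 'avgpool', 'fc']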
I thought that with these changes the output of the network would be a tensor of shape [bs, 512],
but instead I get the following error message:
RuntimeError: The size of tensor a (512) must match the size of tensor b (2048) at non-singleton dimension 1
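For completeness, the error is raised by a plain forward pass through the encoder, roughly:

x = torch.randn(2, 3, 224, 224)   # dummy batch; the input size is just an example
out = self.encoder(x)             # the RuntimeError above is raised during this call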
Reading this previous post, my approach seems correct, but something is clearly missing. I also tried re-initializing the avgpool layer (although it has no parameters of its own), but the error remains.
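Concretely, the avgpool attempt was just something like:

self.encoder.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1))  # same as the original layer, so no effect expected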
Any suggestions on what to do are highly appreciated.