Add conv layer to pretrained model

With resnet50 or larger, I am trying to replace the fc layer with a conv layer followed by an fc layer. I need to do this because of a hardware constraint (a 2048-dimensional feature vector is too large), and resnet34 may be too small to learn the task.

from torchvision.models import resnet50
model = resnet50(pretrained=True)

....
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)

That last fc layer should be replaced with this:

  (heads): EmbeddingHead(
    (pool_layer): GlobalAvgPool(output_size=1)
    (bottleneck): Sequential(
       (0): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
       (1): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     )
     (fc): Linear(in_features=512, out_features=125, bias=True)
   )

From what I read, this should work:

bottleneck = nn.Sequential(
    nn.Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False),
    nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
)
logits = nn.Linear(512, 125)
model.fc = nn.Sequential(bottleneck, logits)

but I keep getting

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [512, 2048, 1, 1], but got 2-dimensional input of size [8, 2048] instead

The goal is that, after training, I can remove the final fc layer and extract the 512-dimensional feature vectors.

The error is raised because the activation is flattened to 2D in the original resnet's forward before being passed to self.fc, as seen here. Since your conv layer expects a 4D tensor, you could try replacing self.avgpool with the original avgpool layer in combination with your additional modules, and replacing self.fc with nn.Identity. However, since this approach sounds quite brittle, the proper way would be to write a custom model and override forward with your desired modules.

Ahh I see, it's a reshaping issue. Taking your brittle comment into account, it might be better for me to just change the last block's output from 2048 to 512 channels.

From

    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)

I updated the blocks with

model.layer4[2].conv3 = nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
model.layer4[2].bn3 = nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
model.fc = nn.Linear(512, 125, bias=True)

The resulting model looks correct:

    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=125, bias=True)

However, when I try

sample = torch.randn(2, 3, 224, 224)
model(sample)

I get

RuntimeError: The size of tensor a (512) must match the size of tensor b (2048) at non-singleton dimension 1

The error is raised because you changed the last part of the Bottleneck block without adapting the skip connection: the residual addition now tries to add the 2048-channel identity to your 512-channel output, which fails with the shape mismatch error in this line of code.