Transfer learning in Faster R-CNN custom backbone

Hi everyone,
and thanks in advance for all the support you provide in this channel.

I have re-implemented the Bottleneck block of torchvision's resnet50, using exactly the same layers I see when I print the resnet50 architecture, and I replaced each original bottleneck with my version. This resnet50 is the backbone of the Faster R-CNN I am using for object detection.

Since the original resnet50 was used pretrained, I copied the pretrained weights (including biases and running_mean/running_var wherever present) from the pretrained backbone into mine, layer by layer, in the following way:

with torch.no_grad():
    dst.conv1.weight.copy_(src.conv1.weight)   # one example; repeated for every layer
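
A slightly more general sketch of the same layer-by-layer copy (src and dst are placeholder names for two corresponding blocks, assuming both register their parameters and buffers under the same names and in the same order):

import torch

with torch.no_grad():
    # conv weights (and biases, where present) are parameters
    for p_src, p_dst in zip(src.parameters(), dst.parameters()):
        p_dst.copy_(p_src)
    # running_mean / running_var (and the affine terms of frozen batchnorm) are buffers
    for b_src, b_dst in zip(src.buffers(), dst.buffers()):
        b_dst.copy_(b_src)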

What I expected was identical behaviour during training and similar final results in terms of recall, specificity and F1 score, but that is not the case: training proceeds very slowly and the loss decreases less than with the original resnet50 backbone.

Can you tell from your experience what is missing or wrong, or even better provide a working snippet of code to safely copy the pretrained weights?

Thanks a lot for your attention,
Gi

Instead of directly copying the parameters you could try to use dst.layer.load_state_dict(src.layer.state_dict()) to check if any shape mismatches were hidden in your approach. Also, checking the total number of trainable parameters might be a good idea to make sure the models are equal.
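
Something along these lines would cover both checks (a rough sketch; the names are placeholders):

def count_trainable(model):
    # parameters the optimizer would actually update
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def copy_block(src_block, dst_block):
    # with strict=True (the default) load_state_dict raises a RuntimeError on any
    # missing key, unexpected key, or shape mismatch, so nothing is silently skipped
    dst_block.load_state_dict(src_block.state_dict())

# e.g.: copy_block(src.layer1[0], dst.layer1[0])
#       print(count_trainable(src), count_trainable(dst))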

Hi ptrblck, and thanks for your reply.

I checked the number of parameters in the original and in the new Faster R-CNN backbone (also using state_dict as you suggested). They have the same number of parameters, with only one difference in requires_grad, which is False for the parameters of the first block in the original backbone and True in the new one. But this should be no concern, since later in the code I call .train() and expect it to set requires_grad to True for both nets, shouldn't it? Indeed, I tried to apply the following helper to reset requires_grad, but with no effect (a sketch of how the flags can be compared between the two backbones follows the helper):

def set_parameters(model, requires_grad):
    for p in model.parameters():
      p.requires_grad == requires_grad
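
For reference, the flag comparison mentioned above could look roughly like this (a sketch only; original_backbone and custom_backbone are placeholder names for the two models):

for (name, p_orig), (_, p_new) in zip(original_backbone.named_parameters(),
                                      custom_backbone.named_parameters()):
    if p_orig.requires_grad != p_new.requires_grad:
        print(name, p_orig.requires_grad, p_new.requires_grad)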

Would you be so kind as to have a look at my definition of the single block of the backbone?
It seems right to me, but of course something may have escaped my attention.
Thanks in advance for your help.


import torch.nn as nn
import torchvision

# should behave exactly like the original torchvision resnet50 bottleneck
class Bottleneck_Original(nn.Module):
    def __init__(self, inplanes, planes, stride = (1,1), is_downsample = False):
        super(Bottleneck_Original, self).__init__()
        # 1x1 reduce -> 3x3 (carries the stride) -> 1x1 expand, as in torchvision's Bottleneck
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=(1,1), stride=(1,1), bias=False)
        self.bn1 = torchvision.ops.FrozenBatchNorm2d(planes, eps=0)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=(3,3), stride=stride, padding=(1,1), bias=False)
        self.bn2 = torchvision.ops.FrozenBatchNorm2d(planes, eps=0)
        self.conv3 = nn.Conv2d(planes, 4*planes, kernel_size=(1,1), stride=(1,1), bias=False)
        self.bn3 = torchvision.ops.FrozenBatchNorm2d(4*planes, eps=0)
        self.relu = nn.ReLU()

        if is_downsample:
            # projection shortcut to match the channels / stride of the main branch
            self.downsample = nn.Sequential(
                nn.Conv2d(inplanes, 4*planes, kernel_size=(1, 1), stride=stride, bias=False),
                torchvision.ops.FrozenBatchNorm2d(4*planes, eps=0)
            )
        else:
            self.downsample = None
        

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        out = self.relu(out)
        
        if self.downsample is not None:
            residual = self.downsample(x)
            
        return out + residual
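
For context, the replacement itself is done roughly as below (not my exact code, just a sketch assuming the standard torchvision fasterrcnn_resnet50_fpn layout and its older pretrained=True argument):

import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
body = model.backbone.body   # the resnet50 trunk feeding the FPN

for layer_name in ["layer1", "layer2", "layer3", "layer4"]:
    layer = getattr(body, layer_name)
    for idx in range(len(layer)):
        old = layer[idx]
        layer[idx] = Bottleneck_Original(
            inplanes=old.conv1.in_channels,
            planes=old.conv1.out_channels,
            stride=old.conv2.stride,
            is_downsample=old.downsample is not None,
        )
# the pretrained weights are then copied into the new blocks as described in the first post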

No, calling train() or eval() on a module will change its behavior, e.g. eval() will disable dropout, but it won’t (un)freeze the parameters.
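
A minimal illustration of the difference (using a stand-in module):

import torch.nn as nn

model = nn.Linear(4, 2)      # stand-in for the backbone
model.requires_grad_(False)  # freeze all parameters
model.train()                # changes layer behavior only; the flags stay False
print(any(p.requires_grad for p in model.parameters()))  # False
model.requires_grad_(True)   # this is what actually unfreezes the parameters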

Could you post a minimal and executable code snippet including random inputs which would reproduce the numerical mismatch?
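
Something along these lines would already be enough (a rough, untested sketch that feeds the same random input through a stock torchvision bottleneck and through Bottleneck_Original initialized from its state_dict):

import torch
import torchvision

ref_block = torchvision.models.resnet50().layer2[0]   # a standard torchvision Bottleneck
custom_block = Bottleneck_Original(256, 128, stride=(2, 2), is_downsample=True)
custom_block.load_state_dict(ref_block.state_dict(), strict=False)  # strict=False skips num_batches_tracked

ref_block.eval()
custom_block.eval()

torch.manual_seed(0)
x = torch.randn(2, 256, 32, 32)
with torch.no_grad():
    print(torch.allclose(ref_block(x), custom_block(x), atol=1e-5))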

Hi ptrblck,
if I manage to prepare a toy example, could you share an email address I can send the sample to? Only if that is possible, of course. Unfortunately, although the code is pretty trivial, I am currently not allowed to share it widely until the end of the project I am working on.

Thanks in advance, and in any case, for your help.

No, please don’t send any users your private code if you cannot post it here publicly. Instead of sharing the entire code I would rather recommend trying to slim it down a bit, which could remove the parts you cannot share.

Ok, thanks, I’ll see whether I can share a minimal working snippet of code.