How to modify a pretrained model

Hey there,
I am working on a bilinear CNN for image classification. I am trying to take the pretrained VGG-Net classifier and modify its final layers for fine-grained classification. I have written the code snippet that I want to attach after the final layers of VGG-Net, but I don’t know how. Can anyone please help me with this?

import torch
import torch.nn as nn
import torch.nn.functional as F

class VggBasedNet_bilinear(nn.Module):
    def __init__(self, originalModel, num_classes):
        super(VggBasedNet_bilinear, self).__init__()
        # feature extraction from Conv5_3 with relu (the final max-pool is dropped)
        self.features = nn.Sequential(*list(originalModel.features)[:-1])

        self.classifier = nn.Linear(512 * 512, num_classes)

    def forward(self, x):
        # feature extraction from Conv5_3 with relu
        # 784 = 28*28 positions, which assumes 448x448 input images
        x = self.features(x).view(-1, 512, 784)

        # outer product of features at each position over height*width; average pooling
        x = torch.matmul(x, x.permute(0, 2, 1)).view(-1, 512 * 512) / 784.0

        # signed sqrt
        x = torch.mul(torch.sign(x), torch.sqrt(torch.abs(x) + 1e-12))

        # L2 normalization
        x = F.normalize(x, p=2, dim=1)

        # final FC layer
        x = self.classifier(x)

        return x

I want to add the above code snippet to the transfer learning tutorial available on the PyTorch website.

Do you get any error using your model, or could you clarify the question a bit? :wink:
You could most likely just swap your custom model with the one defined in the transfer learning tutorial.

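import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

num_classes = 62  # matches the [1, 62] output printed below

model_ft = models.vgg16(pretrained=True)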

for param in model_ft.parameters():
    param.requires_grad = False

class Vgg_added_features(nn.Module):
    def __init__(self, originalModel):
        super(Vgg_added_features, self).__init__()
        self.features = nn.Sequential(*list(originalModel.features)[:-1])
        self.classifier = nn.Linear(512*512, num_classes)
        #self.avg_pool = nn.AdaptiveAvgPool2d((7,7))
    
    def forward(self, x):
        print(x.shape)
        x = self.features(x).view(-1,512,12*14*14)
        print(x.shape)
        x = torch.matmul(x, x.permute(0,2,1)).view(-1,512*512)/12*14*14.0
        print(x.shape)
        x = torch.mul(torch.sign(x),torch.sqrt(torch.abs(x)+1e-12))
        print(x.shape)
        x = F.normalize(x, p=2, dim=1)
        print(x.shape)
        x = self.classifier(x)
        print(x.shape)
        return x

model = Vgg_added_features(model_ft)
print(model)

Above is the code I am using

Error - ValueError: Expected input batch_size (1) to match target batch_size (12).

Input dim 224

Output for print statements:
torch.Size([12, 3, 224, 224])
torch.Size([1, 512, 2352])
torch.Size([1, 262144])
torch.Size([1, 262144])
torch.Size([1, 262144])
torch.Size([1, 62])

@ptrblck could you please help me out here?

self.features would output a tensor of [batch_size, 512, 14, 14].
If you reshape this output via view(-1, 512, 12*14*14), you are interleaving the data.
What’s the use case to put the batch dimension into the features?
This also means that you end up with a batch size of 1, which creates the mentioned error.
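A minimal sketch of a forward pass that keeps the batch dimension separate, assuming the [batch_size, 512, 14, 14] feature map described above: only the spatial dimensions are flattened, and the pooled result is divided by the number of positions per image (h*w = 196 for 224x224 inputs). Note also that in the code above /12*14*14.0 is evaluated left to right as (x / 12) * 14 * 14.0, not as a division by 2352. The helper name bilinear_pool is hypothetical, and the shapes are read from the input rather than hard-coded.

```python
import torch
import torch.nn.functional as F


def bilinear_pool(features: torch.Tensor) -> torch.Tensor:
    """Bilinear pooling of a [B, C, H, W] feature map into [B, C*C] (hypothetical helper)."""
    b, c, h, w = features.shape
    # flatten only the spatial dims; the batch dim stays first and separate
    x = features.reshape(b, c, h * w)
    # per-image outer product summed over positions, averaged over the h*w positions
    x = torch.bmm(x, x.transpose(1, 2)) / (h * w)
    x = x.reshape(b, c * c)
    # signed square root, then L2 normalization of each image's vector
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-12)
    return F.normalize(x, p=2, dim=1)


torch.manual_seed(0)
feats = torch.randn(12, 512, 14, 14)  # stands in for self.features output on 12 images of 224x224
out = bilinear_pool(feats)
print(out.shape)  # torch.Size([12, 262144])
```

Inside the model's forward this would become something like x = self.classifier(bilinear_pool(self.features(x))), which yields one row of logits per image, so the batch size matches the targets again.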

Hi @ptrblck :slight_smile:

I wasn’t sure whether to ask here or open a new question, but the title seemed directly relevant to what I want (though maybe not the details of the original question), and I didn’t want to open a new thread when one with a matching title already exists.

I also want to modify a pretrained net, but I just want to change its hyperparameters for my different dataset (I do not want to do transfer learning or pretraining with it).

e.g. I did this:

def modify_resnet_for_fsl(model, fc_out_features=5):
    for name, module in model.named_modules():
        if type(module) == torch.nn.BatchNorm2d:
            module.track_running_stats = False
    model.fc.out_features = fc_out_features

But when I run a sample batch, the output still has 1000 labels instead of the 5 I set for debugging/unit testing.

model(x).size()
Out[19]: torch.Size([25, 1000])

Any advice? Is modifying the module objects directly (as in the thread "How to get the module names of nn.Sequential") not the right thing to do?
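A small demonstration of why the output stays at 1000, using only torch.nn: out_features is just a stored integer, while the size of the output comes from the shape of the layer's weight parameter, so the layer has to be replaced rather than edited. The standalone nn.Linear(512, 1000) below is an assumption standing in for ResNet-18's model.fc.

```python
import torch
import torch.nn as nn

fc = nn.Linear(512, 1000)   # stands in for resnet18's model.fc
fc.out_features = 5         # only changes the stored attribute, not the weight
x = torch.randn(25, 512)
print(fc.weight.shape, fc(x).shape)  # torch.Size([1000, 512]) torch.Size([25, 1000])

# building a new layer allocates a weight of the right shape
new_fc = nn.Linear(fc.in_features, 5)
print(new_fc(x).shape)  # torch.Size([25, 5])
```

On a full model the equivalent is assigning a new layer, e.g. model.fc = nn.Linear(model.fc.in_features, 5).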

OK, PyTorch doesn’t like what I’m trying to do:

    for name, module in self._modules.items():
RuntimeError: OrderedDict mutated during iteration

But I am OK with mutating it… I am doing this on purpose to build on the given ResNet models…

        module.track_running_stats = False
        model.__setattr__(name, module)

how is this supposed to be done properly?
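One way to do it, sketched under the assumption of a recent PyTorch (1.9 or newer, for nn.Module.get_submodule): collect the names first with a list, so nothing is added to a module's child dict while named_modules() is walking it, then assign each replacement on its parent module using only the last component of the dotted name. The helper name replace_module and the toy nn.Sequential are hypothetical.

```python
import torch.nn as nn


def replace_module(model: nn.Module, dotted_name: str, new_module: nn.Module) -> None:
    """Swap the submodule at a dotted path such as 'layer1.0.bn1' (hypothetical helper)."""
    parent_name, _, child_name = dotted_name.rpartition(".")
    # get_submodule("") returns the model itself, which covers top-level children
    parent = model.get_submodule(parent_name)
    # assigning on the parent overwrites the existing entry instead of adding a new key
    setattr(parent, child_name, new_module)


# toy stand-in for the ResNet
net = nn.Sequential(nn.Conv2d(3, 4, 3), nn.Sequential(nn.BatchNorm2d(4), nn.ReLU()))

# collect the names first, then modify
targets = [name for name, m in net.named_modules() if isinstance(m, nn.BatchNorm2d)]
for name in targets:
    old = net.get_submodule(name)
    replace_module(net, name, nn.BatchNorm2d(old.num_features, track_running_stats=False))
```

Filtering with isinstance rather than 'bn' in name also catches the BatchNorm layers inside the ResNet downsample blocks, whose names end in downsample.1. Note that the new layers here get default weight and bias, not the pretrained ones.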


This doesn’t seem to work either, as now there are a bunch of extra fields that probably shouldn’t be there…

for name, module in copy.deepcopy(model).named_modules():
    # if type(module) == torch.nn.BatchNorm2d:
    if 'bn' in name:
        # module.track_running_stats = '123'
        #moddule = f'torch.nn.{module}'
        # module = BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        module.track_running_stats = False
        # model.__setattr__(name, module)
        # eval(f'model.{name} = module')
model.fc = torch.nn.Linear(in_features=512, out_features=fc_out_features, bias=True)

But the above looks wrong: there are so many attributes… and the old ones seem to still be there too!!!


model
Out[80]: 
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=5, bias=True)
  (layer1.0.bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer1.0.bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer1.1.bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer1.1.bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer2.0.bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer2.0.bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer2.1.bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer2.1.bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer3.0.bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer3.0.bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer3.1.bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer3.1.bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer4.0.bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer4.0.bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer4.1.bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (layer4.1.bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
)
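The extra (layer1.0.bn1)-style entries at the bottom are consistent with nn.Module.__setattr__ treating a dotted string as one literal attribute name: it registers a brand-new top-level child instead of reaching into layer1[0]. A small reproduction, with a toy nn.Sequential as an assumed stand-in for the ResNet:

```python
import torch.nn as nn

# toy stand-in: net[0][0] plays the role of layer1[0].bn1
net = nn.Sequential(nn.Sequential(nn.BatchNorm2d(4)))

# the dotted name is stored as a single key on net, not resolved as a path
setattr(net, "0.0", nn.BatchNorm2d(4, track_running_stats=False))

print([name for name, _ in net.named_children()])  # ['0', '0.0']
print(net[0][0].track_running_stats)               # True: the nested layer is unchanged
```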

I guess if Python were more like C and I could make the original pointer to the module point to the new one, that would work. The issue is that doing:

module = new_module

just rebinds the local variable module to new_module, instead of changing the module stored inside the ResNet to the new one, which is what I am trying to do.

It seems the “best” idea I have is to modify the actual pointer to the object/layer, while making sure everything is modified properly, including metadata and weights if needed… The issue is that I don’t know how to make this 100% robust. I already tried setting track_running_stats to nonsense, and it didn’t raise the errors I hoped it would.


oh well that was close :laughing:

for name, module in model.named_modules():
    # if type(module) == torch.nn.BatchNorm2d:
    if 'bn' in name:
        print(name)
        print(module)
        new_module = eval('torch.nn.'+str(module).replace('track_running_stats=True', 'track_running_stats=False'))
        module.load_state_dict(new_module.state_dict())
model.fc = torch.nn.Linear(in_features=512, out_features=fc_out_features, bias=True)

but it complained:

RuntimeError: Error(s) in loading state_dict for BatchNorm2d:
	Missing key(s) in state_dict: "running_mean", "running_var", "num_batches_tracked". 

but that is ok. It SHOULD be missing…
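A quick check of why this route cannot work on its own, assuming only torch.nn: a state_dict carries tensors (parameters and buffers), not constructor flags such as track_running_stats, so even a non-strict load leaves the target layer configured exactly as before.

```python
import torch.nn as nn

old = nn.BatchNorm2d(4)                             # tracks running stats
new = nn.BatchNorm2d(4, track_running_stats=False)  # has no running-stat buffers

# strict=False reports the mismatch instead of raising
result = old.load_state_dict(new.state_dict(), strict=False)
print(result.missing_keys)      # the three running-stat buffers from the error above
print(old.track_running_stats)  # True: the flag is not part of the state_dict
```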


OK, I found a list of potentially relevant answers:

I will go through them and report my solution.

Hi,

Can you explain again in more detail what you want to do? So you replaced the last layer to get the desired number of out_features, and now you want to replace the BatchNorm weights/biases with your own, but not the running statistics (running_mean, running_var)? Or something else?

Greetings.

Hi @Caruso,

The simplest way to explain what I want to do is replace the module of the pre-trained model with a different one such that the pre-trained model still works.

For example referencing what I was trying:

  1. Replace the last fully connected layer with one that has different hyperparameters (e.g. 5 labels rather than 1000). [Succeeded with model.fc = torch.nn.Linear(in_features=512, out_features=fc_out_features, bias=True)]
  2. Replace every batch norm with one that does not track running statistics but instead uses mini-batch statistics. [Failed; I don’t know how to do this yet]

For my specific case, a copy of the same module with different hyperparameters would do (but ideally, being able to replace any module with a different one would be more general and robust).

Thanks for taking the time to read my attempt. I will go ahead and read the links I pasted since those seem to be very related to my challenge.

Hi @Brando_Miranda,

I don’t know if I can help you, but maybe a look at the documentation for BatchNorm or at the Python source code for BatchNorm helps.
In the source code it says:

"""Decide whether the mini-batch stats should be used for normalization rather than the buffers.
    Mini-batch stats are used in training mode, and in eval mode when buffers are None.
"""
if self.training:
    bn_training = True
else:
    bn_training = (self.running_mean is None) and (self.running_var is None)

If you want to run your model in eval mode and the BatchNorm layers of the model you are using were initialized with track_running_stats=True, you could try setting track_running_stats=False, self.running_mean=None and self.running_var=None, but I’m not sure if this would work.
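A minimal sketch of that first suggestion, based on the _BatchNorm.forward logic quoted above: since the eval-mode check tests both buffers, both running_mean and running_var are set to None. The helper name use_batch_stats is hypothetical, and this relies on BatchNorm internals that could change between PyTorch versions.

```python
import torch
import torch.nn as nn


def use_batch_stats(bn: nn.modules.batchnorm._BatchNorm) -> None:
    """Make an existing BatchNorm layer normalize with mini-batch statistics (hypothetical helper)."""
    bn.track_running_stats = False  # stop updating statistics in train mode
    bn.running_mean = None          # with both buffers None, eval mode
    bn.running_var = None           # also falls back to mini-batch stats


torch.manual_seed(0)
bn = nn.BatchNorm2d(3).eval()
x = torch.randn(8, 3, 4, 4) * 5 + 10  # per-channel mean far from 0

use_batch_stats(bn)
y = bn(x)
print(y.mean(dim=(0, 2, 3)))  # close to zero for every channel
```

With the default running_mean of 0 and running_var of 1, an untouched layer in eval mode would pass this input through almost unchanged (mean around 10), so a near-zero mean indicates the batch statistics were used.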

Another option would be to initialize a new BatchNorm layer with track_running_stats=False and overwrite its weight and bias Parameters with yours, if affine=True.
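A sketch of that second option for a single layer; the helper name bn_without_running_stats is hypothetical. It copies the values of weight and bias in place, so the new layer keeps its own Parameter objects. The new layer is created on the CPU with the default dtype, so a model living elsewhere would need a .to(device) afterwards.

```python
import torch
import torch.nn as nn


def bn_without_running_stats(old: nn.BatchNorm2d) -> nn.BatchNorm2d:
    """New BatchNorm2d using mini-batch stats, with old's weight and bias copied (hypothetical helper)."""
    new = nn.BatchNorm2d(old.num_features, eps=old.eps, momentum=old.momentum,
                         affine=old.affine, track_running_stats=False)
    if old.affine:
        # copy values in place so new keeps its own Parameter objects
        with torch.no_grad():
            new.weight.copy_(old.weight)
            new.bias.copy_(old.bias)
    return new


old = nn.BatchNorm2d(4)
with torch.no_grad():
    old.weight.fill_(2.0)   # pretend these are pretrained values
    old.bias.fill_(-1.0)
new = bn_without_running_stats(old)
```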

I got it working! This works:

import torch

def replace_bn(module, name):
    '''
    Recursively put desired batch norm in nn.module module.

    set module = net to start code.
    '''
    # go through all attributes of module nn.module (e.g. network or layer) and put batch norms if present
    for attr_str in dir(module):
        target_attr = getattr(module, attr_str)
        if type(target_attr) == torch.nn.BatchNorm2d:
            print('replaced: ', name, attr_str)
            # note: the new layer gets freshly initialized weight/bias, not the pretrained ones
            new_bn = torch.nn.BatchNorm2d(target_attr.num_features, target_attr.eps, target_attr.momentum, target_attr.affine,
                                          track_running_stats=False)
            setattr(module, attr_str, new_bn)

    # iterate through immediate child modules. Note, the recursion is done by our code no need to use named_modules()
    for name, immediate_child_module in module.named_children():
        replace_bn(immediate_child_module, name)

replace_bn(model, 'model')

The crux is that you need to change the layers recursively (mainly because some attributes are themselves modules that contain further layers). I think better code than the above would add another if statement (after the batch norm check) that detects whether you need to recurse, and recurses if so. The version above works too, but it first replaces the batch norms of the outer layer (the first loop) and then, in a second loop, makes sure no child that should be recursed into is missed (and recurses into it).
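To confirm that a recursive replacement like replace_bn reached every nested layer (including the ones inside the downsample blocks), one option is to list the BatchNorm layers that still carry running statistics. A short sketch; layers_still_tracking is a hypothetical helper name and the toy nn.Sequential is an assumption standing in for a real model.

```python
import torch.nn as nn


def layers_still_tracking(model: nn.Module) -> list:
    """Names of BatchNorm layers that still keep running statistics (hypothetical helper)."""
    return [name for name, m in model.named_modules()
            if isinstance(m, nn.modules.batchnorm._BatchNorm) and m.running_mean is not None]


net = nn.Sequential(nn.BatchNorm2d(4),
                    nn.Sequential(nn.BatchNorm2d(4, track_running_stats=False), nn.BatchNorm2d(4)))
print(layers_still_tracking(net))  # ['0', '1.1']
```

After a successful replace_bn(model, 'model'), this should return an empty list.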

SO: https://stackoverflow.com/questions/58297197/how-to-change-activation-layer-in-pytorch-pretrained-module/64161690#64161690

credits: Replacing convs modules with custom convs, then NotImplementedError

I think you can also do this, but without any guarantee:

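import torch

def convert_bn(model):
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            # re-running __init__ also recreates weight and bias with default values
            # (ones/zeros) and puts the layer back into training mode
            module.__init__(module.num_features, module.eps,
                            module.momentum, module.affine,
                            track_running_stats=False)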

Thanks! That looks promising. I will try it out later! Thanks for your time :slight_smile:

Here is a general function for replacing any layer:

import torch.nn as nn

def replace_layers(model, old, new):
    for n, module in model.named_children():
        if len(list(module.children())) > 0:
            ## compound module, go inside it
            replace_layers(module, old, new)

        if isinstance(module, old):
            ## simple module
            setattr(model, n, new)

replace_layers(model, nn.ReLU, nn.ReLU6())
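The function above assigns the same new instance to every matching slot. That is harmless for a stateless layer such as nn.ReLU6, but for layers with parameters (BatchNorm, for instance) all replaced slots would share one set of weights. A sketch of a variant that takes a factory and builds a fresh module per match, receiving the old module so it can reuse attributes like num_features; the names replace_layers_fresh and make_new are hypothetical.

```python
import torch.nn as nn
from typing import Callable, Type


def replace_layers_fresh(model: nn.Module, old: Type[nn.Module],
                         make_new: Callable[[nn.Module], nn.Module]) -> None:
    for name, module in model.named_children():
        if isinstance(module, old):
            # build a separate replacement from the module being replaced
            setattr(model, name, make_new(module))
        else:
            # recurse; for leaf modules this loop body simply never runs
            replace_layers_fresh(module, old, make_new)


net = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU(),
                    nn.Sequential(nn.Conv2d(4, 4, 3), nn.BatchNorm2d(4)))
replace_layers_fresh(net, nn.BatchNorm2d,
                     lambda bn: nn.BatchNorm2d(bn.num_features, track_running_stats=False))
```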

I struggled with this for a few days, so I did some digging and wrote a Kaggle notebook explaining how different types of layers/modules are accessed in PyTorch.


Asking a question out of context, @ptrblck: how can I change the requires_grad of each layer after getting self.features?
I can set requires_grad to False by iterating over self.model.parameters(), but what if I extract the feature part of VGG16 (to modify it) and later want to change requires_grad? I don’t see an option such as self.model.parameters() after extracting self.features.

Assuming model.features is an nn.Module, you can use model.features.parameters() to iterate the parameters of features.
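A short sketch of that, assuming a model with a features sub-module as in the VGG-based classes earlier in the thread; the toy Net is a hypothetical stand-in so the example runs without downloading weights.

```python
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
        self.classifier = nn.Linear(8, 2)


model = Net()

# freeze the whole feature extractor...
for p in model.features.parameters():
    p.requires_grad = False

# ...and later unfreeze only its last conv layer (index 2 in this toy Sequential)
for p in model.features[2].parameters():
    p.requires_grad = True

print([name for name, p in model.named_parameters() if p.requires_grad])
# ['features.2.weight', 'features.2.bias', 'classifier.weight', 'classifier.bias']
```

If the optimizer was built only from the parameters that were trainable at the time, it will not update the newly unfrozen ones until they are added (for example by rebuilding it or adding a param group).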
