How to replace a layer or module in a pretrained network?

I am working with SqueezeNet and want to replace a layer in the pretrained network without changing the dimensions of any other layers, and then fine-tune it.

net = SqueezeNet()
state_dict = torch.load('../pretrainedmodels/squeezenet.pth')
net.load_state_dict(state_dict, strict=True)
print(net)
for name, child in net.named_children():
    for x, y in child.named_children():
        print(name, x)

The output is:

SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (6): Fire(
      (squeeze): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
    (7): Fire(
      (squeeze): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (9): Fire(
      (squeeze): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
    (10): Fire(
      (squeeze): Conv2d(384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
    (11): Fire(
      (squeeze): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
    (12): Fire(
      (squeeze): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace=True)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace=True)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace=True)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace=True)
    (3): AdaptiveAvgPool2d(output_size=(1, 1))
  )
)
features 0
features 1
features 2
features 3
features 4
features 5
features 6
features 7
features 8
features 9
features 10
features 11
features 12
classifier 0
classifier 1
classifier 2
classifier 3

I want to replace the fourth layer, (3): Fire, with a custom FireB module (user-defined, with the same input and output dimensions). How do I make this change easily in the pretrained network?


Could you try to assign your new layer to the one you would like to replace?

net = models.SqueezeNet()
net.features[3] = nn.Conv2d(96, 128, 1, 1) # Replace this with your custom layer
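
To keep the surrounding dimensions intact, the replacement only has to accept 64 input channels and produce 128 output channels, like the original features[3]. As a rough sketch of what a custom FireB drop-in could look like (the class itself is hypothetical; only the channel sizes are taken from the printout above):

import torch
import torch.nn as nn

class FireB(nn.Module):
    # Hypothetical replacement for features[3]: 64 channels in,
    # 64 + 64 = 128 concatenated channels out, matching the original Fire block.
    def __init__(self, in_channels=64, squeeze_channels=16, expand_channels=64):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.squeeze(x))
        return torch.cat([self.act(self.expand1x1(x)),
                          self.act(self.expand3x3(x))], dim=1)

net.features[3] = FireB()  # the other pretrained layers stay untouched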

Thanks @ptrblck. It worked.


Hi.
I'm trying to change several modules.
I know their relative names (model.layer.1.conv …), and I have target modules that I want to overwrite them with, saved as a dict {name: module}.

I know that I can change a model's module by assigning to its attribute (i.e. model.layer[1].conv = nn.Conv2d(3, 1, 1, 1)),
but going through getattr doesn't do what I want:

names = ['layer', '0', 'conv']
module = model
for name in names:
    try:
        module = module[int(name)]        # index into an nn.Sequential / nn.ModuleList
    except (TypeError, ValueError):
        module = getattr(module, name)    # follow a named attribute

The code isn't complete, but you can see that I'm trying to use getattr to get the attribute of the desired layer and overwrite it with a different layer.

However, it seems that assigning to the variable returned by getattr doesn't affect the model itself,
so module = nn.Conv2d(3, 1, 1, 1) won't change the network.

Is there any way to do this?
I have several modules to change and I can't do them all by hand.

Help much appreciated! Thanks.


To assign a new module, you could use setattr as an alternative to the direct assignment.
Assigning a new module to the variable that was returned via getattr won't work, as you already explained.
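
A small sketch of that idea, assuming the relative names look like 'layer.0.conv' as in the question: walk down to the parent module with getattr and call setattr on the last name component (the helper and the example dict below are made up for illustration):

import torch.nn as nn

def set_module(model, name, new_module):
    # Walk 'layer.0.conv' down to the parent of the leaf attribute.
    parts = name.split('.')
    parent = model
    for part in parts[:-1]:
        parent = getattr(parent, part)  # also works for numeric names of nn.Sequential children
    # Reassign the leaf on its parent so the change is reflected in the model.
    setattr(parent, parts[-1], new_module)

# usage with a {name: module} dict of replacements (names are hypothetical):
# replacements = {'layer.0.conv': nn.Conv2d(3, 1, 1, 1)}
# for name, module in replacements.items():
#     set_module(model, name, module)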


Any update? I have the same issue.

Is it possible to replace one layer with nn.Sequential(...) containing multiple layers?
Example:

net = models.SqueezeNet()
net.features[3] = nn.Sequential(
    nn.Linear(...),
    nn.ReLU(...),
    nn.Dropout(...)
)

Yes, that is possible, and the nn.Sequential container with all its internal layers will be called as the replacement layer.

What’s the best way to accomplish this when iterating over modules?

for module in model.modules():
    classname = module.__class__.__name__
    if 'Linear' in classname:
        module = nn.Sequential(...) # replacing Linear with multiple layers defined in Sequential

Is this the correct approach to modify a model on the fly?
Edit: Assigning to module as shown didn't work; presumably rebinding the loop variable doesn't modify the model itself?


I think you would have to use setattr with the module names to assign the new nn.Sequential module to the attribute that held the previous linear layer.
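
A rough sketch of that approach, assuming every nn.Linear should become an nn.Sequential with the same in/out features (the helper name and the dropout probability are placeholders):

import torch.nn as nn

def replace_linears(model):
    # Collect the replacements first so we don't mutate the model while iterating.
    targets = []
    for parent in model.modules():
        for child_name, child in parent.named_children():
            if isinstance(child, nn.Linear):
                targets.append((parent, child_name, child))
    for parent, child_name, old in targets:
        new = nn.Sequential(
            nn.Linear(old.in_features, old.out_features),
            nn.ReLU(),
            nn.Dropout(p=0.5),
        )
        setattr(parent, child_name, new)  # assign on the parent, not the loop variable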


Hi, I want to add attention to ResNet18. I am following this link to add attention, but the code is in Keras and I want to convert it to PyTorch. Can someone please help?

from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import Model, layers
from keras.models import load_model, model_from_json
from keras.layers import GlobalAveragePooling2D, Dropout, Dense, Input

print('Creating model... ', end=' ')

input_tensor = Input(shape=(IMAGE_SIZE,IMAGE_SIZE,3))
conv_base = ResNet50(include_top=False, weights=None, input_tensor=input_tensor)

conv_base.load_weights('../input/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5')

# Attention network

#a_map = layers.Conv2D(1024, 1, strides=(1, 1), padding="same", activation='relu')(conv_base.output)

a_map = layers.Conv2D(516, 1, strides=(1, 1), padding="same", activation='relu')(conv_base.output)

#a_map = layers.Conv2D(64, 1, strides=(1, 1), padding="same", activation='relu')(a_map)

a_map = layers.Conv2D(1, 1, strides=(1, 1), padding="same", activation='relu')(a_map)

#a_map = layers.Conv2D(1024, 1, strides=(1, 1), padding="same", activation='relu')(a_map)

a_map = layers.Conv2D(2048, 1, strides=(1, 1), padding="same", activation='sigmoid')(a_map)

res = layers.Multiply()([conv_base.output, a_map])

x = GlobalAveragePooling2D()(res)
x = Dropout(0.5)(x)
x = Dense(2048, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(5, activation='softmax', name='final_output')(x)
model = Model(input_tensor, predictions)

model.summary()

print("Done !")

Can someone help convert the attention network part? I am unable to map the outputs via Multiply.

@ptrblck, can you please help with the above logic?

It seems the “Attention network” part just adds a few conv layers. If so, you can just replace them with nn.Conv2d in PyTorch.
The Multiply layer in Keras seems to:

Layer that multiplies (element-wise) a list of inputs.

so you could either directly multiply activations in the forward method of your model or write a custom nn.Module to do it.
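
A rough sketch of how the attention branch could look in PyTorch, mirroring the channel sizes from the Keras snippet above and assuming a 2048-channel ResNet feature map; this is just one possible way to express it, with the element-wise multiply done in forward:

import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    # Attention branch sketched after the Keras code: 1x1 convs squeeze the
    # feature map down to a single channel and expand it back, then a sigmoid
    # gate is multiplied element-wise onto the input features.
    def __init__(self, in_channels=2048):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(in_channels, 516, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(516, 1, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(1, in_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Element-wise multiply replaces Keras' Multiply layer.
        return x * self.attn(x)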


Hello @ptrblck

Thank you for your answer!
1- What if you want to do this on a Transformer-based text generation model, e.g. Llama or T5?
2- If I replace a specific layer with my custom layer, will the parameters/weights of the old layer be preserved, given that my custom layer has the same input and output shapes?

Thanks!

Thanks @ptrblck. It worked!

Just in case people find this useful: you can replace specific layers in a pretrained network with your custom layer recursively as follows (or modify it to suit your needs).

from transformers import T5ForConditionalGeneration


def replace_module(module, target_name, new_module):
    # Recursively walk the children and replace every child whose name
    # contains target_name. Note that all matches will end up sharing
    # the same new_module instance (and therefore the same parameters).
    for child_name, child_module in module.named_children():
        if target_name in child_name:
            setattr(module, child_name, new_module)
        else:
            replace_module(child_module, target_name, new_module)

# example:
t5_model = T5ForConditionalGeneration.from_pretrained(...)

custom_t5_layernorm = CustomT5LayerNorm(...)

replace_module(t5_model, 'layer_norm', custom_t5_layernorm)