Load pre-trained weights to new architecture

Argo · April 19, 2023, 11:19am

Hi! I found several similar topics, but not exactly what I was looking for.
Let’s assume I have a pre-trained EfficientNetB0. I want to create a new model with a slightly tweaked architecture, then load the weights from the trained model into every unaltered layer and randomly initialize the weights of the new layers.

import torch
import torch.nn as nn
import torchvision.models as models


def build_model(num_classes=5):
    model = models.efficientnet_b0(
        weights=models.EfficientNet_B0_Weights.DEFAULT,
        )
    model.classifier[1] = nn.Linear(in_features=1280, out_features=num_classes)
    return model


model_path = 'path/model.pt'

new_model = build_model(num_classes=3)

# copy old weights into the new model's state_dict for every key that exists in both models
old_model_state_dict = torch.load(model_path)
new_model_state_dict = new_model.state_dict()
for k, v in old_model_state_dict.items():
    if k in new_model_state_dict:
        new_model_state_dict[k] = v


new_model.load_state_dict(new_model_state_dict)

For example, in this case I will get an error because I’ve changed the number of output features. My small check only compares key names, so it will (maybe) work if a layer is new, but not if an existing layer has changed shape.

So I want to be able to add layers, delete layers, or change something inside a layer and still be able to use the old weights for every other unchanged layer.

Would be happy to get some recommendations on the best way to do that!

ptrblck · April 19, 2023, 5:28pm

I would recommend explicitly loading the parameters you need and know were not changed. You could also try to use strict=False in the load_state_dict calls, but if you are not checking the returned mismatches your code might easily break, as any small mistake might turn this call into a no-op.
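A minimal sketch of that advice, assuming two tiny nn.Sequential models stand in for the real networks. load_matching_weights is a hypothetical helper, not a PyTorch API. It keeps only the checkpoint entries whose name and shape match the new model, loads them with strict=False, and returns what was skipped so that nothing fails silently. Note that strict=False only tolerates missing or unexpected keys; a shape mismatch still raises, which is why the shape filter is there.

```python
import torch
import torch.nn as nn


def load_matching_weights(model, checkpoint_state_dict):
    """Hypothetical helper: load only tensors whose name and shape match."""
    model_state = model.state_dict()
    matched, skipped = {}, []
    for name, tensor in checkpoint_state_dict.items():
        if name in model_state and model_state[name].shape == tensor.shape:
            matched[name] = tensor
        else:
            skipped.append(name)  # new, removed, or reshaped layer
    # strict=False allows the filtered-out keys to be absent from `matched`.
    result = model.load_state_dict(matched, strict=False)
    return list(matched), skipped, list(result.missing_keys), list(result.unexpected_keys)


# Toy models (assumption): same first layer, different output size.
old_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 5))
new_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))

loaded, skipped, missing, unexpected = load_matching_weights(
    new_model, old_model.state_dict()
)
print("loaded:", loaded)
print("skipped from checkpoint:", skipped)
print("left at random init:", missing)
print("unexpected:", unexpected)
```

Checking the returned lists is what guards against the silent no-op: if "loaded" is empty or "left at random init" contains layers you expected to be restored, the key names or shapes do not line up.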

Argo · May 9, 2023, 5:59pm

I’m not quite getting there. Let me start with a simpler example, where I just want to load pretrained weights and change the number of output neurons:

import torch
import torch.nn as nn
import torchvision.models as models


def build_model(num_classes=5):
    model = models.efficientnet_b0(
        weights=models.EfficientNet_B0_Weights.DEFAULT,
        )

    model.classifier[1] = nn.Linear(in_features=1280, out_features=num_classes)
    return model


model_path = 'path/model.pt'
num_classes = 10

checkpoint = torch.load(model_path)
model = build_model(num_classes)
model.load_state_dict(checkpoint)

I am getting an error because of an output size mismatch. What is the right way to deal with it?

size mismatch for classifier.1.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([10])

ptrblck · May 15, 2023, 5:18am

Either manipulate the model after loading its state_dict or manipulate the checkpoint by replacing the 'classifier.1.bias' (and weight) keys with either the already used ones or new and randomly initialized tensors in the right shape.
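Both options could look roughly like the sketch below. TinyNet is a hypothetical stand-in whose classifier[1] mirrors the key names from the error message (classifier.1.weight, classifier.1.bias), and the checkpoint is created in memory instead of being read with torch.load, so the layer sizes are assumptions rather than the real EfficientNet ones.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    # Hypothetical stand-in: classifier[1] plays the role of the EfficientNet head.
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
        self.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(8, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))


# Stands in for torch.load(model_path): a checkpoint trained with 5 classes.
checkpoint = TinyNet(num_classes=5).state_dict()

# Option 1: build the model in the checkpoint's shape, load, then swap the head.
model_a = TinyNet(num_classes=5)
model_a.load_state_dict(checkpoint)
model_a.classifier[1] = nn.Linear(8, 10)  # new head, randomly initialized

# Option 2: build the model in the new shape and patch the checkpoint so the
# head keys hold freshly initialized tensors of the right shape.
model_b = TinyNet(num_classes=10)
patched = dict(checkpoint)
fresh = model_b.state_dict()
for key in ("classifier.1.weight", "classifier.1.bias"):
    patched[key] = fresh[key].clone()
model_b.load_state_dict(patched)  # strict load now succeeds

print(model_a(torch.randn(2, 16)).shape, model_b(torch.randn(2, 16)).shape)
```

Option 1 keeps the load strict, so any other unintended mismatch still raises. Option 2 is handy when the model constructor already produces the new shape, as build_model(num_classes) does in the question.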