How to save a model and load it into a model with a different number of outputs

Hello.

As a project, we built and trained an AlexNet model on CIFAR-10.

Our next project is to do transfer learning with the model we trained on CIFAR-10.
The problem is that the dataset we are using for transfer learning requires 100 output nodes.
So if we define a model with 100 output nodes and try to load the parameters from the previous model, the error below occurs.

Error(s) in loading state_dict for AlexNet2:
size mismatch for fc.4.weight: copying a param with shape torch.Size([10, 4096]) from checkpoint, the shape in current model is torch.Size([100, 4096]).
size mismatch for fc.4.bias: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([100]).

How should I deal with this problem?
I'm guessing I should save and load the parameters excluding the last layer, but I'm not sure about it.

Use the following: model.load_state_dict(ckpt['model'], strict=False). This will ignore errors when keys don't match (i.e. keys not found in the destination or the source state dict).

Be careful about using this option, as it might hide other errors! Although I think the new default is printing the non-matching keys in the most recent version(s) of PyTorch…
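For reference, here is a minimal self-contained sketch of how strict=False behaves, using toy modules rather than the AlexNet2 above: keys that exist on only one side are skipped, and recent PyTorch versions report them in the named tuple that load_state_dict returns.

```python
import torch
import torch.nn as nn

src = nn.Linear(4, 4)                  # checkpoint keys: 'weight', 'bias'
dst = nn.Sequential(nn.Linear(4, 4))   # destination keys: '0.weight', '0.bias'

# strict=False skips non-matching keys instead of raising,
# and returns them so you can inspect what was (not) loaded.
result = dst.load_state_dict(src.state_dict(), strict=False)
print(result.missing_keys)     # destination keys absent from the checkpoint
print(result.unexpected_keys)  # checkpoint keys with no match in the destination
```

Note that this only covers *absent* keys; as the thread below discovers, a key that exists on both sides but with different tensor shapes still raises an error.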

Thanks for the reply!
However, the same error occurs even though I used strict=False.

PATH = '/gdrive/My Drive/AIproject/epoch80.pt'
model.load_state_dict(torch.load(PATH), strict=False)

The code below is the structure of my model:

AlexNet2(
(layer1): Sequential(
(0): Conv2d(3, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): LeakyReLU(negative_slope=0.01, inplace=True)
(3): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(96, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): LeakyReLU(negative_slope=0.01, inplace=True)
(7): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(8): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): LeakyReLU(negative_slope=0.01, inplace=True)
(10): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): LeakyReLU(negative_slope=0.01, inplace=True)
(12): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): LeakyReLU(negative_slope=0.01, inplace=True)
(14): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(fc): Sequential(
(0): Linear(in_features=2304, out_features=4096, bias=True)
(1): Dropout(p=0.4, inplace=False)
(2): Linear(in_features=4096, out_features=4096, bias=True)
(3): Dropout(p=0.4, inplace=False)
(4): Linear(in_features=4096, out_features=100, bias=True)
)
)

The model I trained previously has almost the same structure, but it had 10 out_features in the last Linear layer.

Oh right, this is probably because strict=False only skips missing keys, not different sizes under the same keys.

What you could do is remove the fc from your destination model, and add it back after loading the weights…
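A minimal sketch of that idea, using a toy stand-in for AlexNet2 (the real model and checkpoint path are assumed): swap the last Linear out for an nn.Identity so its keys disappear from the destination model, load with strict=False (the checkpoint's 10-class weights then become "unexpected" keys and are skipped), and finally re-attach the fresh 100-class layer.

```python
import torch
import torch.nn as nn

def make_net(num_classes):
    # Toy stand-in for AlexNet2: only the fc part matters for this problem.
    return nn.Sequential(nn.Linear(8, 16), nn.Linear(16, num_classes))

old_model = make_net(10)           # pretend this was trained on CIFAR-10
ckpt = old_model.state_dict()      # in practice: torch.load(PATH)

new_model = make_net(100)          # destination with 100 output nodes
new_head = new_model[1]            # set the fresh 100-class layer aside
new_model[1] = nn.Identity()       # remove it -> no size check on its keys
new_model.load_state_dict(ckpt, strict=False)  # loads all remaining layers
new_model[1] = new_head            # re-attach the new head
```

After this, the shared layers carry the pretrained weights while the new head stays randomly initialized, ready to be fine-tuned on the 100-class dataset.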

Got it I’ll try with your advice.
thx:)

And it worked!
delete last linear layer => load => add linear layer
You saved my day
