I am a beginner in deep learning. I am trying to run a GAN (UNet-based) model, implemented in PyTorch, on my data, but when I run the code I get the following error:
**KeyError: 'generatorX'**
I am using the pretrained model/checkpoint (a file with a .pth extension already provided by the authors).
I cannot figure out what 'generatorX' is. When I print the model I can see the weights.
Any help will be highly appreciated.
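Edit: for anyone hitting the same thing, a .pth checkpoint is often a dict of sub-state-dicts, one per network, and the KeyError means the expected key is missing. Here is a minimal sketch of how to inspect it (the file path and the dummy model are made up for illustration, not the authors' code):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: checkpoints for multi-network GANs are often saved as
# {'generatorX': {...}, 'discriminator': {...}, ...}. If the loading code does
# checkpoint['generatorX'] and that key is absent, you get exactly this KeyError.
net = nn.Linear(4, 2)
torch.save({"generatorX": net.state_dict()}, "/tmp/ckpt.pth")

checkpoint = torch.load("/tmp/ckpt.pth", map_location="cpu")
print(list(checkpoint.keys()))  # inspect the top-level keys first

net2 = nn.Linear(4, 2)
net2.load_state_dict(checkpoint["generatorX"])  # load the matching sub-dict
```

Printing the top-level keys shows whether the checkpoint is a flat state_dict or a dict of sub-dicts, which tells you which key (if any) to index before calling load_state_dict.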
@Diego Thank you very much for the timely response.
When I make the change as you suggested,
generatorX.load_state_dict(checkpoint, strict=False)
I get this error, although I used images with the same dimensions as those used by the authors:
RuntimeError: Error(s) in loading state_dict for Generator:
size mismatch for conv8.2.weight: copying a param with shape torch.Size([16, 48, 5, 5]) from checkpoint, the shape in current model is torch.Size([16, 48, 1, 1]).
size mismatch for conv9.2.weight: copying a param with shape torch.Size([3, 16, 5, 5]) from checkpoint, the shape in current model is torch.Size([3, 16, 1, 1]).
So do I need to train from scratch, or can I still use this pretrained model?
You are probably defining the wrong model for generatorX since the weights you are trying to load don’t match the model. Is there a different model to try? Or a different set of weights?
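If you still want to reuse the weights that do match, one common workaround is to filter the checkpoint by shape before loading. A minimal sketch, with a stand-in model (the real Generator and checkpoint come from the authors' repo; the simulated mismatch mirrors the conv8/conv9 errors above):

```python
import torch
import torch.nn as nn

# Stand-in two-layer model; not the authors' Generator.
model = nn.Sequential(nn.Conv2d(3, 16, 5), nn.Conv2d(16, 3, 1))

checkpoint = dict(model.state_dict())
checkpoint["0.weight"] = torch.zeros(16, 3, 1, 1)  # simulate one size mismatch

model_state = model.state_dict()
# Keep only tensors whose name AND shape match the current model;
# mismatched ones (like conv8.2.weight in the thread) are skipped.
filtered = {k: v for k, v in checkpoint.items()
            if k in model_state and v.shape == model_state[k].shape}
model_state.update(filtered)
model.load_state_dict(model_state)
print(f"loaded {len(filtered)} of {len(model_state)} tensors")
```

Note the skipped layers keep their random initialization, so some fine-tuning is usually still needed even when most weights load.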
Thank you very much for the reply. I ran another file where the model is trained from scratch.
I am getting the following error: 'Caught RuntimeError in replica 0 on device 0'
in the line
fakeEnhanced = generatorX(realInput)
and then the following error: 'Sizes of tensors must match except in dimension 2. Got 64 and 33 (The offending index is 0)'
where the generator's forward method is defined inside the model.py file, in the line:
x6 = self.conv7(torch.cat([x5, x53_temp], dim=1))
I searched for ways to fix these errors. Some people say there may be an issue with the DataParallel module, since the original code was written to run on multiple GPUs.
Do you have an idea whether that could be the case?
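From what I found while searching, 'Caught RuntimeError in replica 0 on device 0' is just nn.DataParallel re-raising an exception from one replica; the real error is the one printed after it. Dropping the wrapper can give a cleaner traceback. A sketch of what I mean (toy model, not the authors' code):

```python
import torch
import torch.nn as nn

# nn.DataParallel wraps the original module and re-raises errors from
# replicas with the 'Caught RuntimeError in replica ...' prefix.
# For debugging, run the underlying module directly on one device.
model = nn.Linear(4, 2)
wrapped = nn.DataParallel(model)

single = wrapped.module          # the original, unwrapped model
out = single(torch.randn(3, 4))  # forward pass without DataParallel
print(out.shape)
```

Running the unwrapped module on a single device reproduces the underlying error (here, the torch.cat size mismatch) without the replica wrapper obscuring the traceback.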
It seems x5 and x53_temp do not have the same dimensions. Oddly, it says the batch dimension is not matching. Can you print the model definition, please?
Thanks for this; although it is informative, I cannot discern the shapes of x5 and x53_temp from it, sorry. I need the class definition (mainly the forward and init functions) of the generator, maybe…
@user_123454321 Thank you very much for the fast response. The error actually occurs in the forward function of Generator.
One thing I noticed is that the images forward receives are 516x516 (height and width) when I print them, even though I explicitly resized the images to 512x512, which is the size the network accepts. This is the apparent cause of the error. I am trying to check where in the DataLoader this dimension mismatch occurs.
I will post here if there is still tensor size mismatch in the later layers of network.
Thanks a lot for your suggestions.
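Update: the 516x516 input matters because a UNet-style encoder/decoder only restores the input size when the side length survives the repeated halving and doubling. A toy reproduction (the pooling depth is illustrative, not the authors' exact network):

```python
import torch
import torch.nn as nn

# Toy reproduction of the skip-connection mismatch: three 2x downsamplings
# followed by three 2x upsamplings only restore the input size when the
# side length is divisible by 8.
down = nn.MaxPool2d(2)
up = nn.Upsample(scale_factor=2)

ok = torch.randn(1, 8, 512, 512)
restored = up(up(up(down(down(down(ok))))))
print(restored.shape[-1])          # 512 -> 64 -> 512, cat with the skip works

bad = torch.randn(1, 8, 516, 516)
d = down(down(down(bad)))          # 516 -> 258 -> 129 -> 64 (129 floors to 64)
u = up(up(up(d)))                  # 64 -> 512, no longer 516
print(bad.shape[-1], u.shape[-1])  # torch.cat([bad, u], dim=1) would fail
```

So fixing the DataLoader to actually emit 512x512 (e.g. checking for an extra padding transform) should make the torch.cat in conv7 line up again.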