Torch has no attribute load_state_dict?

Hi ptrblck

I saved my trained nets on the GPU and now want to use them on the CPU.

I read your comments but still have the same problem (AttributeError: 'list' object has no attribute 'load_state_dict').
My code is:

checkpoint = torch.load(Path1, map_location=torch.device('cpu'))

model.load_state_dict(torch.load(Path1, map_location=torch.device('cpu'))['model_state_dict'])

model.load_state_dict(torch.load(Path1)['model_state_dict'])
optimizer.load_state_dict(torch.load(Path1, map_location=torch.device('cpu'))['optimizer_state_dict'])

How did you create your model instance?
Based on the error message, it seems that model is a list rather than an instance of an nn.Module subclass.

Hi, many thanks for your reply. My model is:
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import TensorDataset, DataLoader
import torch.optim as optim
import torch.nn as nn
from torch.utils.data.dataset import random_split
from torch.nn import functional as F
import matplotlib.pyplot as plt
from torch.autograd import Variable

class ConvNet(nn.Module):
    def __init__(self, numf1, numf2, fz1, fz2, nn2, nn3):
        super(ConvNet, self).__init__()
        self.numf1 = numf1
        self.numf2 = numf2
        self.fz1 = fz1
        self.fz2 = fz2
        self.nn2 = nn2
        self.nn3 = nn3
        self.layer1 = nn.Sequential(nn.Conv3d(1, self.numf1, kernel_size=self.fz1, stride=1, padding=2), nn.ReLU(), nn.MaxPool3d(kernel_size=2, stride=2))
        # in_channels is hard-coded to 32 here, so this only works when numf1 == 32
        self.layer2 = nn.Sequential(nn.Conv3d(32, self.numf2, kernel_size=self.fz2, stride=1, padding=2), nn.ReLU(), nn.MaxPool3d(kernel_size=2, stride=2))
        self.fc1 = nn.Linear(3072, self.nn2)  ##3027
        self.drop_out = nn.Dropout(0.3)
        print("Dropout")
        self.fc2 = nn.Linear(self.nn2, self.nn3)  # FULLY CONNECTED LAYERS
        self.fc3 = nn.Linear(self.nn3, 1)  # FULLY CONNECTED LAYERS
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        print("here: {}".format(x.shape))
        # print(type(x))
        x = np.expand_dims(x, axis=0)
        print("xthen", x.shape)
        x = x.astype(int)
        x = torch.from_numpy(x)
        x = x.unsqueeze(1).float()
        print(type(x))
        # print(x.shape)
        out = self.layer1(x)
        # print(out.shape)
        # print(out)
        out = self.layer2(out)
        print(out.shape)
        out = out.view(out.size(0), -1)
        print(out.shape)
        out = self.fc1(out)
        out = self.fc2(out)
        out = self.fc3(out)
        out = self.sigmoid(out)
        print("outsize", out.shape)
        return out

And I saved it like this:

Path = root_dir1 + '/Fold_' + str(FoldNum) + 'NumDarw=' + str(NumDraw) + 'Iteration' + str(Iteration) + ".pth"
checkpoint = {'model_state_dict': model.state_dict(), 'optimizer_state_dict': optimizer.state_dict()}
# print(checkpoint)
torch.save(checkpoint, Path)  ## save every 10 iterations

Previously, I ran my code with nets that had been saved on the CPU, but now I am loading nets that were trained and saved on the GPU to use them on the CPU, and I get this error.
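A minimal sketch of the GPU-to-CPU round trip, assuming a small nn.Linear as a hypothetical stand-in for ConvNet and an in-memory io.BytesIO buffer in place of the .pth file. map_location=torch.device('cpu') remaps any CUDA tensors stored in the file onto the CPU, and loading the file once into checkpoint avoids calling torch.load separately for every key:

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for ConvNet; any nn.Module is saved the same way.
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save both state_dicts under the same keys as in the training script.
buffer = io.BytesIO()  # in-memory file instead of a real .pth path
torch.save({'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict()}, buffer)
buffer.seek(0)

# Load once; map_location moves any CUDA tensors in the file to the CPU.
checkpoint = torch.load(buffer, map_location=torch.device('cpu'))
print(sorted(checkpoint.keys()))  # ['model_state_dict', 'optimizer_state_dict']
```

The individual entries can then be passed to model.load_state_dict(checkpoint['model_state_dict']) and optimizer.load_state_dict(checkpoint['optimizer_state_dict']), as long as model and optimizer are real objects rather than lists.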

Thanks for the code. Could you post the code you are using to restore the model?

model = []
optimizer = []
TargetWholev2 = []
for Iteration1 in range(9):
    Path1 = root_dir2 + '/Fold_' + str(FoldNum) + 'NumDarw=' + str(NumDraw) + 'Iteration' + str(Iteration1+1) + '.pth'

    checkpoint = torch.load(Path1, map_location=torch.device('cpu'))

    model.load_state_dict(torch.load(Path1, map_location=torch.device('cpu'))['model_state_dict'])

    optimizer.load_state_dict(torch.load(Path1, map_location=torch.device('cpu'))['optimizer_state_dict'])

You are currently initializing model and optimizer as empty Python lists, which causes this error.
Initialize both as you have done in your training script, i.e.:

model = ConvNet(...)
optimizer = optim.SGD(model.parameters(), ...)

# Now load the state_dicts
model.load_state_dict(...)
optimizer.load_state_dict(...)
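A minimal sketch of that order of operations, assuming a small hypothetical build_model() in place of ConvNet(...) and SGD in place of the real optimizer settings. It also shows why the original code failed: a Python list simply has no load_state_dict method.

```python
import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in for ConvNet(...) with the training-time arguments.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))


# Pretend this checkpoint came from the training script.
trained = build_model()
trained_opt = torch.optim.SGD(trained.parameters(), lr=0.1)
checkpoint = {'model_state_dict': trained.state_dict(),
              'optimizer_state_dict': trained_opt.state_dict()}

# A plain list has no load_state_dict method, hence the AttributeError.
print(hasattr([], 'load_state_dict'))  # False

# 1) Create the objects with the same architecture and arguments as in training...
model = build_model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# 2) ...then copy the saved values into them.
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
```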

I used this, but it gives me this error: (Error(s) in loading state_dict for ConvNet:
size mismatch for layer1.0.weight: copying a param with shape torch.Size([32, 1, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 1, 3, 3, 3]).
size mismatch for layer1.0.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([2]).)

model = ConvNet(2,64,3,3,300,20)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
TargetWholev2 = []
for Iteration1 in range(9):
    Path1 = root_dir2 + '/Fold_' + str(FoldNum) + 'NumDarw=' + str(NumDraw) + 'Iteration' + str(Iteration1+1) + '.pth'

    checkpoint = torch.load(Path1)

    checkpoint = torch.load(Path1, map_location=torch.device('cpu'))

    model.load_state_dict(torch.load(Path1, map_location=torch.device('cpu'))['model_state_dict'])

    optimizer.load_state_dict(torch.load(Path1, map_location=torch.device('cpu'))['optimizer_state_dict'])

The error points to different shapes of your parameters, which means that you've initialized the model with different arguments.
Could you check how you initialized the ConvNet before saving the state_dict and use the same arguments?
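A minimal sketch for spotting such mismatches before calling load_state_dict, assuming a hypothetical helper named find_shape_mismatches (not a PyTorch API) and two bare Conv3d layers that reproduce the 32-vs-2 filter typo from this thread:

```python
import torch.nn as nn


def find_shape_mismatches(model, state_dict):
    """Hypothetical helper: list entries whose saved shape differs from the model's."""
    current = model.state_dict()
    mismatches = []
    for name, saved in state_dict.items():
        if name in current and current[name].shape != saved.shape:
            mismatches.append((name, tuple(saved.shape), tuple(current[name].shape)))
    return mismatches


# Saved with 32 filters, re-created with 2 filters.
saved = nn.Conv3d(1, 32, kernel_size=3).state_dict()
model = nn.Conv3d(1, 2, kernel_size=3)
result = find_shape_mismatches(model, saved)
for name, old, new in result:
    print(f"{name}: checkpoint {old} vs model {new}")
```

Each reported shape points back at the constructor argument that differs; here the first dimension (32 vs 2) corresponds to the number of output filters.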

Yes, you are right: the 2 should be 32, I missed the 3. :slight_smile:
I really appreciate your time!

Excuse me, training took 4 days for just one draw, even though the CNN is shallow. How can I speed up training on the GPU? Is the number of workers in the DataLoader important for speed?

It depends on where the actual bottleneck is.
You could use the ImageNet example to measure the actual data loading time. If it stays high, you might have a data loading bottleneck.
If that's the case, have a look at this post to see some potential workarounds.

On the other hand, if you see the data loading time approaching zero, your model might be the bottleneck, in which case you could profile it (e.g. using Nsight) to see which operations are the slowest.
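As a lighter first pass than Nsight (a different tool from the one named above), PyTorch's built-in torch.profiler can list the most expensive operators. A minimal sketch, assuming a toy Conv3d stack and a random batch as hypothetical stand-ins for the real model and data:

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

# Hypothetical small 3D CNN and input; replace with the real model and a real batch.
model = nn.Sequential(nn.Conv3d(1, 32, kernel_size=3, padding=2), nn.ReLU(), nn.MaxPool3d(2))
x = torch.randn(4, 1, 24, 24, 24)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    model(x)

# Operators sorted by total CPU time; for GPU runs, sort by the GPU time column
# instead (named cuda_time_total or device_time_total depending on the version).
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)
```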

Increasing the number of workers should reduce the data loading time. However, there is usually a sweet spot, after which increasing the number of workers further might slow down the code again.
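The ImageNet example measures data loading time by timing how long each iteration waits for its next batch. A minimal sketch of the same idea, assuming a random TensorDataset and a toy model as hypothetical placeholders; time_epoch is an illustrative helper name, not a PyTorch API, and it only times the forward pass:

```python
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def time_epoch(loader, model):
    """Return (data loading seconds, compute seconds) for one pass over loader."""
    data_time = 0.0
    compute_time = 0.0
    end = time.perf_counter()
    for x, _ in loader:
        data_time += time.perf_counter() - end  # time spent waiting for this batch
        start = time.perf_counter()
        with torch.no_grad():
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # CUDA kernels run asynchronously
        compute_time += time.perf_counter() - start
        end = time.perf_counter()
    return data_time, compute_time


# Hypothetical data and model standing in for the real 3D volumes and ConvNet.
dataset = TensorDataset(torch.randn(256, 1, 16, 16, 16), torch.randint(0, 2, (256,)))
model = nn.Sequential(nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 16 * 16 * 16, 1))

loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)
data_t, compute_t = time_epoch(loader, model)
print(f"data: {data_t:.3f}s, compute: {compute_t:.3f}s")
```

To look for the sweet spot, rerun with num_workers set to e.g. 2, 4 and 8 (on platforms that spawn worker processes, keep that loop under an if __name__ == "__main__": guard). A real training step also includes the backward pass and optimizer step, so the compute share will be larger than in this forward-only sketch.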

Hi ptrblck

I hope you are well. I ran my DL model: 2 conv layers with 32 filters in the first layer and 64 filters in the second, followed by 3 FC layers. My samples are balanced, with 4000 positives and 4000 negatives. The area under the ROC curve is 0.3 with 10-fold cross-validation, which is very low. I checked my training set and the labels are correct. Do you think overfitting is happening and I need more training data? I used one dropout layer in the FC layers.

Cheers
S

How well does the model perform on the training data?
You would observe overfitting if there is a gap between the training and validation performance, i.e. the model does well on the training data but noticeably worse on the validation data.
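One way to check for that gap is to compute the same metric on both splits. A minimal sketch, assuming the model outputs sigmoid scores and using a hypothetical roc_auc helper that computes the AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half; the score lists are made-up placeholders, not results from this thread:

```python
import torch


def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = torch.as_tensor(scores, dtype=torch.float64)
    labels = torch.as_tensor(labels).bool()
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]  # compare every positive with every negative
    return ((diff > 0).double() + 0.5 * (diff == 0).double()).mean().item()


# Hypothetical model outputs on the training and validation sets.
train_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
val_auc = roc_auc([0.4, 0.7, 0.6, 0.2], [1, 1, 0, 0])
print(f"train AUC {train_auc:.2f}, val AUC {val_auc:.2f}, gap {train_auc - val_auc:.2f}")
# train AUC 1.00, val AUC 0.75, gap 0.25
```

The pairwise comparison needs memory proportional to (#positives × #negatives), which is fine for a few thousand samples per split. As a side note, an AUC of 0.3 is below the 0.5 expected from random guessing, so it may also be worth checking that the scores and labels passed to the ROC computation are paired and oriented as intended.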

This type of error happened to me too.
Here's how I solved it:

I had saved the model like this:

state = {'epoch': epoch + 1, 'state_dict': model.state_dict(),
         'optimizer': optimizer.state_dict(), 'loss': loss, }
torch.save(state, save_path)

So, in order to load the model, I first had to instantiate my model architecture as below:

model = Net()
checkpoint = torch.load(path)
model.load_state_dict(checkpoint['state_dict'])

It loaded successfully and I tested it on the test set. :slight_smile:
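The same checkpoint layout also allows resuming training. A minimal sketch, assuming a tiny hypothetical Net, made-up epoch and loss values, and an io.BytesIO buffer in place of save_path:

```python
import io

import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical stand-in for the poster's Net architecture."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x)


# --- saving (same dict layout as above) ---
trained = Net()
trained_opt = torch.optim.SGD(trained.parameters(), lr=0.1)
epoch, loss = 4, 0.25  # hypothetical values
state = {'epoch': epoch + 1, 'state_dict': trained.state_dict(),
         'optimizer': trained_opt.state_dict(), 'loss': loss}
buffer = io.BytesIO()  # stands in for save_path
torch.save(state, buffer)
buffer.seek(0)

# --- loading: build the architecture first, then restore its values ---
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
checkpoint = torch.load(buffer)
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch']  # already saved as epoch + 1
model.eval()  # evaluation mode for testing; call model.train() to resume training
print(start_epoch)  # 5
```

To continue training, switch back with model.train() and loop over range(start_epoch, num_epochs), where num_epochs is whatever total the training script uses.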

Hello ptrblck,

I am following exactly the same process for loading the model:

model = CQCCModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

model.load_state_dict(torch.load(model_path, map_location='cuda'))

optimizer.load_state_dict(torch.load(model_path, map_location='cuda'))

but I am getting these errors:

self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CQCCModel:
Missing key(s) in state_dict: "layer1.0.weight", "layer1.0.bias", "layer1.1.weight", "layer1.1.bias", "layer1.1.running_mean", "layer1.1.running_var", "layer2.0.conv1.weight", "layer2.0.conv1.bias", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.conv2.weight", "layer2.0.conv2.bias", "layer2.0.conv11.weight", "layer2.0.conv11.bias", "layer3.0.conv1.weight", "layer3.0.conv1.bias", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.conv2.weight", "layer3.0.conv2.bias", "layer3.0.conv11.weight", "layer3.0.conv11.bias", "layer3.0.pre_bn.weight", "layer3.0.pre_bn.bias", "layer3.0.pre_bn.running_mean", "layer3.0.pre_bn.running_var", "layer4.0.conv1.weight",

Unexpected key(s) in state_dict: "module.layer1.0.weight", "module.layer1.0.bias", "module.layer1.1.weight", "module.layer1.1.bias", "module.layer1.1.running_mean", "module.layer1.1.running_var", "module.layer1.1.num_batches_tracked", "module.layer2.0.conv1.weight", "module.layer2.0.conv1.bias", "module.layer2.0.bn1.weight", "module.layer2.0.bn1.bias", "module.layer2.0.bn1.running_mean", "module.layer2.0.bn1.running_var", "module.layer2.0.bn1.num_batches_tracked", "module.layer2.0.conv2.weight", "module.layer2.0.conv2.bias", "module.layer2.0.conv11.weight", "module.layer2.0.conv11.bias", "module.layer3.0.conv1.weight", "module.layer3.0.conv1.bias", "module.layer3.0.bn1.weight", "module.layer3.0.bn1.bias", "module.layer3.0.bn1.running_mean", "module.layer3.0.bn1.running_var", "module.layer3.0.bn1.num_batches_tracked", "module.layer3.0.conv2.weight", "module.layer3.0.conv2.bias", "module.layer3.0.conv11.weight", "module.layer3.0.conv11.bias", "module.layer3.0.pre_bn.weight", "module.layer3.0.pre_bn.bias", "module.layer3.0.pre_bn.running_mean", "module.layer3.0.pre_bn.running_var", "module.layer3.0.pre_bn.num_batches_tracked", "module.layer4.0.conv1.weight", "module.layer4.0.conv1.bias", "module.layer4.0.bn1.weight", "module.layer4.0.bn1.bias", "module.layer4.0.bn1.running_mean", "module.layer4.0.bn1.running_var", "module.layer4.0.bn1.num_batches_tracked", "module.layer4.0.conv2.weight", "module.layer4.0.conv2.bias", "module.layer4.0.conv11.weight", "module.layer4.0.conv11.bias"

I save the model in this way:

model = CQCCModel()

torch.save(model.state_dict(), os.path.join(model_save_path, 'epoch_{}.pth'.format(epoch)))

I don't understand this error.
Could you please let me know why this error occurs and what the right way to load the model is?

Thanks in advance.

The model and the optimizer each need their own state_dict, but you are trying to load model.state_dict() into both objects.

Store the checkpoint as:

checkpoint = {}
checkpoint['model'] = model.state_dict()
checkpoint['optimizer'] = optimizer.state_dict()
torch.save(checkpoint, PATH)

and load it via:

checkpoint = torch.load(PATH)
model = CQCCModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
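Separately, the "Unexpected key(s)" in the earlier error all start with "module.", while the "Missing key(s)" are the same names without that prefix. This pattern typically appears when the state_dict was saved from a model wrapped in nn.DataParallel (or DistributedDataParallel), which stores the original model as a submodule named module; the posted save code only shows model = CQCCModel(), so the wrapping is an assumption here. A minimal sketch that strips the prefix, assuming a hypothetical strip_prefix helper, a small Sequential in place of CQCCModel, and a tiny Wrapper class that mimics DataParallel's naming without needing a GPU:

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def strip_prefix(state_dict, prefix='module.'):
    """Hypothetical helper: copy state_dict with a leading prefix removed from each key."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


class Wrapper(nn.Module):
    """Mimics how nn.DataParallel stores the wrapped model as self.module."""

    def __init__(self, module):
        super().__init__()
        self.module = module


def make_model():
    # Hypothetical stand-in for CQCCModel().
    return nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32))


saved = Wrapper(make_model()).state_dict()
print(next(iter(saved)))  # module.0.weight

fresh = make_model()
fresh.load_state_dict(strip_prefix(saved))  # keys now match: 0.weight, 0.bias, ...
```

Alternatively, saving model.module.state_dict() from the wrapped model avoids the prefix in the first place.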

Thank you so much for your reply, ptrblck.

After using this, I get an error:

checkpoint = torch.load(PATH)
model = CQCCModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
model.load_state_dict(checkpoint['model'])

error: model.load_state_dict(checkpoint['model'])
KeyError: 'model'

If I use model.load_state_dict(checkpoint[model]) instead, it shows this error:

error:

KeyError: CQCCModel(
  (layer1): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.03)
  )
  (layer2): Sequential(
    (0): ResNetBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (lrelu): LeakyReLU(negative_slope=0.01)
      (dropout): Dropout(p=0.5)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (conv11): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
    )
  )
  (layer3): Sequential(
    (0): ResNetBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (lrelu): LeakyReLU(negative_slope=0.01)
      (dropout): Dropout(p=0.5)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (conv11): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (pre_bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): MaxPool2d(kernel_size=3, stride=3, padding=1, dilation=1, ceil_mode=False)
  )
  (layer4): Sequential(
    (0): ResNetBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (lrelu): LeakyReLU(negative_slope=0.01)
      (dropout): Dropout(p=0.5)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (conv11): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (pre_bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): MaxPool2d(kernel_size=3, stride=3, padding=1, dilation=1, ceil_mode=False)
  )

Why does this error occur after I have defined the model (model = CQCCModel())?

Any suggestions would be helpful.

Thanks

In your line of code you are passing the model object as the key to the dict:

model.load_state_dict(checkpoint[model])

In my example I’ve used the strings "model" and "optimizer" for the checkpoint.
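The earlier KeyError: 'model' most likely means the file being loaded was still one written by the old torch.save(model.state_dict(), ...) call, so its keys are parameter names such as layer1.0.weight rather than "model". Printing checkpoint.keys() shows which format a file has. A minimal sketch that accepts either format, assuming a hypothetical load_model_weights helper, a small Sequential in place of CQCCModel, and in-memory buffers in place of real files:

```python
import io

import torch
import torch.nn as nn


def load_model_weights(model, f):
    """Hypothetical helper: load a {'model': ..., 'optimizer': ...} checkpoint or a bare state_dict."""
    checkpoint = torch.load(f, map_location='cpu')
    print(list(checkpoint.keys())[:3])  # shows which format the file actually has
    state_dict = checkpoint['model'] if 'model' in checkpoint else checkpoint
    model.load_state_dict(state_dict)
    return model


def make_model():
    # Hypothetical stand-in for CQCCModel().
    return nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32))


source = make_model()
optimizer = torch.optim.Adam(source.parameters(), lr=0.0001)

old_file = io.BytesIO()  # old format: torch.save(model.state_dict(), ...)
torch.save(source.state_dict(), old_file)
new_file = io.BytesIO()  # new format: checkpoint dict with string keys
torch.save({'model': source.state_dict(), 'optimizer': optimizer.state_dict()}, new_file)

for f in (old_file, new_file):
    f.seek(0)
    load_model_weights(make_model(), f)
# prints ['0.weight', '0.bias', '1.weight'] and then ['model', 'optimizer']
```

Only files written in the new format contain an optimizer state, so the optimizer can be restored only from those.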

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier. :wink: