Torch has no attribute load_state_dict?

ptrblck · November 7, 2019, 6:51am

It depends on where the actual bottleneck is.
You could use the ImageNet example to measure the actual data loading time. If it stays at a high value, you might have a data loading bottleneck.
If that’s the case, have a look at this post to see some potential workarounds.

On the other hand, if you see the data loading time approaching zero, your model might be the bottleneck, in which case you could try to profile it (e.g. using Nsight) and see which operations are the slowest.

Increasing the number of workers should reduce the data loading time. However, there is usually a sweet spot, after which adding more workers might slow down the code again.
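A minimal sketch of that measurement, loosely following the idea of tracking data time separately from step time. It assumes only that `loader` is any iterable of batches and `train_step` is any callable; both names, and the helper `time_epoch`, are hypothetical and not part of PyTorch.

```python
import time


def time_epoch(loader, train_step):
    """Return (data_time, step_time) in seconds for one pass over `loader`.

    data_time: time spent waiting for the next batch.
    step_time: time spent inside train_step (forward/backward/update).
    """
    data_time = 0.0
    step_time = 0.0
    end = time.perf_counter()
    for batch in loader:
        fetched = time.perf_counter()
        data_time += fetched - end   # waiting on the loader
        train_step(batch)
        end = time.perf_counter()
        step_time += end - fetched   # model work
    return data_time, step_time


def slow_loader(n, delay):
    """Toy loader that sleeps to imitate expensive data loading."""
    for i in range(n):
        time.sleep(delay)
        yield i


data_t, step_t = time_epoch(slow_loader(5, 0.01), lambda batch: None)
print(f"data: {data_t:.3f}s, step: {step_t:.3f}s")
```

In this toy run the loader dominates, which is the "data loading bottleneck" case. On a GPU, kernels run asynchronously, so `train_step` would need to call `torch.cuda.synchronize()` before returning for the split to be meaningful.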

saba · November 8, 2019, 5:54am

Hi ptrblck

I hope you are well. I am running a deep learning model: 2 convolutional layers (32 filters in the first, 64 in the second) followed by 3 FC layers. My samples are balanced, with 4000 positives and 4000 negatives. The ROC AUC is 0.3 under 10-fold cross-validation, which is very low. I checked my training set and the labels are correct. Do you think overfitting is happening and I need more data for training? I used one dropout layer in the FC layers.

Cheers
S

ptrblck · November 8, 2019, 5:58am

How well does the model perform on the training data?
You would observe overfitting, if there is a gap between the training and validation performance.
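With cross-validation, one way to make that comparison concrete is to record the same metric on the training and validation split of every fold and look at the difference. The sketch below assumes the per-fold scores are already computed; `fold_gaps` is a hypothetical helper and the numbers are illustrative only, not results from this thread.

```python
def fold_gaps(train_scores, val_scores):
    """Return the per-fold gap (train - validation) and its mean."""
    if len(train_scores) != len(val_scores):
        raise ValueError("need one train and one validation score per fold")
    gaps = [t - v for t, v in zip(train_scores, val_scores)]
    return gaps, sum(gaps) / len(gaps)


# Illustrative numbers only.
train = [0.95, 0.90, 0.90]
val = [0.70, 0.70, 0.60]
gaps, mean_gap = fold_gaps(train, val)
print(f"mean train-validation gap: {mean_gap:.2f}")
```

A large positive gap points towards overfitting. If the training score is also low, the gap is small and overfitting is not the explanation.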

mathematics · April 16, 2020, 12:54pm

This type of error happened to me.
Here’s how I solved it:

I had saved the model like this:

state = {'epoch': epoch + 1, 'state_dict': model.state_dict(),
         'optimizer': optimizer.state_dict(), 'loss': loss}

torch.save(state, save_path)

So in order to load the model, I first had to instantiate the model architecture as below:

model = Net()
checkpoint = torch.load(path)
model.load_state_dict(checkpoint['state_dict'])

It loaded successfully and I tested it on the test set.
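A self-contained sketch of the same round trip. `torch.load` returns whatever object was saved (here a dict), while `load_state_dict` is a method of `nn.Module` and of optimizers, not of `torch` itself. The tiny `Net` below is a stand-in, since the real architecture is not shown, and an in-memory buffer is assumed in place of `save_path`.

```python
import io

import torch
import torch.nn as nn


class Net(nn.Module):
    """Stand-in architecture; the Net from the post above is not shown."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Save: a plain dict holding the state_dicts plus bookkeeping.
buffer = io.BytesIO()  # in-memory stand-in for save_path
torch.save({"epoch": 1,
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict()}, buffer)

# Load: build the architecture first, then restore the weights into it.
buffer.seek(0)
checkpoint = torch.load(buffer, map_location="cpu")
restored = Net()
restored.load_state_dict(checkpoint["state_dict"])
print(checkpoint["epoch"], list(checkpoint["state_dict"].keys()))
```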

Hemlata · July 20, 2020, 2:29pm

Hello ptrblck,

I am following exactly the same process for model loading:

model = CQCCModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

model.load_state_dict(torch.load(model_path, map_location='cuda'))

optimizer.load_state_dict(torch.load(model_path, map_location='cuda'))

but I am getting this error:

self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CQCCModel:
Missing key(s) in state_dict: "layer1.0.weight", "layer1.0.bias", "layer1.1.weight", "layer1.1.bias", "layer1.1.running_mean", "layer1.1.running_var", "layer2.0.conv1.weight", "layer2.0.conv1.bias", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.conv2.weight", "layer2.0.conv2.bias", "layer2.0.conv11.weight", "layer2.0.conv11.bias", "layer3.0.conv1.weight", "layer3.0.conv1.bias", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.conv2.weight", "layer3.0.conv2.bias", "layer3.0.conv11.weight", "layer3.0.conv11.bias", "layer3.0.pre_bn.weight", "layer3.0.pre_bn.bias", "layer3.0.pre_bn.running_mean", "layer3.0.pre_bn.running_var", "layer4.0.conv1.weight",

Unexpected key(s) in state_dict: "module.layer1.0.weight", "module.layer1.0.bias", "module.layer1.1.weight", "module.layer1.1.bias", "module.layer1.1.running_mean", "module.layer1.1.running_var", "module.layer1.1.num_batches_tracked", "module.layer2.0.conv1.weight", "module.layer2.0.conv1.bias", "module.layer2.0.bn1.weight", "module.layer2.0.bn1.bias", "module.layer2.0.bn1.running_mean", "module.layer2.0.bn1.running_var", "module.layer2.0.bn1.num_batches_tracked", "module.layer2.0.conv2.weight", "module.layer2.0.conv2.bias", "module.layer2.0.conv11.weight", "module.layer2.0.conv11.bias", "module.layer3.0.conv1.weight", "module.layer3.0.conv1.bias", "module.layer3.0.bn1.weight", "module.layer3.0.bn1.bias", "module.layer3.0.bn1.running_mean", "module.layer3.0.bn1.running_var", "module.layer3.0.bn1.num_batches_tracked", "module.layer3.0.conv2.weight", "module.layer3.0.conv2.bias", "module.layer3.0.conv11.weight", "module.layer3.0.conv11.bias", "module.layer3.0.pre_bn.weight", "module.layer3.0.pre_bn.bias", "module.layer3.0.pre_bn.running_mean", "module.layer3.0.pre_bn.running_var", "module.layer3.0.pre_bn.num_batches_tracked", "module.layer4.0.conv1.weight", "module.layer4.0.conv1.bias", "module.layer4.0.bn1.weight", "module.layer4.0.bn1.bias", "module.layer4.0.bn1.running_mean", "module.layer4.0.bn1.running_var", "module.layer4.0.bn1.num_batches_tracked", "module.layer4.0.conv2.weight", "module.layer4.0.conv2.bias", "module.layer4.0.conv11.weight", "module.layer4.0.conv11.bias"

I save the model this way:

model = CQCCModel()

torch.save(model.state_dict(), os.path.join(model_save_path, 'epoch_{}.pth'.format(epoch)))

I don’t understand this error.
Could you please let me know why this error occurs and what the right way to load the model is?

Thanks in advance.

ptrblck · July 21, 2020, 4:34am

The model and optimizer would need their own state_dicts, while you are trying to load the model.state_dict() into both objects.

Store the checkpoint as:

checkpoint = {}
checkpoint['model'] = model.state_dict()
checkpoint['optimizer'] = optimizer.state_dict()
torch.save(checkpoint, PATH)

and load it via:

checkpoint = torch.load(PATH)
model = CQCCModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
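A separate detail in the error message above: every name under "Unexpected key(s)" is a name under "Missing key(s)" with `module.` in front. That prefix is what `nn.DataParallel` puts on parameter names, so the file was probably saved from a wrapped model; this is an inference from the key names, not something confirmed in the thread. A minimal sketch of renaming the keys before loading, where `strip_prefix` is a hypothetical helper:

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that have it."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Placeholder values stand in for tensors; only the key names matter here.
saved = OrderedDict([
    ("module.layer1.0.weight", "w"),
    ("module.layer1.0.bias", "b"),
])
print(list(strip_prefix(saved)))
# ['layer1.0.weight', 'layer1.0.bias']
```

With a real file this would be used as `model.load_state_dict(strip_prefix(torch.load(model_path)))`. Saving `model.module.state_dict()` from the wrapped model avoids the prefix in the first place.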

Hemlata · July 21, 2020, 5:51am

Thank you so much ptrblck for your reply.

After using this, it gives an error:

checkpoint = torch.load(PATH)
model = CQCCModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
model.load_state_dict(checkpoint['model'])

error: model.load_state_dict(checkpoint['model'])
KeyError: 'model'

If I use "model.load_state_dict(checkpoint[model])" instead, it shows this error:

error:

KeyError: CQCCModel(
  (layer1): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.03)
  )
  (layer2): Sequential(
    (0): ResNetBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (lrelu): LeakyReLU(negative_slope=0.01)
      (dropout): Dropout(p=0.5)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (conv11): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
    )
  )
  (layer3): Sequential(
    (0): ResNetBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (lrelu): LeakyReLU(negative_slope=0.01)
      (dropout): Dropout(p=0.5)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (conv11): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (pre_bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): MaxPool2d(kernel_size=3, stride=3, padding=1, dilation=1, ceil_mode=False)
  )
  (layer4): Sequential(
    (0): ResNetBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (lrelu): LeakyReLU(negative_slope=0.01)
      (dropout): Dropout(p=0.5)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (conv11): Conv2d(32, 32, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
      (pre_bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): MaxPool2d(kernel_size=3, stride=3, padding=1, dilation=1, ceil_mode=False)
  )

Why does this error occur after defining the model (model = CQCCModel())?

Any suggestions would be useful.

Thanks

ptrblck · July 22, 2020, 12:47am

In your line of code you are passing the model object as the key to the dict:

model.load_state_dict(checkpoint[model])

In my example I’ve used the strings "model" and "optimizer" for the checkpoint.

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier.
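The earlier `KeyError: 'model'` fits the same picture: a file written with `torch.save(model.state_dict(), ...)` holds the parameter names at the top level, so it has no `'model'` entry to index. A small sketch that accepts both layouts, using plain dicts in place of real checkpoints; `extract_model_state` is a hypothetical helper.

```python
def extract_model_state(checkpoint, key="model"):
    """Return the model state_dict from either checkpoint layout.

    Nested layout: {"model": {...}, "optimizer": {...}}
    Bare layout:   {"layer1.0.weight": ..., ...} (a state_dict saved directly)
    """
    if key in checkpoint:
        return checkpoint[key]
    return checkpoint


nested = {"model": {"fc.weight": 1}, "optimizer": {}}
bare = {"fc.weight": 1}

# Printing the keys before indexing shows which layout a file has.
print(sorted(nested.keys()))
print(sorted(bare.keys()))

print(extract_model_state(nested) == extract_model_state(bare))
```

Dictionary keys are matched by value, so the string `'model'` and the `model` object are different keys, which is why `checkpoint[model]` fails as well.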

lililila · May 23, 2024, 12:30pm

Hi ptrblck
I read your comments and I still have a similar problem (AttributeError: 'DeeplabV3' object has no attribute 'load_state_dict') when I try to compute the mIoU value with:

import os

from PIL import Image
from tqdm import tqdm
import torch

from deeplab import DeeplabV3

from utils.utils_metrics import compute_mIoU, show_results

miou_mode = 0

num_classes = 13

VOCdevkit_path = 'VOCdevkit'

image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/val.txt"), 'r').read().splitlines()
gt_dir = os.path.join(VOCdevkit_path, "VOC2007/SegmentationClass/")
miou_out_path = "miou_out"
pred_dir = os.path.join(miou_out_path, 'detection-results')

if miou_mode == 0 or miou_mode == 1:
    if not os.path.exists(pred_dir):
        os.makedirs(pred_dir)

    print("Load model.")
    deeplab = DeeplabV3()
    weights_path = 'model_data/best_epoch_weights.pth'  # make sure the path and file name are correct
    if not os.path.exists(weights_path):
        raise FileNotFoundError(f"Weights file not found: {weights_path}")

    # Load model weights
    deeplab =deeplab.load_state_dict()
    state_dict = torch.load(weights_path)
    deeplab.load_state_dict(torch.load(state_dict))
    print("Load model done.")

    print("Get predict result.")
    for image_id in tqdm(image_ids):
        image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/" + image_id + ".jpg")
        image = Image.open(image_path)
        image = deeplab.get_miou_png(image)
        image.save(os.path.join(pred_dir, image_id + ".png"))
    print("Get predict result done.")

if miou_mode == 0 or miou_mode == 2:
    print("Get miou.")
    hist, IoUs, PA_Recall, Precision = compute_mIoU(gt_dir, pred_dir, image_ids, num_classes, name_classes)  
    print("Get miou done.")

 
    for i in range(num_classes):
        print(f"Class {name_classes[i]}: mIoU = {IoUs[i]}, Accuracy = {PA_Recall[i]}")

    show_results(miou_out_path, hist, IoUs, PA_Recall, Precision, name_classes)

ptrblck · May 23, 2024, 12:31pm

Could you post the definition of DeeplabV3?

lililila · May 23, 2024, 12:52pm

Thank you so much ptrblck for your reply!
The Deeplabv3+ model is:

import torch
import torch.nn as nn
import torch.nn.functional as F
from nets.xception import xception
from nets.mobilenetv2 import mobilenetv2

class MobileNetV2(nn.Module):
    def __init__(self, downsample_factor=8, pretrained=True):
        super(MobileNetV2, self).__init__()
        from functools import partial

        model           = mobilenetv2(pretrained)
        self.features   = model.features[:-1]

        self.total_idx  = len(self.features)
        self.down_idx   = [2, 4, 7, 14]

        if downsample_factor == 8:
            for i in range(self.down_idx[-2], self.down_idx[-1]):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=2)
                )
            for i in range(self.down_idx[-1], self.total_idx):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=4)
                )
        elif downsample_factor == 16:
            for i in range(self.down_idx[-1], self.total_idx):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=2)
                )

    def _nostride_dilate(self, m, dilate):
        classname = m.__class__.__name__
        if classname.find('Conv') != -1:
            if m.stride == (2, 2):
                m.stride = (1, 1)
                if m.kernel_size == (3, 3):
                    m.dilation = (dilate//2, dilate//2)
                    m.padding = (dilate//2, dilate//2)
            else:
                if m.kernel_size == (3, 3):
                    m.dilation = (dilate, dilate)
                    m.padding = (dilate, dilate)

    def forward(self, x):
        low_level_features = self.features[:4](x)
        x = self.features[4:](low_level_features)
        return low_level_features, x

class ASPP(nn.Module):
    def __init__(self, dim_in, dim_out, rate=1, bn_mom=0.1):
        super(ASPP, self).__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 1, 1, padding=0, dilation=rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch2 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 3, 1, padding=6*rate, dilation=6*rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 3, 1, padding=12*rate, dilation=12*rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch4 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 3, 1, padding=18*rate, dilation=18*rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch5_conv = nn.Conv2d(dim_in, dim_out, 1, 1, 0, bias=True)
        self.branch5_bn = nn.BatchNorm2d(dim_out, momentum=bn_mom)
        self.branch5_relu = nn.ReLU(inplace=True)

        self.conv_cat = nn.Sequential(
            nn.Conv2d(dim_out*5, dim_out, 1, 1, padding=0, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        [b, c, row, col] = x.size()

        conv1x1 = self.branch1(x)
        conv3x3_1 = self.branch2(x)
        conv3x3_2 = self.branch3(x)
        conv3x3_3 = self.branch4(x)

        global_feature = torch.mean(x, 2, True)
        global_feature = torch.mean(global_feature, 3, True)
        global_feature = self.branch5_conv(global_feature)
        global_feature = self.branch5_bn(global_feature)
        global_feature = self.branch5_relu(global_feature)
        global_feature = F.interpolate(global_feature, (row, col), None, 'bilinear', True)

        feature_cat = torch.cat([conv1x1, conv3x3_1, conv3x3_2, conv3x3_3, global_feature], dim=1)
        result = self.conv_cat(feature_cat)
        return result

class DeepLab(nn.Module):
    def __init__(self, num_classes, backbone="mobilenet", pretrained=True, downsample_factor=16):
        super(DeepLab, self).__init__()
        if backbone == "xception":
            self.backbone = xception(downsample_factor=downsample_factor, pretrained=pretrained)
            in_channels = 2048
            low_level_channels = 256
        elif backbone == "mobilenet":
            self.backbone = MobileNetV2(downsample_factor=downsample_factor, pretrained=pretrained)
            in_channels = 320
            low_level_channels = 24
        else:
            raise ValueError('Unsupported backbone - `{}`, Use mobilenet, xception.'.format(backbone))

        self.aspp = ASPP(dim_in=in_channels, dim_out=256, rate=16//downsample_factor)

        self.shortcut_conv = nn.Sequential(
            nn.Conv2d(low_level_channels, 48, 1),
            nn.BatchNorm2d(48),
            nn.ReLU(inplace=True)
        )

        self.cat_conv = nn.Sequential(
            nn.Conv2d(48+256, 256, 3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),

            nn.Conv2d(256, 256, 3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),

            nn.Dropout(0.1),
        )
        self.cls_conv = nn.Conv2d(256, num_classes, 1, stride=1)

    def forward(self, x):
        H, W = x.size(2), x.size(3)

        low_level_features, x = self.backbone(x)
        x = self.aspp(x)
        low_level_features = self.shortcut_conv(low_level_features)

        x = F.interpolate(x, size=(low_level_features.size(2), low_level_features.size(3)), mode='bilinear', align_corners=True)
        x = self.cat_conv(torch.cat((x, low_level_features), dim=1))
        x = self.cls_conv(x)
        x = F.interpolate(x, size=(H, W), mode='bilinear', align_corners=True)
        return x

ptrblck · May 23, 2024, 3:38pm

Your code is unfortunately not properly formatted and also not executable as it uses 3rd party dependencies.
However, by using the torchvision.models.mobilenet_v2 in your base model and fixing the manual manipulation of some layers (it’ll otherwise fail with: AttributeError: 'Conv2dNormActivation' object has no attribute 'stride'), I’m unable to reproduce the issue:

class MobileNetV2(nn.Module):
    def __init__(self, downsample_factor=8, pretrained=True):
        super(MobileNetV2, self).__init__()
        from functools import partial
    
        model           = models.mobilenet_v2(pretrained)
        self.features   = model.features[:-1]
    
        self.total_idx  = len(self.features)
        self.down_idx   = [2, 4, 7, 14]
    
        if downsample_factor == 8:
            for i in range(self.down_idx[-2], self.down_idx[-1]):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=2)
                )
            for i in range(self.down_idx[-1], self.total_idx):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=4)
                )
        elif downsample_factor == 16:
            for i in range(self.down_idx[-1], self.total_idx):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=2)
                    )
        
    def _nostride_dilate(self, m, dilate):
        classname = m.__class__.__name__
        if classname.find('Conv') != -1:
            if hasattr(m, "stride"):
                if m.stride == (2, 2):
                    m.stride = (1, 1)
                    if m.kernel_size == (3, 3):
                        m.dilation = (dilate//2, dilate//2)
                        m.padding = (dilate//2, dilate//2)
                else:
                    if m.kernel_size == (3, 3):
                        m.dilation = (dilate, dilate)
                        m.padding = (dilate, dilate)
        
    def forward(self, x):
        low_level_features = self.features[:4](x)
        x = self.features[4:](low_level_features)
        return low_level_features, x 
    
model = MobileNetV2()
sd = model.state_dict()
model.load_state_dict(sd)
# <All keys matched successfully>

and can properly load a state_dict.
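The `<All keys matched successfully>` message is the printed return value of `load_state_dict`, which carries the fields `missing_keys` and `unexpected_keys`. Inspecting them with `strict=False` is one way to see how far a weights file and a model disagree. A short sketch on a single `nn.Linear` layer, chosen here only to keep the example small:

```python
import torch.nn as nn

layer = nn.Linear(4, 2)
sd = layer.state_dict()

# Matching state_dict: both lists are empty.
full = layer.load_state_dict(sd)
print(full.missing_keys, full.unexpected_keys)

# Incomplete state_dict: strict=False reports the mismatch instead of raising.
partial_sd = {"weight": sd["weight"]}
partial = layer.load_state_dict(partial_sd, strict=False)
print(partial.missing_keys)      # ['bias']
print(partial.unexpected_keys)   # []
```

With the default `strict=True`, the second call would raise the same kind of RuntimeError shown earlier in this thread.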

lililila · May 24, 2024, 7:55am

Thank you so much for your reply! I’ve checked it many times and found that I used the wrong weights file. That’s why this parameter mismatch occurred. I have already solved my problem by using the correct weights file.
Thanks again for your enthusiastic reply!

ptrblck · May 24, 2024, 1:45pm

This doesn’t match your previous claim:

I read your comments and I still have a similar problem (AttributeError: 'DeeplabV3' object has no attribute 'load_state_dict') when I try to compute the mIoU value with:

pointing to the missing load_state_dict call, so I’m unsure what else is broken in your code, but it’s good to hear some issues are fixed now.
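For reference, `load_state_dict` is defined on `nn.Module`, so an AttributeError of this form means the object is not an `nn.Module` instance. One layout that would produce it is a plain helper class that keeps the network in an attribute. Whether `DeeplabV3` in `deeplab.py` is built this way is not shown in the thread, so the `Predictor` class and its `net` attribute below are hypothetical.

```python
import torch.nn as nn


class Predictor:
    """Plain Python wrapper (not an nn.Module) that owns a network."""

    def __init__(self):
        self.net = nn.Linear(4, 2)


wrapper = Predictor()
print(hasattr(wrapper, "load_state_dict"))      # False
print(hasattr(wrapper.net, "load_state_dict"))  # True

# Weights go into the inner module. The state_dict here is taken from the
# module itself; in practice it would come from one torch.load(path) call.
state_dict = wrapper.net.state_dict()
wrapper.net.load_state_dict(state_dict)
```

Two further details in the posted script would fail even on an `nn.Module`: `load_state_dict` needs the state_dict as its argument and returns a key-matching report rather than the model, so its result should not be assigned back to `deeplab`; and `torch.load` should be called once on the file path, not a second time on the already loaded dict.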