Loading a model_state_dict after training the model using nn.DataParallel()

Hey!

I trained a model on 2 GPUs using nn.DataParallel, and saved the model's state dict using:

    torch.save(model.state_dict(), PATH)

For some reason, when loading the state dict into a model using:

    model = models.segmentation.deeplabv3_resnet101(pretrained=True)
    model.load_state_dict(torch.load(PATH))

I get a key mismatch (the saved keys are all prefixed with "module."). I realized this was due to how I saved my model after wrapping it in DataParallel, which in retrospect should have been:

    torch.save(model.module.state_dict(), PATH)

Does anyone know a way to salvage this situation, i.e. how I can load my state dict into a model even though I saved it differently from what the documentation recommends?

As a sanity check, I also saved a model using model.module.state_dict() while using DataParallel, and another using model.state_dict() on a single GPU; in both cases the state_dict loaded with no issues at all. Help?

Figured it out for whoever is interested:

When training a model with DataParallel, in order to later load the state_dict into a model running on the CPU, you must save the model parameters using torch.save(model.module.state_dict(), PATH).
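A minimal sketch of that save/load path, assuming the same deeplabv3_resnet101 model as above (PATH is just the checkpoint placeholder from the question):

    import torch
    import torch.nn as nn
    from torchvision import models

    # Multi-GPU training: DataParallel prefixes every state_dict key with "module."
    model = models.segmentation.deeplabv3_resnet101(pretrained=True)
    model = nn.DataParallel(model).cuda()
    # ... training loop ...

    # Save the underlying module's weights, so the keys carry no "module." prefix
    torch.save(model.module.state_dict(), PATH)

    # Later, even on a CPU-only machine, this loads straight into a plain model
    cpu_model = models.segmentation.deeplabv3_resnet101(pretrained=True)
    cpu_model.load_state_dict(torch.load(PATH, map_location='cpu'))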

If you saved your model using torch.save(model.state_dict(), PATH), then when loading the weights you must first wrap your model in nn.DataParallel (sending it to the GPUs), and only then load the state dict.
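Roughly, and assuming at least one GPU is available when loading:

    # Recreate the model as during training and wrap it in DataParallel,
    # so the "module."-prefixed keys in the checkpoint line up again
    model = models.segmentation.deeplabv3_resnet101(pretrained=True)
    model = nn.DataParallel(model).cuda()
    model.load_state_dict(torch.load(PATH))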

Hi all, I found this thread after having the same problem. In the past, I only used model.cuda() for training and testing. Then I switched to model = nn.DataParallel(model) for training and testing, selecting the GPU IDs on the command line with CUDA_VISIBLE_DEVICES. The problem appeared when I needed to use multiple GPUs for one system while keeping my model on a single GPU. Switching the testing code back to model.cuda() didn't work (the outputs were all zeros). I made it work by applying a technique similar to CPU-only testing. I will copy and paste what I have:

Training:
Let’s consider that at each epoch you save:

    torch.save(model.state_dict(), name2save)

Then:

    train_on_gpu = torch.cuda.is_available()
    device = torch.device('cuda' if train_on_gpu else 'cpu')
    if train_on_gpu:
        # Wrapping in DataParallel prefixes all state_dict keys with "module."
        model = nn.DataParallel(model)
    model.to(device)

Testing using nn.DataParallel(model) (the ‘else’ is for CPU):

    if train_on_gpu:
        model = nn.DataParallel(model)
    model.to(device)
    if train_on_gpu:
        # The checkpoint keys already carry the "module." prefix, so they match the wrapped model
        model.load_state_dict(torch.load(name2load), strict=False)
    else:
        # On CPU, strip the "module." prefix so the keys match the unwrapped model
        from collections import OrderedDict
        state_dict = torch.load(name2load, map_location='cpu')
        new_state_dict = OrderedDict()
        for k, v in state_dict.items():
            new_state_dict[k.replace("module.", "")] = v
        model.load_state_dict(new_state_dict)

Testing without nn.DataParallel(model):

    if train_on_gpu:
        model.cuda()
    model.to(device)
    # Strip the "module." prefix added by DataParallel so the keys match the plain model
    from collections import OrderedDict
    state_dict = torch.load(name2load, map_location='cpu')
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        new_state_dict[k.replace("module.", "")] = v
    model.load_state_dict(new_state_dict)

I think the last approach could be used in every case, but I kept the if/else in the second one since I was looking for fewer instructions for the final model in that case.
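For what it's worth, the prefix-stripping loop above could be factored into a small helper (strip_module_prefix is just an illustrative name), so the same load path works whether or not the checkpoint was written from a DataParallel-wrapped model:

    from collections import OrderedDict

    def strip_module_prefix(state_dict):
        # Remove the leading "module." that nn.DataParallel adds to every key;
        # keys without the prefix pass through unchanged.
        cleaned = OrderedDict()
        for k, v in state_dict.items():
            cleaned[k[len("module."):] if k.startswith("module.") else k] = v
        return cleaned

    model.load_state_dict(strip_module_prefix(torch.load(name2load, map_location='cpu')))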

I hope this helps.