Prefix parameter names in saved model if trained by multi-GPU?


If I train a model with one GPU (without nn.DataParallel), the parameter names in the saved model are something like conv1.weight.

If the model was initialized with nn.DataParallel, the saved parameter names have a 'module.' prefix, e.g. module.conv1.weight.

During inference I only use one GPU, so the model fails to load the latter checkpoint because the parameter names don't match. I am wondering why the prefix is prepended to the parameter names. Can I trim the prefix and still use the model?



Yes, you can just remove the prefix:

state_dict = {k.partition('module.')[2]: v for k, v in state_dict.items()}
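A quick sketch of what that comprehension does, using a hypothetical checkpoint dict (the key names are made up for illustration). Note the .items() call: iterating a dict directly yields only keys.

```python
# Hypothetical keys as saved by an nn.DataParallel-wrapped model.
saved = {
    "module.conv1.weight": 1,
    "module.fc.bias": 2,
}

# str.partition('module.') splits each key into (before, sep, after);
# index [2] keeps everything after the prefix.
state_dict = {k.partition("module.")[2]: v for k, v in saved.items()}

print(state_dict)  # {'conv1.weight': 1, 'fc.bias': 2}
```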

Thanks for the reply! Removing prefix works.

Curious why the prefix is needed? It creates inconvenience when we want to resume training a single-GPU-trained model on multiple GPUs, or pass a multi-GPU-trained model to inference code that uses only one GPU.

Also, it looks like the pretrained resnet doesn't have the 'module.' prefix; does that mean it was trained on a single GPU?


It’s needed because that’s how state_dicts work :wink: You recursively go over the network, prepending the names. But maybe it’s a good idea to override that for DataParallel.
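The recursive naming can be illustrated with a toy sketch (these Module classes are stand-ins for illustration, not the real torch.nn internals): each container prepends its child's attribute name, and nn.DataParallel stores the wrapped network under the attribute "module", which is where the prefix comes from.

```python
# Toy sketch of how state_dict keys are built: each container recursively
# prepends its child's attribute name to its child's parameter names.

class Module:
    def __init__(self):
        self._params = {}    # name -> parameter (placeholder values here)
        self._children = {}  # name -> child Module

    def state_dict(self, prefix=""):
        out = {}
        for name, p in self._params.items():
            out[prefix + name] = p
        for name, child in self._children.items():
            out.update(child.state_dict(prefix + name + "."))
        return out

# a toy network: one top-level parameter plus a submodule "fc"
net = Module()
net._params["weight"] = 0.5
fc = Module()
fc._params["bias"] = 0.1
net._children["fc"] = fc

# wrap it the way DataParallel does: under the attribute name "module"
wrapper = Module()
wrapper._children["module"] = net

print(list(net.state_dict()))      # ['weight', 'fc.bias']
print(list(wrapper.state_dict()))  # ['module.weight', 'module.fc.bias']
```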

No, they probably had the prefixes trimmed before serialization.


Hi, I met the same problem, but after I changed the code like this:

state_dict = checkpoint['model_dict']
state_dict = {k.partition('model.')[2]: v for k,v in state_dict}

it showed this error:

ValueError: too many values to unpack (expected 2)

Why didn't it work?


{k.partition('module.')[2]: state_dict[k] for k in state_dict.keys()}
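To see why the ValueError occurred: iterating a dict directly yields its keys (strings), so `for k, v in state_dict` tries to unpack each key string into two values and fails for keys longer than two characters. Iterating .items() yields (key, value) pairs instead. A sketch with a made-up key:

```python
state_dict = {"module.conv1.weight": 0}

# Iterating the dict yields the key string "module.conv1.weight";
# unpacking it into k, v fails because it has 19 characters, not 2.
try:
    bad = {k.partition("module.")[2]: v for k, v in state_dict}
except ValueError as exc:
    print(exc)  # too many values to unpack (expected 2)

# Iterating .items() yields (key, value) pairs, so unpacking works.
good = {k.partition("module.")[2]: v for k, v in state_dict.items()}
print(good)  # {'conv1.weight': 0}
```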


I know this thread is a little bit old, but I just ran into the same problem. From the PyTorch documentation on saving and loading models:

"torch.nn.DataParallel is a model wrapper that enables parallel GPU utilization. To save a DataParallel model generically, save the model.module.state_dict(). This way, you have the flexibility to load the model any way you want to any device you want."
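Since a checkpoint saved from model.module.state_dict() has no prefix, the remaining flexibility is just converting key names in either direction: strip the prefix to load on a single device, or add it back when resuming under DataParallel. A minimal sketch with hypothetical helper names (add_module_prefix / strip_module_prefix are not torch APIs):

```python
def add_module_prefix(state_dict):
    """Prefix every key with 'module.' for loading into a DataParallel wrapper."""
    return {"module." + k: v for k, v in state_dict.items()}

def strip_module_prefix(state_dict):
    """Remove a leading 'module.' from every key, if present."""
    return {k[len("module."):] if k.startswith("module.") else k: v
            for k, v in state_dict.items()}

# made-up keys for illustration
plain = {"conv1.weight": 0, "fc.bias": 1}
wrapped = add_module_prefix(plain)

print(wrapped)                                # {'module.conv1.weight': 0, 'module.fc.bias': 1}
print(strip_module_prefix(wrapped) == plain)  # True
```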