If I train a model with one GPU (without nn.DataParallel), the parameter names in saved models are something like:
features.0.weight
If the model was wrapped in nn.DataParallel, the saved parameter names have a prefix:
module.features.0.weight
During inference I only use one GPU, so the model fails to load the latter checkpoint because the parameter names do not match. I am wondering why the parameter names are prepended with this prefix. Can I trim the prefix and still use the model?
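For example (a minimal sketch; the tiny network below is just a stand-in for my actual model), the key names change as soon as the model is wrapped:

```python
import torch
import torch.nn as nn

# Stand-in model with a "features" submodule, mirroring the names above.
model = nn.Sequential()
model.add_module("features", nn.Sequential(nn.Conv2d(3, 8, 3)))

print(list(model.state_dict().keys()))
# ['features.0.weight', 'features.0.bias']

wrapped = nn.DataParallel(model)
print(list(wrapped.state_dict().keys()))
# ['module.features.0.weight', 'module.features.0.bias']
```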
I'm curious why the prefix is needed. It is inconvenient when we want to resume training a single-GPU-trained model on multiple GPUs, or pass a multi-GPU-trained model to inference code that only uses one GPU.
Also, it looks like the pretrained ResNet weights don't have the 'module.' prefix. Does that mean they were trained on a single GPU?
It's needed because that's how state_dicts work: the network is traversed recursively, and each container prepends its attribute name to the key. Since nn.DataParallel stores the wrapped network in an attribute called module, every key picks up a module. prefix. But maybe it's a good idea to override that for DataParallel.
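So yes, you can trim it. A minimal sketch of loading a DataParallel-saved checkpoint into a plain model by stripping the prefix (the checkpoint path and the small stand-in network are placeholders for your own):

```python
import torch
import torch.nn as nn

# Stand-in for the unwrapped network (same architecture as the trained one).
model = nn.Sequential()
model.add_module("features", nn.Sequential(nn.Conv2d(3, 8, 3)))

# Checkpoint saved from a DataParallel-wrapped model; path is a placeholder.
state_dict = torch.load("dp_checkpoint.pth", map_location="cpu")

# Drop the leading "module." that DataParallel adds to every key.
stripped = {k[len("module."):] if k.startswith("module.") else k: v
            for k, v in state_dict.items()}

model.load_state_dict(stripped)
```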
No, they probably had the prefixes trimmed before serialization.
"torch.nn.DataParallel is a model wrapper that enables parallel GPU utilization. To save a DataParallel model generically, save the model.module.state_dict() . This way, you have the flexibility to load the model any way you want to any device you want."