[dataparallel] Trained on one GPU but Inference used on multiple GPUs

jS5t3r · March 28, 2021, 4:09pm

I have trained ResNet50 on one GPU. I have read other topics about DataParallel, but I have not found a solution for this kind of setup.
Later, I use the model in another script and want to run it on multiple GPUs, but I get the following error:

Traceback (most recent call last):
  File "attack.py", line 57, in <module>
    model.load_state_dict(torch.load(settings.CHECKPOINT_PATH))
  File "/home/user/.conda/envs/bm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
        Missing key(s) in state_dict: "module.conv1.0.weight", "module.conv1.1.weight", "module.conv1.1.bias", "module.conv1.1.running_mean", "module.conv1.1.running_var", "module.conv2_x.0.residual_function.0.weight", "module.conv2_x.0.residual_function.1.weight", "module.conv2_x.0.residual_function.1.bias",

sio277 · March 28, 2021, 6:04pm

I guess you saved your DataParallel-wrapped model and then tried to load it before DP wrapping. The keys are mismatched while loading, and I guess that is due to the 'module' attribute, which is where your original model is placed after DP construction. Try saving your model after unwrapping DP, for example:

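# model is the DataParallel wrapper; model.module is the original network
state = model.module.state_dict()
# file is the path (or file object) of the checkpoint to write
torch.save(state, file)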
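If a checkpoint has already been saved with the wrong key layout, the keys can also be renamed after loading. The following is only a minimal sketch: `strip_prefix` and `add_prefix` are hypothetical helper names, not PyTorch functions, and the example keys are taken from the traceback above with placeholder values instead of tensors.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy whose keys have the DataParallel prefix removed."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


def add_prefix(state_dict, prefix="module."):
    """Return a copy whose keys all carry the DataParallel prefix."""
    return OrderedDict(
        (key if key.startswith(prefix) else prefix + key, value)
        for key, value in state_dict.items()
    )


# Placeholder values stand in for the real weight tensors.
wrapped = OrderedDict([("module.conv1.0.weight", 0), ("module.conv1.1.bias", 1)])
plain = strip_prefix(wrapped)
rewrapped = add_prefix(plain)
```

Here `plain` would match an unwrapped model and `rewrapped` would match a DataParallel-wrapped one. The result of either helper can be passed to `load_state_dict` in place of the raw checkpoint.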

jS5t3r · March 28, 2021, 6:48pm

Sorry for the confusion. I trained the model on a single GPU without DP wrapping.
Now I need the DP wrapping to run the trained model on multiple GPUs.

sio277 · March 28, 2021, 6:52pm

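Then try loading the state dict into your model before DP construction. The checkpoint keys have no "module." prefix, so they match the unwrapped model but not the DataParallel wrapper, which is what the "Missing key(s)" error is reporting.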
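A minimal sketch of that order of operations, under some assumptions: `build_model` is a hypothetical stand-in for the ResNet50 constructor, and the checkpoint is written to a temporary file here only so the snippet is self-contained. In the real script the path would be `settings.CHECKPOINT_PATH`.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical small network standing in for the trained ResNet50.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, bias=False),
        nn.BatchNorm2d(8),
    )


# Simulate the checkpoint saved from the single-GPU run (no DP wrapper).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(build_model().state_dict(), checkpoint_path)

# 1. Build the plain model and load the weights while the keys still match.
model = build_model()
state = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(state)

# 2. Move the model to the GPU if one is available, then wrap it.
if torch.cuda.is_available():
    model = model.cuda()
model = nn.DataParallel(model)
model.eval()

# After wrapping, every key gains the "module." prefix.
wrapped_keys = list(model.state_dict().keys())
```

If the model has to be wrapped first for some reason, calling `model.module.load_state_dict(state)` on the wrapper's inner module should work as well, since `model.module` is the original unwrapped network.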