Error loading a model on CPU when trained with nn.DataParallel on multi GPUs

I use nn.DataParallel to wrap my model so it trains on multiple GPUs, and optimization ran successfully. Once the model was trained, I saved it with the following code:
torch.save(cust_model.state_dict(), path)
Now when I want to make predictions, I use the following:
cust_model.load_state_dict(torch.load(model_dir, map_location=map_location_device))
where map_location_device is "cpu" and model_dir is the path to the saved model file.
I get the following error when trying to load the model:

RuntimeError: Error(s) in loading state_dict for DenseSegmModel:
Missing key(s) in state_dict: "layer1.0.weight", "layer1.1.weight", "layer1.1.bias", "layer2.denseblock1.denselayer1.norm1.weight", "layer2.denseblock1.denselayer1.norm1.bias", and it goes on

Any ideas?

Are you using nn.DataParallel both for the CPU and GPU models? I had a similar issue and that fixed it for me.

Edit: Doesn’t look like you are. Try wrapping your model in nn.DataParallel before loading the state dict.
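To illustrate: nn.DataParallel stores the wrapped model under a .module attribute, so every key in the saved state_dict gets a "module." prefix, and a plain model then reports all keys as missing. A minimal sketch of two possible fixes, using a tiny stand-in class since the real DenseSegmModel isn't shown in the thread:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the DenseSegmModel from the question.
class DenseSegmModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

    def forward(self, x):
        return self.layer1(x)

# Simulate a checkpoint produced by an nn.DataParallel-wrapped model:
# every key is prefixed with "module." (e.g. "module.layer1.0.weight").
wrapped = nn.DataParallel(DenseSegmModel())
state_dict = wrapped.state_dict()

# Fix 1: wrap the fresh model in nn.DataParallel too, then load.
model_a = nn.DataParallel(DenseSegmModel())
model_a.load_state_dict(state_dict)

# Fix 2: strip the leading "module." prefix so a plain model
# can load the checkpoint directly on CPU.
clean_state = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
model_b = DenseSegmModel()
model_b.load_state_dict(clean_state)
```

Fix 2 is often preferable for CPU inference, since it leaves no DataParallel wrapper in the deployed model.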