Figured it out for whoever is interested:
When training a model wrapped in nn.DataParallel, if you want to later load the state_dict into a plain model running on the CPU, save the underlying module's parameters with torch.save(model.module.state_dict(), PATH). The wrapper's own state_dict prefixes every key with "module.", which a plain model will not recognize.
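A minimal sketch of that first recipe (the model and file name are placeholders for illustration; note that nn.DataParallel can be constructed even on a CPU-only machine, which makes this easy to try):

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this is whatever you trained.
model = nn.Linear(4, 2)
dp_model = nn.DataParallel(model)

# Save the underlying module's state_dict -- keys have no "module." prefix.
torch.save(dp_model.module.state_dict(), "weights.pth")

# Later, on a CPU-only machine: load straight into a plain model.
cpu_model = nn.Linear(4, 2)
state = torch.load("weights.pth", map_location="cpu")
cpu_model.load_state_dict(state)
```

map_location="cpu" handles the case where the tensors were saved from GPU memory.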
If you instead saved the wrapper's weights with torch.save(model.state_dict(), PATH), every key carries the "module." prefix, so you must first wrap your fresh model in nn.DataParallel and only then call load_state_dict, so that the keys match.
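A sketch of the second case (again with a placeholder model). It also shows a common CPU-only alternative, not mentioned above, of stripping the "module." prefix from the keys instead of wrapping the model:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
dp_model = nn.DataParallel(model)

# Saved the wrapper's state_dict: every key starts with "module."
torch.save(dp_model.state_dict(), "dp_weights.pth")

# Option A: wrap the fresh model in DataParallel first, then load.
fresh = nn.DataParallel(nn.Linear(4, 2))
fresh.load_state_dict(torch.load("dp_weights.pth", map_location="cpu"))

# Option B: strip the "module." prefix and load into a plain model.
state = torch.load("dp_weights.pth", map_location="cpu")
state = {k[len("module."):]: v for k, v in state.items()}
plain = nn.Linear(4, 2)
plain.load_state_dict(state)
```

Option B is handy when the target machine has no GPUs and you would rather not keep the DataParallel wrapper around at inference time.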