I trained a customized CNN model using model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3])
and saved the whole model with torch.save(model, "./ours_5.pkl").
Now I would like to load this model and test it on a single GPU: model_net = torch.load(path, map_location="cuda:3").
It gives me a RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:3
If you save your DataParallel model and want to load it in a different environment, you need to define the device mapping properly using the map_location option of torch.load (torch.save has no such option). You can check the following pointers and see if they resolve your issue.
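To illustrate what map_location does at load time, here is a minimal sketch that saves the weights of a small stand-in model and loads them back onto a chosen device. TinyNet, the temporary file path, and the "cpu" target are assumptions made so the snippet runs anywhere; on the machine from the question the target would presumably be "cuda:3".

```python
import os
import tempfile

import torch
import torch.nn as nn


# Hypothetical stand-in for the customized CNN in the question.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = x.mean(dim=(2, 3))  # global average pool over H and W
        return self.fc(x)


# Assumption: "cpu" keeps the sketch runnable without GPUs;
# replace with "cuda:3" (or another device) on a GPU machine.
target = torch.device("cpu")

model = TinyNet()
path = os.path.join(tempfile.mkdtemp(), "tiny_state.pt")

# Save only the weights rather than the pickled module object.
torch.save(model.state_dict(), path)

# map_location remaps every stored tensor to the target device while loading.
state = torch.load(path, map_location=target)

restored = TinyNet()
restored.load_state_dict(state)
restored.to(target).eval()
```

Saving the state_dict instead of the whole model is a design choice of this sketch, not something the original poster did; it avoids pickling the DataParallel wrapper together with its device_ids.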
I think the model should be loaded onto the first device in the device_ids list.
If I’m right, you need to do one of the following:
Retrain your model with device #3 in the first position: model = torch.nn.DataParallel(model, device_ids=[3, 0, 1, 2])
Load the model onto device #0: model_net = torch.load(path, map_location="cuda:0")
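Assuming you don’t want to retrain just to change the device, I think the best option would be to load onto device #0 and then move the model to device #3. The call for moving a module is .to() (nn.Module has no .device() method), and since the loaded object is still the DataParallel wrapper, you would probably want to unwrap it with .module first, with something like: model_net = torch.load(path, map_location="cuda:0").module.to("cuda:3")
I’m not sure if it works; it’s just a guess based on what is written in the official docs, but I hope it helps you!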
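The load-then-move idea can be sketched as a small helper that drops the DataParallel wrapper and moves the underlying network to one device. The helper name to_single_device, the nn.Linear stand-in, and the "cpu" target are assumptions for illustration; in the question's setup the input would be the object returned by torch.load(path, map_location="cuda:0") and the target would be "cuda:3".

```python
import torch
import torch.nn as nn


def to_single_device(loaded, device):
    """Return the plain module on `device`, dropping any DataParallel wrapper."""
    # DataParallel keeps the original network in its `.module` attribute.
    if isinstance(loaded, nn.DataParallel):
        loaded = loaded.module
    return loaded.to(device)


# Stand-in for the object returned by torch.load(path, map_location="cuda:0").
wrapped = nn.DataParallel(nn.Linear(8, 2))

# Assumption: "cpu" keeps the sketch runnable without GPUs;
# the question's target would be "cuda:3".
device = torch.device("cpu")
net = to_single_device(wrapped, device)
net.eval()

# Inputs must live on the same device as the model's parameters.
with torch.no_grad():
    out = net(torch.randn(5, 8, device=device))
```

Unwrapping first means the model no longer checks its parameters against device_ids[0], which is the check that raised the RuntimeError in the question.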