Saving and loading torch models on 2 machines with different numbers of GPU devices

SpandanMadan · August 24, 2017, 7:27am

I saved a model using save_state_dict on a machine with 4 GPUs, where I was using the device with id 3. Later, I tried to load the model on a machine with 2 GPUs, where id=3 does not exist. This throws an error:

cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:80

Is this a bug? I believe we should be able to save and load models on different machines. We often share machines in labs and use whichever one is available at the time!

Thanks!

theevann · August 24, 2017, 12:59pm

Hi,

Personally, I always save my models on the CPU so that I can load them easily anywhere, and move them to the GPU later if needed.
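For illustration, here is a minimal sketch of that save-on-CPU habit. The helper name cpu_state_dict, the nn.Linear stand-in model, and the in-memory buffer (used in place of a real file path) are my own assumptions, not anything from the docs:

```python
import io

import torch
import torch.nn as nn


def cpu_state_dict(model):
    """Return a copy of the model's state_dict with every tensor on the CPU."""
    return {name: tensor.cpu() for name, tensor in model.state_dict().items()}


# Hypothetical stand-in for a real model; moved to the GPU only if one exists.
model = nn.Linear(4, 2)
if torch.cuda.is_available():
    model = model.cuda()

# Save CPU copies of the weights, so the file carries no GPU device id.
buffer = io.BytesIO()  # stand-in for a path such as 'model.pt'
torch.save(cpu_state_dict(model), buffer)

# Loading then works on any machine, whatever GPUs it has.
buffer.seek(0)
state = torch.load(buffer)
restored = nn.Linear(4, 2)
restored.load_state_dict(state)

# Move to the GPU afterwards, if needed.
if torch.cuda.is_available():
    restored = restored.cuda()
```

Since the saved tensors live on the CPU, the machine that loads them never has to look up the original device id.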

However, there is an option in torch.load to “remap storages to be loaded on a different device”.
See this post explaining it: Loading weights for CPU model while trained on GPU
And the doc: http://pytorch.org/docs/master/torch.html?highlight=load#torch.load

Quoting from doc:

torch.load('tensors.pt')
# Load all tensors onto the CPU
torch.load('tensors.pt', map_location=lambda storage, loc: storage)
# Map tensors from GPU 1 to GPU 0
torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})
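Applied to the situation in the question (weights saved from device 3, loaded on a machine with fewer GPUs), the same option could be used roughly as follows. This is only a sketch: pick_map_location is a hypothetical helper, "cuda:3" is taken from the question, and the in-memory buffer stands in for the real file:

```python
import io

import torch


def pick_map_location(saved_device="cuda:3"):
    """Choose a map_location for a file whose tensors were saved on saved_device."""
    if torch.cuda.is_available():
        # Remap the saved device tag to GPU 0, which exists on any CUDA machine.
        return {saved_device: "cuda:0"}
    # No GPU here: keep every storage on the CPU (same lambda as in the docs).
    return lambda storage, loc: storage


# Demo with an in-memory buffer standing in for 'tensors.pt'.
# The demo tensor is saved from the CPU, so it stays on the CPU after loading.
buffer = io.BytesIO()
torch.save({"w": torch.arange(6.0).reshape(2, 3)}, buffer)
buffer.seek(0)
loaded = torch.load(buffer, map_location=pick_map_location())
print(loaded["w"].device)
```

With the dict form, only storages tagged with the listed device are moved; anything saved from another device is loaded where it was saved, so every device id that might be in the file needs an entry.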

SpandanMadan · August 24, 2017, 5:48pm

Thanks a bunch! This is very helpful.