How to load a checkpoint saved on a GPU device onto a CPU device

I trained my network on a GPU and saved a checkpoint with torch.save.

Loading this checkpoint on my CPU-only machine gives an error:

    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

You can remap the Tensor location at load time using the map_location argument to torch.load. For example, this will forcefully remap everything onto the CPU:

torch.load('my_file.pt', map_location=lambda storage, location: 'cpu')

While this will only map storages from GPU0:

torch.load('my_file.pt', map_location={'cuda:0': 'cpu'})

I’m trying to load a GPU-trained model onto a CPU with the code you suggested:

torch.load('my_file.pt', map_location=lambda storage, location: 'cpu')

… and I get this error:

Traceback (most recent call last):
  File "net_predict.py", line 146, in <module>
    net = torch.load(f_net, map_location=(lambda storage, location: 'cpu'))
  File "/home/[...]/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 248, in load
    return _load(f, map_location, pickle_module)
  File "/home/[...]/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 340, in _load
    tensor = tensor_type._new_with_metadata_file(f, storage)
AttributeError: type object 'str' has no attribute '_new_with_metadata_file'

(I replaced my username with […])

Any idea what I’m doing wrong?

I’m sorry, my bad. This should work:

torch.load('my_file.pt', map_location=lambda storage, loc: storage)

It works - brilliant! :slight_smile:

Out of curiosity: could you explain what this does? I’m not sure how it knows to remap storage to CPU, since the lambda returns the storage it got as an argument.

Sure. map_location can either be a dict, in which case storages whose serialized locations match a key are remapped to the corresponding value. Alternatively, it can be a function that receives a CPU storage and its serialized location, and returns some storage to replace it. If you just want to load everything onto the CPU, you can return the first argument unchanged, but you could also do crazier things like sending all CUDA tensors to the next GPU, by parsing the original device out of the loc argument.
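A minimal sketch of that last idea (the next_gpu helper and file names are hypothetical; the loc parsing itself is plain string handling):

```python
# Hypothetical helper: parse a serialized location string ('cuda:0',
# 'cuda:1', 'cpu', ...) and compute the "next GPU" device string.
def next_gpu(loc):
    if loc.startswith('cuda:'):
        index = int(loc.split(':', 1)[1])
        return 'cuda:%d' % (index + 1)
    return loc

print(next_gpu('cuda:0'))  # cuda:1
print(next_gpu('cpu'))     # cpu

# With a real checkpoint (requires torch and, for the GPU case, a CUDA build):
# torch.load('my_file.pt', map_location=lambda storage, loc: storage)  # everything to CPU
# torch.load('my_file.pt', map_location={'cuda:0': 'cuda:1'})          # GPU0 -> GPU1
```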


@apaszke Hi! I am sorry to reopen this thread. I ran into a problem when using the above method to load a GPU-trained model in CPU mode. The code fragment is:

import torch

encoder = torch.load('encoder.pt', map_location=lambda storage, loc: storage)
decoder = torch.load('decoder.pt', map_location=lambda storage, loc: storage)


And the error I met was:

The full code can be viewed at seq2seq-translation/eval.py

How can I load a GPU-trained model on a CPU device (without any GPUs) correctly? Thank you for your great work!

Hey, no problem! I only have a couple more questions:

  1. What’s your PyTorch version? What does torch.__version__ show? If it’s not available, when did you install it?
  2. When did you create that checkpoint?

Good morning!

  1. My PyTorch version is 0.1.9+b46d5e0. I compiled PyTorch from source because I wanted to try half tensors with the stateless methods. (You mentioned this in this pull request. Excellent!)

  2. I created the checkpoint about 12 hours ago, also using the 0.1.9+b46d5e0 version of PyTorch.

Thank you very much!

I have uploaded some test data to my github repo. If you have time maybe you can try it:

  1. train a model:
    python train_attn.py
  2. load the model and do some inferences:
    python eval.py

Maybe this is useful for tracking down the problem. Thank you!

@apaszke I suspect it may be a version problem. I can’t reproduce this error when I use the 0.1.9_2 version of PyTorch. Thanks!


Sorry to reopen the thread.

After running the code:

params = torch.load(input_file, lambda storage, loc: storage)

I ran into the same problem Yangyu described earlier. The error message shows:

TypeError: set_ received an invalid combination of arguments - got (torch.FloatStorage, int, tuple, tuple), but expected one of:

  • no arguments
  • (torch.cuda.FloatTensor source)
  • (torch.cuda.FloatStorage storage)
  • (torch.cuda.FloatStorage sourceStorage, int storage_offset, int … size)
    didn’t match because some of the arguments have invalid types: (!torch.FloatStorage!, int, !tuple!, !tuple!)
  • (torch.cuda.FloatStorage sourceStorage, int storage_offset, torch.Size size)
  • (torch.cuda.FloatStorage sourceStorage, int storage_offset, torch.Size size, tuple strides)

I just updated my PyTorch to the latest version on the master branch. The version number is 0.1.11+761eef1. Any idea why?


Hello, I tried to load a snapshot from GPU training to run it in CPU mode, but ran into the same problem described above. Of course, I tried the given advice, but it had no effect.

torch.load('./snapshots/cpu_final_snapshot.pth', map_location=lambda storage, loc: storage)

I have the following traceback:

Traceback (most recent call last):
  File "predict.py", line 39, in <module>
    params = torch.load('./snapshots/cpu_final_snapshot.pth', map_location=lambda storage, loc: storage)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/serialization.py", line 222, in load
    return _load(f, map_location, pickle_module)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/serialization.py", line 370, in _load
    result = unpickler.load()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/cuda/__init__.py", line 279, in __new__
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/cuda/__init__.py", line 96, in _lazy_init
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/cuda/__init__.py", line 63, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")

Any help would be appreciated.

It seems that I found the cause of the “invalid combination of arguments” error.

Yesterday I used a model trained on PyTorch 0.1.9 and loaded it on the CPU using the latest version, 0.1.11. The error appeared.

Today I retrained the model using 0.1.11 and loaded it with the same version. Everything works.

So I guess there are inconsistencies between checkpoints produced by different versions of PyTorch.
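One way to reduce that coupling (a general Python observation, not specific to any PyTorch version): torch.save pickles whatever you give it, and pickling a whole model object ties the checkpoint to the exact class internals at save time, whereas saving only plain state survives code changes better. This mirrors saving model.state_dict() instead of the model itself. A torch-free sketch with pickle:

```python
import pickle

class Net:  # stands in for a model class
    def __init__(self):
        self.weight = [1.0, 2.0]

net = Net()

# Pickling the whole object records the class by module and name, so
# unpickling needs a compatible class definition to exist at load time.
whole = pickle.dumps(net)

# Pickling only the "state" (plain containers of values) avoids that
# coupling between the checkpoint and the class internals.
state_blob = pickle.dumps({'weight': net.weight})
restored = pickle.loads(state_blob)
print(restored['weight'])  # [1.0, 2.0]
```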


We have trained an AlexNet with the PyTorch ImageNet example (https://github.com/pytorch/examples/blob/master/imagenet/main.py) and have been struggling to convert the model for CPU-only inference. Here is a solution for AlexNet:
It would be nice to have something more generic…
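For what it’s worth, a more generic sketch: the ImageNet example wraps the model in nn.DataParallel, which prefixes every parameter name with “module.”, so besides loading with map_location='cpu' the checkpoint’s state_dict usually needs that prefix stripped before a plain (non-DataParallel) model can consume it. The prefix handling is pure dictionary work; the key names below are illustrative:

```python
from collections import OrderedDict

def strip_module_prefix(state_dict):
    """Remove the 'module.' prefix that nn.DataParallel adds to parameter names."""
    return OrderedDict(
        (key[len('module.'):] if key.startswith('module.') else key, value)
        for key, value in state_dict.items()
    )

# Illustrative keys, as saved from a DataParallel-wrapped AlexNet:
ckpt = OrderedDict([('module.features.0.weight', 'w0'),
                    ('module.classifier.1.bias', 'b1')])
print(list(strip_module_prefix(ckpt)))  # ['features.0.weight', 'classifier.1.bias']

# With a real checkpoint (assuming the example's {'state_dict': ...} layout):
# checkpoint = torch.load('model_best.pth.tar', map_location='cpu')
# model.load_state_dict(strip_module_prefix(checkpoint['state_dict']))
```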


When I use torch 1.0.0, the given code produces the following:

torch.load('save/best_BiLSTMCRF_pos_2019-01-10 12-42-50', map_location=lambda storage, location: 'cpu')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jiaxin/.local/lib/python3.6/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/home/jiaxin/.local/lib/python3.6/site-packages/torch/serialization.py", line 538, in _load
    result = unpickler.load()
  File "/home/jiaxin/.local/lib/python3.6/site-packages/torch/_utils.py", line 135, in _rebuild_tensor_v2
    tensor = _rebuild_tensor(storage, storage_offset, size, stride)
  File "/home/jiaxin/.local/lib/python3.6/site-packages/torch/_utils.py", line 129, in _rebuild_tensor
    module = importlib.import_module(storage.module)
AttributeError: 'str' object has no attribute 'module'

Is anything wrong with the new version of PyTorch?


Had the same thing. See the comments above about using map_location=lambda storage, location: storage instead of 'cpu'.

If you want to force the map_location to CPU, you can drop the lambda and simply use:

torch.load('save/best_BiLSTMCRF_pos_2019-01-10 12-42-50', map_location='cpu')

This is discussed in the report for issue #9139.

Sorry for reviving this post. I have a closely related question: I want to do the exact same thing, but using the C++ front-end. That is, I want to save a model trained with the C++ front-end on a GPU, and then load it with the C++ front-end on a CPU-only device.

Is it possible? The documentation for torch::load does not mention a map_location. Thanks for any help.

torch.load(WEIGHTS_FILE, map_location=torch.device('cpu'))