Hi guys,
I trained a model on GPU with nn.DataParallel, and then tried to load it on GPU for testing without nn.DataParallel, using torch.load('model.pt'), but I got the following error:
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=84 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    state_dict = torch.load(path)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 229, in load
    return _load(f, map_location, pickle_module)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 377, in _load
    result = unpickler.load()
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 348, in persistent_load
    data_type(size), location)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 85, in default_restore_location
    result = fn(storage, location)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 67, in _cuda_deserialize
    return obj.cuda(device_id)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 57, in _cuda
    with torch.cuda.device(device):
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 132, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84
I have 4 Tesla K40 GPUs. During training I used nn.DataParallel with device_ids=[3,0,1,2], so GPU 3 was the default device. If I load the model with CUDA_VISIBLE_DEVICES=0,1,2,3 set, it works; otherwise the error above occurs.
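In case it helps anyone: my understanding is that torch.load by default tries to restore each tensor onto the GPU index it was saved from (GPU 3 here), which fails when that ordinal isn't visible. Passing map_location remaps the storages to a device that exists. A DataParallel checkpoint also prefixes every parameter key with "module.", which has to be stripped before loading into a plain model. A minimal sketch (the helper name is mine, not from any library):

```python
import torch

def load_for_single_gpu_test(path):
    # map_location remaps every saved CUDA storage onto the CPU, so
    # deserialization no longer needs the original GPU index to exist.
    state_dict = torch.load(path, map_location=lambda storage, loc: storage)
    # nn.DataParallel saves parameters under a "module." prefix; strip it
    # so the keys match a plain, non-parallel model.
    return {(k[len('module.'):] if k.startswith('module.') else k): v
            for k, v in state_dict.items()}
```

After this you can call model.load_state_dict(...) on the un-wrapped model and then move it to whichever GPU is actually available with .cuda().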