Hi guys,
I trained a model on GPU with nn.DataParallel, and then tried to load it on GPU for testing without nn.DataParallel, using torch.load('model.pt'), but I got the following error:
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=84 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    state_dict = torch.load(path)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 229, in load
    return _load(f, map_location, pickle_module)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 377, in _load
    result = unpickler.load()
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 348, in persistent_load
    data_type(size), location)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 85, in default_restore_location
    result = fn(storage, location)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/serialization.py", line 67, in _cuda_deserialize
    return obj.cuda(device_id)
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 57, in _cuda
    with torch.cuda.device(device):
  File "/users1/xwgeng/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 132, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84
I have 4 Tesla K40 GPUs. During training I used nn.DataParallel with device_ids=[3,0,1,2], so GPU 3 was the default device. If I load the model with CUDA_VISIBLE_DEVICES=0,1,2,3 set, it works; otherwise the error above occurs.
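In case it helps anyone: my understanding is that torch.load by default tries to restore each tensor onto the GPU index it was saved from (GPU 3 here), which fails when that ordinal isn't visible. Passing map_location remaps the storages to a device that exists. A DataParallel checkpoint also prefixes every parameter key with "module.", which has to be stripped before loading into a plain model. A minimal sketch (the helper name is mine, not from any library):

```python
import torch

def load_for_single_gpu_test(path):
    # map_location remaps every saved CUDA storage onto the CPU, so
    # deserialization no longer needs the original GPU index to exist.
    state_dict = torch.load(path, map_location=lambda storage, loc: storage)
    # nn.DataParallel saves parameters under a "module." prefix; strip it
    # so the keys match a plain, non-parallel model.
    return {(k[len('module.'):] if k.startswith('module.') else k): v
            for k, v in state_dict.items()}
```

After this you can call model.load_state_dict(...) on the un-wrapped model and then move it to whichever GPU is actually available with .cuda().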