Error when testing on a single GPU using a model trained on multiple GPUs

An error comes up when I use a model trained on multiple GPUs to test on a single GPU.

# model reference
model = torch.load(args.model_save_name)
if args.use_gpu:
    if args.multi_process:
        model = DataParallel(model, device_ids=args.device_ids).cuda()
    else:
        model = model.cuda(args.sing_gpu_id)

THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=87 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "", line 307, in <module>
    model = model_reference(model, dataloaders, args.mb_size, root_fig_dir=args.fig_dir)
  File "", line 71, in model_reference
    outputs = model(inputs)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/modules/", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/parallel/", line 56, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/parallel/", line 67, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/parallel/", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/parallel/", line 25, in scatter
    return scatter_map(inputs)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/parallel/", line 18, in scatter_map
    return tuple(zip(*map(scatter_map, obj)))
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/parallel/", line 15, in scatter_map
    return Scatter(target_gpus, dim=dim)(obj)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/nn/parallel/", line 60, in forward
    outputs = comm.scatter(input, self.target_gpus, self.chunk_sizes, self.dim, streams)
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/cuda/", line 159, in scatter
    with torch.cuda.device(device),
  File "/home/mil/huang/.local/lib/python2.7/site-packages/torch/cuda/", line 128, in __enter__
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:87

Is there some way to solve this problem, such as unwrapping the multi-GPU model into a single-GPU one?
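One common way to do exactly that (a sketch, not from this thread — `net`, `wrapped`, and `fresh` are hypothetical names): save and load the `state_dict` instead of the whole model, and strip the `module.` prefix that `DataParallel` adds to every parameter key, so a plain single-GPU model can load the weights:

```python
import torch
from torch.nn import DataParallel, Linear

# Hypothetical stand-in for the trained network.
net = Linear(4, 2)
wrapped = DataParallel(net)

# A DataParallel state_dict prefixes every key with "module.".
state = wrapped.state_dict()

# Strip the prefix so a plain, single-GPU model can load the weights.
single = {k.replace("module.", "", 1): v for k, v in state.items()}

fresh = Linear(4, 2)
fresh.load_state_dict(single)
print(sorted(single))  # ['bias', 'weight']
```

The stripped dict can be saved with `torch.save` and later loaded on any machine without ever constructing a `DataParallel` wrapper.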


When testing on a single GPU, a model trained on multiple GPUs must not be wrapped in DataParallel again: the multi-GPU device_ids refer to devices that do not exist on the single-GPU machine, which is what "invalid device ordinal" means.
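The advice above can be sketched as follows (assuming the checkpoint still holds the whole DataParallel-wrapped model, simulated here with a hypothetical `Linear` network):

```python
import torch
from torch.nn import DataParallel, Linear

# Stand-in for the trained network; simulate a checkpoint that still
# holds the DataParallel wrapper, as in the question.
loaded = DataParallel(Linear(4, 2))

# Do NOT wrap in DataParallel again at test time. Instead, unwrap it:
if isinstance(loaded, DataParallel):
    model = loaded.module  # the underlying plain module
else:
    model = loaded

model.eval()
out = model(torch.randn(3, 4))  # runs on one device, no scattering
print(out.shape)                # torch.Size([3, 2])
```

After unwrapping, the plain module can be moved to the one available GPU with `model.cuda(0)` (or with `.cuda(args.sing_gpu_id)` in the question's code).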