Is it possible to make a GPU other than the first one (GPU 0) the one that uses more memory?
Try changing the device order with CUDA_VISIBLE_DEVICES=1,0 python script.py args. This swaps both devices, so the data will be accumulated on GPU 1.
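The remapping can be illustrated without any GPUs: CUDA_VISIBLE_DEVICES filters and reorders the physical devices, and CUDA then numbers the visible ones from 0. The helper below is a hypothetical illustration of that mapping, not a PyTorch API:

```python
def visible_to_physical(cuda_visible_devices):
    """Map logical device indices (what the script sees as cuda:N)
    to physical GPU ids, as implied by CUDA_VISIBLE_DEVICES.

    With CUDA_VISIBLE_DEVICES=1,0 the process sees two devices:
    logical cuda:0 is physical GPU 1, logical cuda:1 is physical GPU 0.
    """
    physical = [int(i) for i in cuda_visible_devices.split(",") if i.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}

mapping = visible_to_physical("1,0")
print(mapping)  # {0: 1, 1: 0} -> cuda:0 inside the script is physical GPU 1
```

So the default device (logical 0), which accumulates the extra memory, becomes physical GPU 1 without touching the training code.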
Thank you very much for the reply. Is this equivalent to the following?
model = nn.DataParallel(model, device_ids=[1, 0])
I think you are right! The first device_id will be used as the output_device.
Also, I think you could just set output_device to the id you want to accumulate your updates on.
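The default can be mirrored in plain Python, assuming (as the docs describe) that nn.DataParallel gathers outputs on output_device and falls back to device_ids[0] when it isn't given. A minimal sketch of that fallback, not the real implementation:

```python
def resolve_output_device(device_ids, output_device=None):
    """Mirror nn.DataParallel's default: outputs (and hence the extra
    memory for gathered results) land on output_device, which falls
    back to the first entry of device_ids when not set explicitly."""
    if output_device is None:
        output_device = device_ids[0]
    return output_device

print(resolve_output_device([1, 0]))     # 1 -> device_ids=[1, 0] gathers on GPU 1
print(resolve_output_device([0, 1], 1))  # 1 -> explicit output_device wins
```

This is why passing device_ids=[1, 0] and passing output_device=1 should, in principle, move the accumulation to the same GPU.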
Here are the important lines of code.
Great! Thanks a lot!
I did the following for PyTorch 0.4.1:
model = nn.DataParallel(model, device_ids=[3, 0, 1, 2])
And got:
torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
Does it work if you pass device_ids=[0, 3, 1, 2]? Or output_device=3?
I don’t have multiple GPUs currently, otherwise I would test it quickly.
On a 4 x 1080Ti cluster:
I passed device_ids=[3, 0, 1, 2] without setting output_device=3, and it didn’t work. I got this error:
torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
However, on my own 2 x Titan Xp Linux box:
When I set CUDA_VISIBLE_DEVICES=1,0, with or without passing device_ids=[0, 1], it worked: the second Titan Xp used more memory.
When I only passed device_ids=[1, 0] without setting CUDA_VISIBLE_DEVICES, I got this error:
torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
I passed device_ids=[0, 1] and output_device=1, and I got the following error:
torch/nn/functional.py", line 1407, in nll_loss
return torch._C._nn.nll_loss(input, target, weight, Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `THCTensor(checkGPU)(state, 4, input, target, output, total_weight)’ failed. Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one. at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:29
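This last error usually means the loss inputs ended up on different devices: with output_device=1 the gathered model output lives on GPU 1 while the target tensor still sits on GPU 0. A hedged, CPU-only sketch of the same-device check behind that assertion (the FakeTensor stand-in is hypothetical; in real code the fix is moving the target, e.g. target = target.to(output_device), before computing the loss):

```python
class FakeTensor:
    """Stand-in for a CUDA tensor that only tracks which device it lives on."""
    def __init__(self, device):
        self.device = device

def nll_loss_device_check(output, target):
    """Mimic the checkGPU assertion: all loss arguments must share a device."""
    if output.device != target.device:
        raise RuntimeError(
            f"tensors are located on different GPUs "
            f"({output.device} vs {target.device}); move them to a single one"
        )
    return True

output = FakeTensor("cuda:1")  # gathered on output_device=1
target = FakeTensor("cuda:0")  # still on the default device
try:
    nll_loss_device_check(output, target)
except RuntimeError as e:
    print("reproduced:", e)

# The fix: move the target to the output device before the loss,
# analogous to target = target.to(output_device) in PyTorch.
target = FakeTensor("cuda:1")
print(nll_loss_device_check(output, target))  # True
```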