However, I want to use two GPUs (e.g. GPU-3 and GPU-4) and make GPU-4 the main GPU.
I encountered the following error with `device='cuda:1'`:
```
device = cuda:1
gpu_num = 2
reading files...
training_image_num 91 read time 0.0007698535919189453
start training...
  0%|          | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 170, in <module>
    x_out = model(x, q)
  File "/home/ubuntu/anaconda3/envs/chenbin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/chenbin/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
```
It seems that `torch.nn.DataParallel` requires the module's parameters and buffers to be on the first device in its `device_ids` list.
Is there a way to make GPU-4 the "main" GPU instead of GPU-3 (the default) while using GPU-3 and GPU-4 together?
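For what it's worth, one workaround I believe should do this is to reorder the GPUs that the process can see via `CUDA_VISIBLE_DEVICES`, so that physical GPU-4 becomes `cuda:0` inside the process and therefore `device_ids[0]` for `DataParallel`. This is a sketch, not your actual `train.py`; the model and tensor shapes are placeholders:

```python
import os
# Expose only physical GPUs 4 and 3, in that order. Inside this process they
# are renumbered cuda:0 and cuda:1, so DataParallel's default main device
# (device_ids[0] == cuda:0) now maps to physical GPU-4.
# This must be set before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,3"

import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # placeholder for your model

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    model = model.to("cuda:0")        # parameters live on device_ids[0] (physical GPU-4)
    model = nn.DataParallel(model)    # device_ids defaults to [0, 1]
    x = torch.randn(16, 8, device="cuda:0")
    out = model(x)                    # DataParallel scatters across both GPUs
    print(out.device)                 # outputs are gathered on device_ids[0]
```

Alternatively, without changing device visibility, I think you can pass the order explicitly, e.g. `nn.DataParallel(model, device_ids=[1, 0], output_device=1)` with the model moved to `cuda:1` first, since the error only demands that the parameters sit on `device_ids[0]`.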