I was trying to run my code on 4 GPUs with ids 4, 5, 6, 7, but I got the error below. Running on GPUs 0, 1, 2, 3 works fine. Does anyone have an idea what the reason might be?
Traceback (most recent call last):
  File "main_boxencoder_new_loss.py", line 279, in <module>
    errD_real.backward()
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 155, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 25, in backward
    return comm.reduce_add_coalesced(grad_outputs, self.input_device)
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/cuda/comm.py", line 122, in reduce_add_coalesced
    result = reduce_add(flattened, destination)
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/cuda/comm.py", line 92, in reduce_add
    nccl.reduce(inputs, outputs, root=destination)
  File "/home/didoyang/anaconda2/lib/python2.7/site-packages/torch/cuda/nccl.py", line 161, in reduce
    assert(root >= 0 and root < len(inputs))
AssertionError
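A likely explanation, going by the last frames of the traceback: `reduce_add` passes the destination *device id* straight to `nccl.reduce` as `root`, but `root` must be an index into the list of participating tensors. With four GPUs there are only 4 inputs (valid indices 0-3), so a destination id of 4 fails `root < len(inputs)`, while ids 0-3 pass by coincidence. The common workaround is to restrict the process to the desired GPUs with `CUDA_VISIBLE_DEVICES` so PyTorch re-indexes them as 0-3. A minimal sketch (the variable names below are illustrative, not from the original code):

```python
import os

# Must be set before torch initializes CUDA (i.e. before `import torch`
# in practice), so the process only sees GPUs 4-7, re-indexed as 0-3.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

# The failing check in torch/cuda/nccl.py is effectively:
#     assert root >= 0 and root < len(inputs)
# where `inputs` holds one gradient tensor per participating GPU.
num_inputs = 4          # 4 GPUs -> valid root indices are 0..3
root_without_remap = 4  # physical device id 4 used as root -> out of range
root_with_remap = 0     # after remapping, the same GPU is index 0

assert not (0 <= root_without_remap < num_inputs)  # reproduces the AssertionError
assert 0 <= root_with_remap < num_inputs           # remapped root passes the check
```

With the environment variable set, construct the model with `nn.DataParallel(model, device_ids=[0, 1, 2, 3])` (or just the default `device_ids`) and the reduction's destination device falls inside the valid range.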