Every time I use DataParallel to run my code on multiple GPUs, I find it very difficult to get it working. Recently I came across this error:

`all tensor must be on devices[0]`

I tried for hours to make it work but couldn't. Please help me out.
What I did:
- Set `os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"` and `os.environ["CUDA_VISIBLE_DEVICES"] = "3, 6"`
- Moved the `input_tensor` and the `model` to `cuda:0` with `input_tensor.to("cuda:0")` and `model.to("cuda:0")`
- Wrapped the model with `model = torch.nn.DataParallel(model)`
- Called `loss.mean().backward()` (a minimal sketch of the whole setup is below)
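Here is that setup as a minimal, self-contained script. The `nn.Linear` model and the random `input_tensor` are just placeholders standing in for my real model and data:

```python
import os
# Set these before any CUDA work so only physical GPUs 3 and 6 are visible
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "3, 6"

import torch
import torch.nn as nn

# Placeholder model and batch
model = nn.Linear(10, 2)
input_tensor = torch.randn(8, 10)

# Move everything to cuda:0, then wrap the model
input_tensor = input_tensor.to("cuda:0")
model = model.to("cuda:0")
model = torch.nn.DataParallel(model)

output = model(input_tensor)   # DataParallel splits the batch across the visible GPUs
loss = output.sum(dim=1)       # placeholder per-sample loss
loss.mean().backward()
```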
I tried:
- `os.environ["CUDA_VISIBLE_DEVICES"] = "0, 6"`
- `input_tensor.to("cuda:6")` and `model.to("cuda:6")`

…nothing worked (a sketch of this attempt is below as well).
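For completeness, this is roughly how that second attempt looked, again with the same placeholder model and batch:

```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0, 6"

import torch
import torch.nn as nn

# Placeholder model and batch
model = nn.Linear(10, 2)
input_tensor = torch.randn(8, 10)

# Second attempt: move everything to cuda:6 instead of cuda:0
# (this is the variant that did not work for me)
input_tensor = input_tensor.to("cuda:6")
model = model.to("cuda:6")
model = torch.nn.DataParallel(model)

loss = model(input_tensor).sum(dim=1)
loss.mean().backward()
```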
What am I doing wrong?
Since `cuda:0` on my server is always busy, I would like to run my code efficiently on other GPUs such as `cuda:3` and `cuda:6`. Any help is highly appreciated.