Training on Multiple GPUs

Every time I use DataParallel to run my code on multiple GPUs, I find it difficult to get it working. Recently I came across this error:

all tensor must be on devices[0]

I spent hours trying to make it work, but I couldn't. Please help me out.

What I did (the full setup is sketched in the code after this list):

  • os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
  • os.environ["CUDA_VISIBLE_DEVICES"] = "3, 6"
  • I moved all input tensors and the model to cuda:0 with input_tensor.to("cuda:0") and model.to("cuda:0")
  • model = torch.nn.DataParallel(model)
  • loss.mean().backward()
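
For reference, here is a minimal sketch of what my script roughly looks like. The linear model, tensor shapes, and MSE loss are placeholders just for illustration, not my real code; the environment variables and device calls are exactly the ones listed above.

    import os

    # select the physical GPUs before any CUDA call is made
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "3, 6"

    import torch
    import torch.nn as nn

    # placeholder model, data, and loss, only to show the structure of the script
    model = nn.Linear(10, 2).to("cuda:0")
    model = nn.DataParallel(model)

    input_tensor = torch.randn(8, 10).to("cuda:0")
    target = torch.randn(8, 2).to("cuda:0")

    output = model(input_tensor)
    loss = nn.functional.mse_loss(output, target, reduction="none")
    loss.mean().backward()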

I also tried (the changed lines are sketched after this list):

  • os.environ["CUDA_VISIBLE_DEVICES"] = "0, 6"
  • input_tensor.to("cuda:6") and model.to("cuda:6"), but nothing worked.
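
The variant looked roughly like this; these are only the lines I changed relative to the sketch above, with the rest of the script unchanged.

    # variant: expose GPUs 0 and 6, and move everything to cuda:6 instead
    os.environ["CUDA_VISIBLE_DEVICES"] = "0, 6"

    model = model.to("cuda:6")
    input_tensor = input_tensor.to("cuda:6")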

What am I doing wrong?

Since cuda:0 on my server is always busy, I would like to run my code efficiently on other GPUs such as cuda:3 and cuda:6. Any help is highly appreciated.