[resolved] Some of weight/gradient/input tensors are located on different GPUs Error

I am training a CNN on top of the features from a pretrained network for semantic segmentation. I have these models on two different GPUs and copy the data to the respective GPUs, as shown in the snippet below.

# move the pretrained model from CPU to GPU 1
backbone = backbone.cuda(1)

# move the current model to GPU 2
model = model.cuda(2)

# NLL criterion on GPU 2
criterion = nn.NLLLoss2d().cuda(2)

# in the training loop
x = Variable(t_rgb[i * 4 : i * 4 + cbs].type(dtype).cuda(1), requires_grad=False)
y = Variable(t_target[i * 4 : i * 4 + cbs].type(th.LongTensor).cuda(2), requires_grad=False)

# extract features on GPU 1, then move them to GPU 2 for the model
x = backbone(x).type(dtype).cuda(2)
output = model(x)

print('x', x.get_device())
print('output', output.get_device())
print('y', y.get_device())
loss = criterion(output, y)
print('loss', loss.get_device())
optimizer.zero_grad()
loss.backward()

The output I get is:
x 2
output 2
y 2
loss 2

But then I get:

RuntimeError: Assertion `THCTensor_(checkGPU)(state, 3, input, gradInput, gradOutput)' failed. Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one. at /b/wheel/pytorch-src/torch/lib/THCUNN/generic/Threshold.cu:49

The exact stack trace is as follows.

  File "training.py", line 319, in <module>
    loss.backward()
  File "/opt/python3.5/lib/python3.5/site-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "/opt/python3.5/lib/python3.5/site-packages/torch/nn/_functions/thnn/auto.py", line 175, in backward
    update_grad_input_fn(self._backend.library_state, input, grad_output, grad_input, *gi_args)
RuntimeError: Assertion `THCTensor_(checkGPU)(state, 3, input, gradInput, gradOutput)' failed. Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one. at /b/wheel/pytorch-src/torch/lib/THCUNN/generic/Threshold.cu:49

From what I understand, the input to the model (Variable x) and the model itself (and hence its weights and gradients) are both on the same GPU (2 in my case), so I don’t understand why I am getting this error. Could someone please help me figure out the issue? I also welcome any suggestions on keeping models on different GPUs and moving data between them.

I managed to figure out the issue: I hadn’t set the requires_grad flag to False for the parameters of the pretrained network.
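For anyone who lands here later, a minimal sketch of that fix, in the same Variable-era API as the snippet above (backbone is the pretrained model from my code):

# freeze the pretrained backbone so backward() never tries to
# propagate gradients back to parameters living on GPU 1
for param in backbone.parameters():
    param.requires_grad = False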

Hi, I have the same problem as you. I want to train the same model on two different GPUs.

import torch
import torch.nn as nn
import torch.optim as optim

class ToyModel(nn.Module):
  def __init__(self):
    super(ToyModel, self).__init__()
    self.net1 = torch.nn.Linear(10, 10)  
    self.relu = torch.nn.ReLU()
    self.net2 = torch.nn.Linear(10, 5)  

  def forward(self, x):
    x = self.relu(self.net1(x))
    return self.net2(x)
    
model = ToyModel()
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

optimizer.zero_grad()

outputs1 = model.cuda(1)(torch.randn(20, 10).cuda(1))  # forward pass on GPU 1
outputs2 = model.cuda(2)(torch.randn(20, 10).cuda(2))  # forward pass on GPU 2
labels = torch.randn(20, 5).cuda(3)                    # targets on GPU 3
loss_fn(outputs1.cuda(3) + outputs2.cuda(3), labels).backward()
optimizer.step()

But the error is RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:1 and input b is on cuda:2

The code looks alright.
To debug further, I would recommend splitting the operations apart and checking the .device attribute of each tensor separately.
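Something along these lines (a sketch; the variable names follow your snippet):

a = outputs1.cuda(3)
b = outputs2.cuda(3)
print(a.device, b.device, labels.device)  # all three should print cuda:3

s = a + b                  # the binary op named in the error message
loss = loss_fn(s, labels)
loss.backward()            # if this is what fails, the problem is in the
                           # backward graph, not in the forward operands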

Thanks for your reply. In the above code, outputs1, outputs2 and labels are on GPU 3, and the model is on GPU 2. But I don’t know why it can’t work. The error information is “but input a is on cuda:1 and input b is on cuda:2”.

Let’s get the relevant facts straight before we get into the issue:

  1. We are training the model on multiple GPUs.
  2. We are moving the tensors to the GPU using .cuda().

The issue:
When performing a computation, the operands (here, tensors) must be on the same device.

  • While computing the loss, i.e. criterion(predicted, labels), the predicted values and the labels are on different GPUs.

Why has this happened:
The code used .cuda() to migrate tensors from CPU to GPU without specifying a particular device (e.g. 0, 1, 2 or 3), so the labels and predicted values may very well have ended up on different GPUs.
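For illustration, a two-line reproduction of the constraint (device indices are arbitrary; this assumes a machine with at least two GPUs):

import torch

a = torch.randn(2, 2).cuda(0)
b = torch.randn(2, 2).cuda(1)
c = a + b  # raises a RuntimeError about mismatched devices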

Solution:
Just before the training loop begins, use torch.cuda.set_device(<device>) (see the sketch after this list).

  • A witty way of still utilising all GPUs would be torch.cuda.set_device(iteration % the_number_of_gpus).

This will resolve the error.
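A minimal sketch of that pattern; the model, inputs and labels here are illustrative placeholders, not code from the thread:

import torch
import torch.nn as nn

# pin the default CUDA device; bare .cuda() calls below land on it
torch.cuda.set_device(0)

model = nn.Linear(10, 5).cuda()      # lands on cuda:0
inputs = torch.randn(20, 10).cuda()  # lands on cuda:0
labels = torch.randn(20, 5).cuda()   # lands on cuda:0

loss = nn.MSELoss()(model(inputs), labels)  # every operand on one device
loss.backward()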