loss.backward() throws an error with multiple GPUs

Hi, I tried the following code with multiple GPUs, but the call “loss.backward()” throws the error “RuntimeError: arguments are located on different GPUs”. The PyTorch version is 1.0.0. How can I make this work?

from torch import nn
import torch

class MyModule1(nn.Module):
  def __init__(self):
    super(MyModule1, self).__init__()
    self.fc1 = nn.Linear(2, 1)

  def forward(self, x):
    x = self.fc1(x)
    return x

class MyModule2(nn.Module):
  def __init__(self):
    super(MyModule2, self).__init__()
    self.fc2 = nn.Linear(1, 1)

  def forward(self, x):
    x = self.fc2(x)
    return x

model1 = MyModule1()
x = torch.tensor([10, 5], dtype=torch.float).to('cuda:0')
y = model1(x)
z = y.to('cuda:1')
model2 = MyModule2()
w = model2(y)

I have figured it out. Thank you very much.

I'm running into a similar problem. How did you solve it?
Could you tell me?

@DoubtWang I think the problem is that you cannot backpropagate through two different devices. That is, for input -> device1 -> device2 -> output, output.backward() would stop at device2.

Thanks for your quick reply.
I understand this point.
If output.backward() stops at device2,
do I have to give up on using multiple GPUs?

Autograd is able to create the backward pass through different devices.
The error in the original code was that y (which was still on cuda:0) was passed to model2, while z should have been passed.
I’m wondering why the forward pass didn’t throw an error.

Anyway, after fixing this bug, the code should work.
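A minimal sketch of the corrected snippet, assuming nn.Linear layers as stand-ins for MyModule1/MyModule2 and falling back to a single GPU or the CPU when two GPUs are not available (the fallback is an addition so the sketch runs anywhere). It passes z instead of y and checks that backward() reaches the parameters on both devices:

```python
import torch
from torch import nn

# Use cuda:0 and cuda:1 as in the question; fall back to fewer devices if needed.
n_gpus = torch.cuda.device_count()
dev0 = 'cuda:0' if n_gpus >= 1 else 'cpu'
dev1 = 'cuda:1' if n_gpus >= 2 else dev0

model1 = nn.Linear(2, 1).to(dev0)  # stand-in for MyModule1, parameters on dev0
model2 = nn.Linear(1, 1).to(dev1)  # stand-in for MyModule2, parameters on dev1

x = torch.tensor([10.0, 5.0], device=dev0)
y = model1(x)   # computed on dev0
z = y.to(dev1)  # copy the activation to dev1; autograd records this copy
w = model2(z)   # pass z (on dev1), not y (on dev0)

w.backward()    # w has a single element, so no explicit gradient argument is needed

# Each parameter's gradient ends up on the same device as the parameter itself.
for p in list(model1.parameters()) + list(model2.parameters()):
    assert p.grad is not None and p.grad.device == p.device
```

Because the device copy in y.to(dev1) is part of the autograd graph, the gradient is copied back to dev0 automatically during backward().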

Thanks for your explanation.

y = y.to('cuda:1')
model2 = MyModule2().to('cuda:1')  # model2's parameters must be on cuda:1 as well
w = model2(y)

Well, I think that should work.

What does that mean?
Should I change w.backward() to w.autograd().backward()?

No, you can just calculate the loss etc. as usual.
You just need to make sure the tensors and parameters are on the appropriate device.
In the example code you could just call w.backward(), or calculate the loss with a target on cuda:1 and call loss.backward().
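A minimal sketch of the second option, assuming nn.Linear stand-ins for the two modules, a hypothetical random batch and target, and a fallback to fewer devices when two GPUs are not present. The only requirement is that the target lives on the same device as the final output:

```python
import torch
from torch import nn

n_gpus = torch.cuda.device_count()
dev0 = 'cuda:0' if n_gpus >= 1 else 'cpu'
dev1 = 'cuda:1' if n_gpus >= 2 else dev0

model1 = nn.Linear(2, 1).to(dev0)  # first stage on dev0
model2 = nn.Linear(1, 1).to(dev1)  # second stage on dev1
criterion = nn.MSELoss()

x = torch.randn(4, 2, device=dev0)       # hypothetical batch of 4 samples
target = torch.randn(4, 1, device=dev1)  # target on the device of the final output

output = model2(model1(x).to(dev1))      # move the activation between the stages
loss = criterion(output, target)         # output and target are both on dev1
loss.backward()                          # gradients also reach model1 on dev0
```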

Thanks for correcting my mistake.

All tensors and parameters need to be on the same device before calling w.backward().
Is this your point?

No, not really. All tensors (inputs and parameters) used in a single operation must be on the same device.
Here is a small dummy example:

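import torch
from torch import nn

# placeholder models for illustration; any nn.Module works the same way
modelA = nn.Linear(1, 1).to('cuda:0')
modelB = nn.Linear(1, 1).to('cuda:1')
modelC = nn.Linear(1, 1).to('cuda:2')

x = torch.randn(1, 1, device='cuda:0')
target = torch.randn(1, 1, device='cuda:2')
criterion = nn.MSELoss()

output = modelA(x)                # runs on cuda:0
output = output.to('cuda:1')
output = modelB(output)           # runs on cuda:1
output = output.to('cuda:2')
output = modelC(output)           # runs on cuda:2
loss = criterion(output, target)  # output and target are both on cuda:2
loss.backward()                   # gradients flow back through all three devices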

As you can see in this simple example, you would just have to make sure to push the data onto the device where the next operation should take place.
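The same pattern can be wrapped in a small loop. The sketch below uses a hypothetical helper, run_pipeline (not a PyTorch API), which reads each stage's device from its first parameter and moves the activation there before calling the stage. It assumes every stage has at least one parameter and that all of a stage's parameters share one device; the device list cycles over the available GPUs, or uses the CPU, so the example runs on smaller machines.

```python
import torch
from torch import nn

def run_pipeline(stages, x):
    """Hypothetical helper: move x to each stage's device, then apply the stage."""
    for stage in stages:
        device = next(stage.parameters()).device  # assumes the stage has parameters
        x = stage(x.to(device))
    return x

# Three stages like modelA/modelB/modelC; reuse GPUs if fewer than three exist.
n_gpus = torch.cuda.device_count()
devices = [f'cuda:{i % n_gpus}' for i in range(3)] if n_gpus else ['cpu'] * 3
stages = [nn.Linear(1, 1).to(d) for d in devices]

x = torch.randn(1, 1, device=devices[0])
target = torch.randn(1, 1, device=devices[-1])  # same device as the last stage

output = run_pipeline(stages, x)
loss = nn.MSELoss()(output, target)
loss.backward()
```

Reading the device from the parameters avoids hard-coding device strings at each step, at the cost of requiring a parameter in every stage.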


Thank you very much for your example.
I get your point:
the tensors and parameters used in each computation need to be on the same device,
and the final output and the target need to be on the same device when computing the loss, before calling backward.
Thanks again.