CopyBackward between devices

almeetb · July 22, 2020, 4:23pm

Suppose a tensor on the GPU has requires_grad=True and is later copied to the CPU. The grad_fn for that copy (CopyBackwards) records the GPU as its src_device. During the backward pass, does this mean the gradient gets copied back to the GPU?

If so, is there a way to disable the second copy back to GPU?

Thanks!

albanD · July 22, 2020, 8:36pm

Yes, it will.
The reason is that a Tensor and its gradient are always on the same device, so the gradient for a GPU tensor has to end up on the GPU.
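
For anyone who wants to check this, here is a minimal sketch (not taken from the thread). It assumes a single GPU at cuda:0 and falls back to the CPU when none is available, in which case the copy is trivial but the check still passes. The last two lines show one possible way to avoid the copy back, which is to detach before moving. That cuts the tensor out of the graph, so it is only an option when no gradient is needed for x.

```python
import torch

# Use the GPU when one is present; otherwise everything stays on the CPU.
src_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(2, 3, device=src_device, requires_grad=True)
y = x.to("cpu")        # differentiable copy, recorded by autograd
loss = (y * 2).sum()   # the loss itself is computed on the CPU
loss.backward()

# The gradient is delivered on x's device, not where the loss was computed.
grad_device = x.grad.device
print(grad_device == x.device)

# Detaching first means nothing is recorded, so nothing is copied back,
# but no gradient will flow to x through this path either.
y_detached = x.detach().to("cpu")
print(y_detached.requires_grad)
```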

almeetb · July 23, 2020, 6:02pm

import torch
import torch.nn as nn


class Dummy(nn.Module):
    def __init__(self, input_shape):
        super().__init__()
        self.parameter = nn.Parameter(torch.randn(*input_shape), requires_grad=True)

    def forward(self, x):
        return x + self.parameter


# Forward pass on the GPU, then move the output to the CPU.
model = Dummy((1, 2, 3)).to('cuda:0')
input = torch.randn((1, 2, 3))
input = input.to('cuda:0')
output = model(input)
output = output.to('cpu')

# Move the parameters to the CPU as well before computing the loss.
model.to('cpu')
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Assumed target tensor with the output's shape, on the CPU.
golden_output = torch.randn((1, 2, 3))
loss = loss_fn(output, golden_output)

optimizer.zero_grad()
loss.backward()
optimizer.step()

I know the above example is pretty useless, but since the module has no intermediate outputs, once the output and the parameters have been moved to the CPU, wouldn’t the backward pass be able to run on the CPU?

Likewise, if this were a larger module whose parameters and gradient outputs had all been moved to a different device, wouldn’t the backward pass be able to run there?

From my understanding of the thread "Loss.backward() throws an error with multi gpus", as long as the tensors are moved to the appropriate device, the operations should be able to run.

albanD · July 23, 2020, 6:39pm

In general, it is very hard to say whether this can or cannot work.
The basic assumption we make is that, since the backward pass is very similar to the forward pass, running the backward where the forward happened is a good idea.
This is why the backward pass of an op that goes from gpu -> cpu is itself a function that goes from cpu -> gpu.
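
As a rough illustration of that last point, the copy can be mimicked with a custom autograd Function. This is only a sketch and not how the built-in op is implemented. ToDevice is a hypothetical name, and the snippet assumes cuda:0 when a GPU is available and falls back to the CPU otherwise.

```python
import torch


class ToDevice(torch.autograd.Function):
    """Hypothetical stand-in for the built-in copy, for illustration only."""

    @staticmethod
    def forward(ctx, x, target_device):
        ctx.src_device = x.device  # remember where the input lived
        # copy=True always returns a new tensor, even if the devices match.
        return x.to(target_device, copy=True)

    @staticmethod
    def backward(ctx, grad_output):
        # The backward of "src -> target" is "target -> src".
        # target_device is not a tensor, so it gets no gradient (None).
        return grad_output.to(ctx.src_device), None


src = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
w = torch.randn(3, device=src, requires_grad=True)

out = ToDevice.apply(w, torch.device("cpu"))  # forward: src -> cpu
out.pow(2).sum().backward()                   # backward: cpu -> src

print(w.grad.device == w.device)
```

The gradient arrives on the device where w lives, which is the same behaviour albanD describes for the built-in copy.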

almeetb · July 23, 2020, 7:01pm

I see, thanks for the clarification!