In PyTorch 0.4.0, when I move a tensor from GPU:0 to GPU:1, say mytensor.to(1), will the gradient be copied from GPU:1 back to GPU:0 during backprop?
It will:
>>> x = torch.randn(1, device='cuda:0', requires_grad=True)
>>> x
tensor([-0.1979], device='cuda:0', requires_grad=True)
>>> x.to('cuda:1').backward()
>>> x
tensor([-0.1979], device='cuda:0', requires_grad=True)
Just adding to @SimonW’s answer to show the gradient ending up on the original device:
>>> import torch
>>> x = torch.randn(1, device='cuda:0', requires_grad=True)
>>> x
tensor([0.6692], device='cuda:0', requires_grad=True)
>>> y = x.to('cuda:1') + 25
>>> y
tensor([25.6692], device='cuda:1', grad_fn=<AddBackward0>)
>>> y.backward()
>>> x.grad
tensor([1.], device='cuda:0')
>>>
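To sketch why the gradient lands back on cuda:0: Tensor.to is itself an autograd-tracked op (note the grad_fn in the output above), so backprop routes the gradient through the copy back to the source tensor. The snippet below is a minimal CPU-only sketch of the same mechanism, substituting a dtype copy for the device copy so it runs without two GPUs (that substitution is my assumption for illustration; on a multi-GPU box you would use .to('cuda:1') as above):

```python
import torch

# .to() records a grad_fn, so the backward pass copies the gradient
# back through it to the source tensor automatically.
x = torch.randn(1, requires_grad=True)

# Dtype copy stands in for a cross-device copy; both go through autograd.
y = x.to(torch.float64) + 25

y.backward()
print(x.grad)  # tensor([1.]) -- gradient arrives at the original tensor
```

The same holds for a device copy: x.grad is materialized on x's own device, so no manual gradient transfer between GPUs is needed.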
Ah yeah thanks! I forgot to copy the x.grad output.