Tensor.to() -> gradient

In PyTorch 0.4, when I move a tensor from GPU:0 to GPU:1, e.g. mytensor.to(1), will the gradient be copied back from GPU:1 to GPU:0 during backprop?

It will:

>>> x = torch.randn(1, device='cuda:0', requires_grad=True)
>>> x
tensor([-0.1979], device='cuda:0', requires_grad=True)
>>> x.to('cuda:1').backward()
>>> x
tensor([-0.1979], device='cuda:0', requires_grad=True)

Just adding to @SimonW’s answer to show the gradient:

>>> import torch

>>> x = torch.randn(1, device='cuda:0', requires_grad=True)

>>> x
tensor([0.6692], device='cuda:0', requires_grad=True)

>>> y = x.to('cuda:1') + 25

>>> y
tensor([25.6692], device='cuda:1', grad_fn=<AddBackward0>)

>>> y.backward()

>>> x.grad
tensor([1.], device='cuda:0')

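For anyone without two GPUs: the same behaviour can be reproduced on the CPU, since Tensor.to() is recorded by autograd whenever the source tensor requires grad. A minimal sketch, using a dtype conversion to stand in for the cross-device copy (not the original poster's two-GPU setup):

```python
import torch

# .to() is an autograd-tracked op: the conversion inserts a backward node
# that routes the gradient back to the source tensor.
x = torch.randn(1, dtype=torch.float32, requires_grad=True)
y = x.to(torch.float64) + 25  # forward: float32 -> float64 copy
y.backward()

# The gradient lands on x, in x's original dtype (analogously, on x's
# original device in the GPU:0 -> GPU:1 case above).
print(x.grad)        # tensor([1.])
print(x.grad.dtype)  # torch.float32
```

The same principle explains the cuda:1 example: the gradient of y with respect to x is 1, and it arrives back on the device where x lives.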


Ah yeah thanks! I forgot to copy the x.grad output.