It seems I can’t get a gradient when sending tensors to cuda:

import torch
print(torch.__version__)
x = torch.tensor(4.2, requires_grad=True).cuda()
y = torch.tensor(5.2, requires_grad=True).cuda()
output = x * y
output.backward()
print(output)
print(x.grad)
print(y.grad)

and the output is:

1.10.1
tensor(21.8400, device='cuda:0', grad_fn=<MulBackward0>)
None
None
C:\Users\**\anaconda3\lib\site-packages\torch\_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten\src\ATen/core/TensorBody.h:417.)
return self._grad

Obviously the tensors x and y are leaf tensors, so how come their gradients are not calculated? Am I missing something? Thanks for any advice!

Thanks for your reply, that’s interesting to know! May I ask how come cuda() is differentiable? I thought it was just used to send data from the CPU to the GPU.

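It should be differentiable in the same way that nn.Identity is differentiable; it introduces another tensor in the computation graph even if it does not apply a transformation to the input to produce an output. In the snippet above, x and y therefore refer to the outputs of cuda(), which are non-leaf tensors, while the original CPU leaf tensors are no longer referenced by any variable.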
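To make the leaf vs. non-leaf distinction concrete, here is a minimal sketch of two ways to get populated gradients. It is not taken from the thread: the `device` fallback to CPU and the names `x_cpu`, `x_dev`, and `y_dev` are assumptions added for illustration, so the snippet also runs on a machine without a GPU.

```python
import torch

# Assumption: fall back to the CPU so the sketch still runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Option 1: create the tensors directly on the target device.
# No copy operation is recorded, so x and y are leaf tensors.
x = torch.tensor(4.2, device=device, requires_grad=True)
y = torch.tensor(5.2, device=device, requires_grad=True)
(x * y).backward()
print(x.is_leaf, x.grad, y.grad)

# Option 2: keep a reference to the CPU leaf and move a copy of it.
x_cpu = torch.tensor(4.2, requires_grad=True)  # leaf tensor on the CPU
x_dev = x_cpu.to(device)  # non-leaf whenever the device actually differs
y_dev = torch.tensor(5.2, device=device)
(x_dev * y_dev).backward()
# The gradient is accumulated on the leaf (x_cpu), not on the moved copy.
print(x_dev.is_leaf, x_cpu.grad)
```

With a GPU available, `x_dev.is_leaf` should print `False`; on a CPU-only machine `.to(device)` returns the same tensor, so it stays a leaf. A third option, suggested by the warning itself, is to call `.retain_grad()` on the non-leaf tensor before `backward()`.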

In addition to what @eqy said: it allows you to use different devices without detaching the computation graph. E.g. you could use CPU operations, push the data to the GPU, perform more operations there, and still backpropagate to the original CPU tensors.
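A minimal sketch of that mixed-device pattern follows. It is not the snippet from the original reply: the tensor names `w`, `h`, `h_dev`, and `loss` and the CPU fallback for `device` are assumptions made for illustration.

```python
import torch

# Assumption: fall back to the CPU so the sketch still runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # leaf tensor on the CPU
h = w * 2                    # operation executed on the CPU
h_dev = h.to(device)         # differentiable transfer to the target device
loss = (h_dev ** 2).sum()    # operation executed on the target device
loss.backward()              # gradients flow back through the transfer

# loss = sum((2 * w) ** 2), so d(loss)/dw = 8 * w, stored on the CPU leaf.
print(w.grad, w.grad.device)
```

Because the transfer is recorded in the graph, `w.grad` ends up on the CPU alongside `w`, regardless of where the later operations ran.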