Apologies if this question has been asked already. I searched through the forums and found posts related to my question, but they do not exactly answer my question. It is a very basic question on creating a tensor that requires gradients and then moving it to the GPU. This is my code.
```python
import torch

device = 'cuda'

a = torch.randn(3, dtype=torch.float, requires_grad=True).to(device)
b = torch.randn(3, dtype=torch.float).to(device)

# Post facto set gradients
b.requires_grad_()

print("a is ", a)
print("b is ", b)

loss1 = a.sum()
loss2 = b.sum()
loss1.backward()
loss2.backward()

print("Gradient wrt to a is ", a.grad)
print("Gradient wrt to b is ", b.grad)
```
The output is
```
a is  tensor([ 1.1719, -0.8410, -1.6699], device='cuda:0', grad_fn=<CopyBackwards>)
b is  tensor([ 0.8113,  0.6762, -0.0123], device='cuda:0', requires_grad=True)
Gradient wrt to a is  None
Gradient wrt to b is  tensor([1., 1., 1.], device='cuda:0')
```
It looks like when the CPU tensor is created with `requires_grad=True`, the corresponding GPU tensor no longer shows that property; instead it shows `grad_fn=<CopyBackwards>`. If I subsequently use this tensor `a` to compute a loss, then `a.grad` is `None` after calling `loss.backward()`, whereas `b.grad` returns the gradients with respect to `b`. Is this the desired behavior? If so, could anyone explain why?
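To make the question concrete, here is a minimal sketch of what I believe is going on (my own illustration, not from the docs): `.to()` is a differentiable copy, so its result is a non-leaf tensor, and autograd only populates `.grad` on leaves unless `retain_grad()` is called. I use `copy=True` so the example also shows the effect without a GPU:

```python
import torch

# copy=True forces .to() to return a new tensor even on the same device,
# mimicking the CPU -> GPU move in my code above.
leaf = torch.randn(3, requires_grad=True)     # a leaf tensor
moved = leaf.to('cpu', copy=True)             # non-leaf: has a grad_fn

print(leaf.is_leaf, moved.is_leaf)            # True False

# By default autograd only stores .grad on leaves; retain_grad() asks it
# to keep the gradient for this non-leaf as well.
moved.retain_grad()
moved.sum().backward()

print(moved.grad)   # kept because of retain_grad()
print(leaf.grad)    # the gradient also flows back to the original leaf
```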
I am aware that I can pass `device='cuda'` directly to `torch.randn` and avoid this problem entirely. But for a simple experiment, I wanted identical results on the CPU and GPU, and since the CPU and GPU use different random number generators, I am following this approach. From the discussion here, I understand that the `.to()` call creates a new tensor, but that does not explain the behavior I am seeing above. Any pointers would be highly appreciated.
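For reference, the pattern that does work for me (the same one I used for `b` above) is to draw from the CPU RNG first, move the values, and only then flag the moved tensor as requiring grad. Since `requires_grad_()` is applied to a tensor with no autograd history, the result stays a leaf and `.grad` is populated; the `'cpu'` fallback below is just so the sketch runs without a GPU:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed the CPU generator so the drawn values are reproducible, move the
# tensor, THEN enable grad. The moved tensor has no grad history, so
# requires_grad_() keeps it a leaf and backward() fills in .grad.
torch.manual_seed(0)
b = torch.randn(3).to(device).requires_grad_()

b.sum().backward()
print(b.is_leaf)   # True
print(b.grad)      # tensor([1., 1., 1.], ...)
```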