Is there any difference between the following code:
x = torch.randn(2, 2, device=torch.device("cpu"), requires_grad=Ture).cuda()
x = torch.randn(2, 2, device=torch.device("cuda"), requires_grad=Ture)
When I use the first one, I cannot get the grad of x via loss.backward(), but the second one works.
The .cuda() call returns a non-leaf variable, which won't be optimized.
@albanD explained it very well in this post.
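To see it concretely, here is a minimal sketch you can run (assuming a CUDA device is available; x1/x2/x3 are just illustrative names):

import torch

# .cuda() returns a *new* tensor derived from the CPU one, so it is non-leaf
x1 = torch.randn(2, 2, requires_grad=True).cuda()          # non-leaf
x2 = torch.randn(2, 2, device="cuda", requires_grad=True)  # leaf

print(x1.is_leaf)  # False -> backward() won't populate x1.grad
print(x2.is_leaf)  # True  -> x2.grad is populated as expected

# One common fix if you must create the tensor on the CPU first:
x3 = torch.randn(2, 2).cuda().requires_grad_()
print(x3.is_leaf)  # True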
Thank you for your reply!
So, should I init the parameters of a Module before or after moving it to the GPU?
If you are using torch.nn.init methods to initialize your parameters, it shouldn't matter, since they are manipulating the tensors in-place.
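For example, this sketch (assuming a CUDA device is available) initializes after the move and the parameters stay leaves:

import torch
import torch.nn as nn

model = nn.Linear(4, 4).cuda()
nn.init.xavier_uniform_(model.weight)  # trailing underscore: modifies the tensor in-place
nn.init.zeros_(model.bias)

print(model.weight.is_leaf)  # True -> still a leaf, so the optimizer will update it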
Will the parameters become non-leaf tensors that won't get optimized, as you said, if I transfer the Module to the GPU?
This won’t be the case, if you use
torch.nn.init methods or manipulate them in-place with a custom method. Are you seeing any issues using this approach?
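A quick check, assuming a CUDA device is available:

import torch
import torch.nn as nn

model = nn.Linear(4, 4)
model.cuda()  # Module.cuda() moves the parameters in-place, unlike Tensor.cuda()

print(all(p.is_leaf for p in model.parameters()))  # True -> still optimizable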
No, I just want to be clear about how .cuda() affects the optimization mechanism.
.cuda() is not an in-place method but a differentiable operation, so calling it on an nn.Parameter will create a non-leaf variable, as explained in the linked post.
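You can see this directly (sketch, assuming a CUDA device is available):

import torch
import torch.nn as nn

p = nn.Parameter(torch.randn(2, 2))  # leaf by construction
q = p.cuda()                         # differentiable op: returns a new, non-leaf tensor

print(p.is_leaf, q.is_leaf)          # True False
print(isinstance(q, nn.Parameter))   # False -> q is a plain tensor in the autograd graph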
Thanks for the explanation, I’m clear now.