linyu
(linyu)
August 17, 2018, 9:20am
1
Is there any difference between the following lines of code:
x = torch.randn(2, 2, device=torch.device("cpu"), requires_grad=True).cuda()
x = torch.randn(2, 2, device=torch.device("cuda"), requires_grad=True)
When I use the first one, I cannot get the grad of x via loss.backward(), but the second one works.
The .cuda() call returns a non-leaf variable, which won’t be optimized. @albanD explained it very well in this post.
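A quick way to see the difference (a minimal sketch, not from the original post; it needs a CUDA-capable GPU):

import torch

# .cuda() is recorded as an autograd operation, so its result is a non-leaf tensor
x1 = torch.randn(2, 2, device=torch.device("cpu"), requires_grad=True).cuda()
print(x1.is_leaf)   # False

# created directly on the GPU, so this one is a leaf tensor
x2 = torch.randn(2, 2, device=torch.device("cuda"), requires_grad=True)
print(x2.is_leaf)   # True

x1.sum().backward()
print(x1.grad)      # None -> .grad is only populated for leaf tensors

x2.sum().backward()
print(x2.grad)      # 2x2 tensor of ones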
linyu
(linyu)
August 17, 2018, 9:25am
3
Thank you for your reply!
dio_din
(dio din)
April 12, 2020, 9:10am
4
So, should I init the parameters of a Module before or after moving it to the GPU?
If you are using torch.nn.init methods to initialize your parameters, it shouldn’t matter, since they manipulate the tensors in-place.
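For example (a minimal sketch; the Linear layer and xavier_uniform_ init are just placeholders):

import torch
import torch.nn as nn

# init on the CPU first, then move the module to the GPU ...
lin1 = nn.Linear(4, 4)
nn.init.xavier_uniform_(lin1.weight)
lin1.cuda()

# ... or move first and init the CUDA parameter in-place afterwards
lin2 = nn.Linear(4, 4).cuda()
nn.init.xavier_uniform_(lin2.weight)

# both parameters are still leaf tensors, so both will be optimized
print(lin1.weight.is_leaf, lin2.weight.is_leaf)   # True True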
dio_din
(dio din)
April 12, 2020, 9:27am
6
Will the parameters become non-leaf tensors that won’t be optimized, as you said, if I transfer the Module to the GPU?
This won’t be the case if you use torch.nn.init methods or manipulate the parameters in-place with a custom method. Are you seeing any issues with this approach?
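For instance (a minimal sketch, not from the original thread; the model and optimizer are only examples and a GPU is required):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
model.cuda()   # Module.cuda() moves the parameters in-place via Module._apply

# the parameters are still leaf tensors and can be passed to an optimizer
print(all(p.is_leaf for p in model.parameters()))   # True

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model(torch.randn(8, 4, device="cuda")).sum().backward()
optimizer.step()   # gradients were populated and the update is applied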
dio_din
(dio din)
April 12, 2020, 9:45am
8
No, I just want to be clear about how .cuda() affects the optimization mechanism.
.cuda() is not an in-place method but a differentiable operation, so calling it on an nn.Parameter creates a non-leaf variable, as explained in the linked post.
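A short illustration of that point (a sketch; w_bad and w_good are hypothetical names):

import torch
import torch.nn as nn

w = nn.Parameter(torch.randn(2, 2))

# .cuda() on the Parameter is a differentiable op: the result is a new,
# non-leaf tensor, so its .grad will never be populated by backward()
w_bad = w.cuda()
print(w_bad.is_leaf)    # False

# move the data first, then wrap it: the Parameter stays a leaf on the GPU
w_good = nn.Parameter(torch.randn(2, 2).cuda())
print(w_good.is_leaf)   # True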
dio_din
(dio din)
April 12, 2020, 9:57am
10
Thanks for the explanation, I’m clear now.