Is there any difference between the following code:
x = torch.randn(2, 2, device=torch.device("cpu"), requires_grad=Ture).cuda()
x = torch.randn(2, 2, device=torch.device("cuda"), requires_grad=Ture)
When I use the first one, I cannot get the grad of x via loss.backward(), but the second one works.
The .cuda() call returns a non-leaf variable, which won't be optimized.
@albanD explained it very well in this post.
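To see it concretely, here is a minimal sketch you can run (assuming a CUDA device is available; x1/x2/x3 are just illustrative names):

import torch

# .cuda() returns a *new* tensor derived from the CPU one, so it is non-leaf
x1 = torch.randn(2, 2, requires_grad=True).cuda()          # non-leaf
x2 = torch.randn(2, 2, device="cuda", requires_grad=True)  # leaf

print(x1.is_leaf)  # False -> backward() won't populate x1.grad
print(x2.is_leaf)  # True  -> x2.grad is populated as expected

# One common fix if you must create the tensor on the CPU first:
x3 = torch.randn(2, 2).cuda().requires_grad_()
print(x3.is_leaf)  # True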
Thank you for your reply!
So, should I init the parameters of a Module before or after moving it to the GPU?
If you are using torch.nn.init methods to initialize your parameters, it shouldn't matter, since they are manipulating the tensors in-place.
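For example, this sketch (assuming a CUDA device is available) initializes after the move and the parameters stay leaves:

import torch
import torch.nn as nn

model = nn.Linear(4, 4).cuda()
nn.init.xavier_uniform_(model.weight)  # trailing underscore: modifies the tensor in-place
nn.init.zeros_(model.bias)

print(model.weight.is_leaf)  # True -> still a leaf, so the optimizer will update it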
Will the parameters become non-leaf tensors that won't get optimized, as you said, if I transfer the Module to the GPU?
This won’t be the case, if you use
torch.nn.init methods or manipulate them in-place with a custom method. Are you seeing any issues using this approach?
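A quick check, assuming a CUDA device is available:

import torch
import torch.nn as nn

model = nn.Linear(4, 4)
model.cuda()  # Module.cuda() moves the parameters in-place, unlike Tensor.cuda()

print(all(p.is_leaf for p in model.parameters()))  # True -> still optimizable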
No, I just want to be clear about how .cuda() affects the optimization mechanism.
.cuda() is not an in-place method but a differentiable operation, so calling it on an nn.Parameter will create a non-leaf variable, as explained in the linked post.
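You can see this directly (sketch, assuming a CUDA device is available):

import torch
import torch.nn as nn

p = nn.Parameter(torch.randn(2, 2))  # leaf by construction
q = p.cuda()                         # differentiable op: returns a new, non-leaf tensor

print(p.is_leaf, q.is_leaf)          # True False
print(isinstance(q, nn.Parameter))   # False -> q is a plain tensor in the autograd graph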
Thanks for the explanation, I’m clear now.