Difference between generating data on the GPU and generating it on the CPU, then moving it to the GPU?

Is there any difference between the following two lines of code?

x = torch.randn(2, 2, device=torch.device("cpu"), requires_grad=True).cuda()
x = torch.randn(2, 2, device=torch.device("cuda"), requires_grad=True)

When I use the first one, I cannot get the gradient of x (x.grad) after calling loss.backward(), but the second one works.

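The .cuda() call returns a non-leaf tensor: it is the output of a differentiable copy rather than a tensor you created directly, so by default backward() does not populate its .grad, and it will not be optimized.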
@albanD explained it very well in this post.
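
To make the leaf/non-leaf distinction concrete, here is a minimal sketch comparing the two lines from the question, plus one common workaround. The `move` helper and the CPU fallback (a dtype cast standing in for `.cuda()`) are assumptions added only so the snippet also runs on a machine without a GPU; they are not part of the original question.

```python
import torch

# Use the GPU when one is present. Otherwise fall back to a dtype cast,
# which is also a differentiable, out-of-place op (an assumption made only
# so this sketch runs on CPU-only machines).
if torch.cuda.is_available():
    device = torch.device("cuda")

    def move(t):
        return t.cuda()
else:
    device = torch.device("cpu")

    def move(t):
        return t.double()

# Case 1: create a leaf on the CPU, then move it.
# The moved tensor is the output of an operation, so it is not a leaf.
a = torch.randn(2, 2, requires_grad=True)
x1 = move(a)

# Case 2: create the tensor directly on the target device. It is a leaf.
x2 = torch.randn(2, 2, device=device, requires_grad=True)

# Case 3 (workaround): move first, then mark the result as requiring grad.
x3 = move(torch.randn(2, 2)).requires_grad_()

(x1.sum() + x2.sum() + x3.sum()).backward()

print("x1 is_leaf:", x1.is_leaf, "grad_fn:", x1.grad_fn)
print("x2 is_leaf:", x2.is_leaf, "has grad:", x2.grad is not None)
print("x3 is_leaf:", x3.is_leaf, "has grad:", x3.grad is not None)
# In case 1 the gradient still exists, but it lands on the original CPU leaf.
print("a has grad:", a.grad is not None)
```

In case 1 the gradient is not lost; it is accumulated on `a`, the CPU tensor that was created first. If the gradient of the moved tensor itself is needed, calling `retain_grad()` on it before `backward()` is another option.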

Thank you for your reply!

So, should I initialize the parameters of a Module before or after moving it to the GPU?

If you are using torch.nn.init methods to initialize your parameters, it shouldn’t matter, since they are manipulating the tensors in-place.
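
As a rough illustration of why the order should not matter, the sketch below initializes one layer before moving it and another after moving it, then checks that the parameters are leaves and receive gradients in both cases. The layer sizes and the choice of `xavier_uniform_` and `zeros_` are arbitrary assumptions for the example, and `.to(device)` is used so that it falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Order A: initialize on the CPU, then move the module.
layer_a = nn.Linear(4, 2)
nn.init.xavier_uniform_(layer_a.weight)
nn.init.zeros_(layer_a.bias)
layer_a = layer_a.to(device)

# Order B: move the module first, then initialize in-place on the device.
layer_b = nn.Linear(4, 2).to(device)
nn.init.xavier_uniform_(layer_b.weight)
nn.init.zeros_(layer_b.bias)

for name, layer in (("A", layer_a), ("B", layer_b)):
    # Run a forward/backward pass to confirm gradients reach the parameters.
    layer(torch.randn(3, 4, device=device)).sum().backward()
    status = [(p.is_leaf, p.grad is not None) for p in layer.parameters()]
    print(name, status)
```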

If I move a Module to the GPU, will its parameters become non-leaf tensors that won’t get optimized, as you described?

This won’t be the case if you use torch.nn.init methods or manipulate the parameters in-place with a custom method. Are you seeing any issues with this approach?

No, I just wanted to understand how .cuda() affects the optimization mechanism.

.cuda() is not an in-place method but a differentiable operation, so calling it directly on an nn.Parameter creates a non-leaf tensor, as explained in the linked post.
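
A small sketch of what this means for an optimizer, under the same assumption as before that a dtype cast stands in for `.cuda()` on CPU-only machines. The `move` helper, the variable names, and the use of `torch.optim.SGD` are illustrative choices. Re-wrapping the moved data in a new `nn.Parameter` is one possible fix, not the only one.

```python
import torch
import torch.nn as nn

# Stand-in for .cuda() on machines without a GPU (assumption for this sketch).
if torch.cuda.is_available():
    def move(t):
        return t.cuda()
else:
    def move(t):
        return t.double()

p = nn.Parameter(torch.randn(2, 2))
moved = move(p)  # out-of-place: a new tensor that carries a grad_fn

print("p is_leaf:", p.is_leaf)
print("moved is_leaf:", moved.is_leaf)

# Optimizers reject non-leaf tensors when they are constructed.
try:
    torch.optim.SGD([moved], lr=0.1)
    rejected = False
except ValueError as err:
    rejected = True
    print("optimizer refused:", err)

# One possible fix: detach, move, and wrap the result in a fresh Parameter,
# which is a leaf on the target device.
fixed = nn.Parameter(move(p.detach()))
optimizer = torch.optim.SGD([fixed], lr=0.1)
print("fixed is_leaf:", fixed.is_leaf)
```

Calling `.cuda()` or `.to(device)` on the Module itself avoids the problem, since the module keeps its parameters as leaves.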

Thanks for the explanation, I’m clear now.