Autograd function for .to(device) at initialization

Hey all!

This is my first post here. I tried looking for the answer everywhere but couldn’t find anything satisfying, so please let me know if this has already been asked and answered before.

I was working on an implementation of a neural network that uses prototypes. It took me countless hours of debugging (though in the end I was able to find the culprit) to see why a certain part of the network wasn’t updating at all: it was because I moved certain nn.Parameters to the GPU at initialization time. The thing that gave it away was that this specific part had a grad_fn called “CopyBackwards”, which seemed rather odd because I figured no such operation was related to the actual gradient I was trying to compute for this part of the network. Could anyone explain why moving the parameter to the device (a working GPU) at initialization causes this behavior, where the intended grad function is not recorded and a CopyBackwards function appears instead?
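
For reference, a pattern that seems to reproduce the symptom looks roughly like this (the module and parameter names are made up for illustration; the exact grad_fn name varies between PyTorch versions):

```python
import torch
import torch.nn as nn


class ProtoLayer(nn.Module):
    """Hypothetical module; the names are made up for illustration."""

    def __init__(self, num_prototypes, dim, device):
        super().__init__()
        # Problematic pattern: .to(device) is applied to the nn.Parameter itself.
        # When the data is actually copied to another device, the result is a new
        # non-leaf tensor whose grad_fn is the copy op (CopyBackwards in older
        # PyTorch releases, ToCopyBackward0 in newer ones), and it ends up stored
        # as a plain attribute instead of a registered parameter.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim)).to(device)


if torch.cuda.is_available():
    layer = ProtoLayer(5, 16, "cuda")
    print(layer.prototypes.is_leaf)        # False: no longer a leaf tensor
    print(layer.prototypes.grad_fn)        # the copy op that gave it away
    print(list(layer.named_parameters()))  # []: the optimizer never sees it
```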

Thanks so much in advance!

I assume you’ve called the to() operation on your nn.Parameter, not on the internal tensor?
If so, you would create a non-leaf variable, since the result comes from the to() operation, which is differentiable (so that gradients can flow between different devices).
Try calling the to() operation on the tensor before wrapping it in an nn.Parameter.
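
For completeness, here is a minimal sketch of that fix (names are illustrative):

```python
import torch
import torch.nn as nn


class ProtoLayer(nn.Module):
    def __init__(self, num_prototypes, dim, device):
        super().__init__()
        # Move the plain tensor first, then wrap it: the Parameter is created
        # directly on the target device, stays a leaf, and gets registered in
        # the module, so the optimizer will update it as expected.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim).to(device))


device = "cuda" if torch.cuda.is_available() else "cpu"
layer = ProtoLayer(5, 16, device)
print(layer.prototypes.is_leaf)                         # True
print(layer.prototypes.grad_fn)                         # None: gradients go to .grad
print([name for name, _ in layer.named_parameters()])   # ['prototypes']
```

Alternatively, you can construct the parameters on the CPU and call model.to(device) on the whole module afterwards; Module.to moves the registered parameters in place, so they remain leaf Parameters.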

Have a look at this post for some more examples.
