This is my first post here, and I tried looking everywhere but couldn't find a satisfying answer. Please let me know if this has been asked and answered before.
I was working on an implementation of a neural network that uses prototypes. It took me countless hours of debugging to figure out why a certain part of the network wasn't updating at all, but in the end I found the culprit: I had moved certain nn.Parameters to the GPU at initialization time. What gave it away was that this part of the network had a grad_fn called CopyBackwards, which seemed odd, because I figured no such operation was involved in the gradient I was actually trying to compute for this part of the network. Could anyone explain why moving the parameters to the device (a working GPU) at initialization can cause this behavior, where the intended grad function is never computed and a CopyBackwards function shows up instead?
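For context, here is a minimal sketch of the pattern I suspect was the problem, next to the version that works for me now (the module and attribute names are placeholders, not my real code, and reproducing the grad_fn requires an actual CUDA device):

```python
import torch
import torch.nn as nn

device = torch.device("cuda")  # needs a real GPU to reproduce the CopyBackwards grad_fn

class ProtoNet(nn.Module):
    """Placeholder module, not my actual network."""
    def __init__(self, num_protos, dim):
        super().__init__()
        # BROKEN: .to(device) is a differentiable op that returns a *new*,
        # non-leaf Tensor (not an nn.Parameter). Its grad_fn is CopyBackwards
        # (ToCopyBackward0 on newer PyTorch), and because it is no longer a
        # Parameter, nn.Module never registers it, so the optimizer never
        # updates it.
        self.broken = nn.Parameter(torch.randn(num_protos, dim)).to(device)
        # OK: create the tensor on the target device first, then wrap it, so
        # the Parameter itself is a leaf tensor living on the GPU.
        self.ok = nn.Parameter(torch.randn(num_protos, dim, device=device))

net = ProtoNet(num_protos=10, dim=32)
print(net.broken.is_leaf, net.broken.grad_fn)  # False, <CopyBackwards ...>
print(net.ok.is_leaf, net.ok.grad_fn)          # True, None
print([name for name, _ in net.named_parameters()])  # only 'ok' is registered
```

Calling `net.to(device)` on the whole module after construction also works, since that moves registered parameters in place rather than replacing them with non-leaf copies.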
Thanks so much in advance!