If a tensor on GPU requires_grad and is later copied to the CPU, the CopyBackwards grad_fn records the GPU as its src_device. During the backward pass, does this mean the gradient is copied back to the GPU?
If so, is there a way to disable the second copy back to GPU?
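For reference, here is a minimal sketch of the situation being described. It falls back to CPU when no GPU is present so it stays runnable; on a CUDA machine the copy below is a real cuda -> cpu transfer and the leaf's gradient comes back on the GPU. (On the CPU fallback, `.cpu()` is a no-op, so the grad_fn print shows None.)

```python
import torch

# Fall back to CPU when no GPU is present so the sketch stays runnable;
# on a CUDA machine the copy below is a real cuda -> cpu transfer.
device = "cuda" if torch.cuda.is_available() else "cpu"

w = torch.randn(3, device=device, requires_grad=True)
y = w.cpu()                  # forward: device -> cpu copy, recorded in the graph
print(y.grad_fn)             # the copy's backward node (None on the CPU fallback)
y.sum().backward()           # backward of that copy runs cpu -> device
print(w.grad.device)         # the gradient lands back on w's device
```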
I know the above example is pretty contrived, but since the module has no intermediate outputs, once the output and parameters are moved to CPU, shouldn't the backward pass be able to run on CPU?
By extension, if this were a larger module whose parameters and gradient outputs were moved to a different device, couldn't the backward pass run there?
In general, it is hard to say whether this can or cannot work.
The basic assumption we make is that since the backward pass is very similar to the forward pass, running the backward where the forward happened is a good idea.
This is why the backward of an op that copies gpu -> cpu is itself a function that copies cpu -> gpu.
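The same mirroring can be seen without a GPU by using a dtype copy, which behaves analogously to a device copy: the backward of a float32 -> float64 cast is a float64 -> float32 cast, so the leaf's gradient comes back in the leaf's original dtype. A minimal sketch:

```python
import torch

x = torch.randn(3, requires_grad=True)   # float32 leaf
y = x.double()                           # forward: float32 -> float64 copy
y.sum().backward()                       # backward of the copy: float64 -> float32
print(x.grad.dtype)                      # the leaf's original dtype, torch.float32
```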