Gradient flow and device

I used “grad1 = grad.clone()” in my hook function. My other tensors are on the GPU, but “grad1” shows up on the CPU, as “grad.clone()” is returning a CPU tensor. Even when the hook function returns a value, it gives an error if the returned tensor is on the GPU, so I need to convert it to the CPU again.

Does the gradient normally flow on the CPU? What can I do to keep all gradients on the GPU so that I don’t have to switch between GPU and CPU?
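Here is roughly what I am doing, reduced to a minimal runnable sketch (the tensor and the computation are hypothetical stand-ins for my actual setup):

```python
import torch

grad1 = None

def hook(grad):
    global grad1
    grad1 = grad.clone()  # this is where grad1 ends up on the CPU for me
    return grad

# hypothetical stand-in for my actual tensor and computation
x = torch.randn(4, 4, requires_grad=True)
y = (x * 2).sum()
x.register_hook(hook)
y.backward()
print(grad1.device)
```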

I don’t know how grad and grad1 are calculated, but PyTorch will stick to the devices specified in your script and won’t change the device behind your back.
Of course, this doesn’t apply to any third-party utilities, which could explicitly push data to the CPU to save GPU memory, but I also don’t know if you are using any of these.
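For example, a quick check like this (a minimal sketch; it assumes a CUDA device is available) shows that the gradient passed to a hook, and any clone() of it, stays on the tensor’s own device:

```python
import torch

def hook(grad):
    grad1 = grad.clone()
    # clone() preserves the device, so both print 'cuda:0'
    print(grad.device, grad1.device)
    return grad

x = torch.randn(4, 4, device="cuda", requires_grad=True)
x.register_hook(hook)
(x * 2).sum().backward()
```

If grad1 were on the CPU here, it would mean the original tensor (and thus its gradient) was on the CPU to begin with.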

Could you describe your use case a bit and post a minimal code snippet that reproduces this issue?

Sorry, it was my mistake. The tensor was on the CPU.