Can PyTorch move a tensor along with its computational graph from GPU to CPU, and then move it back to GPU for backpropagation?

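The `to()` operation is differentiable, so gradients flow back through the device copy. However, it only moves the tensor you call it on: the intermediate tensors that autograd saved for the backward pass stay on their original device.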
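Here is a minimal sketch of both halves of that statement. It assumes PyTorch 1.10 or newer, which provides `torch.autograd.graph.saved_tensors_hooks`. The pack hook only records where each tensor saved for backward lives. `record_device` and `identity` are hypothetical helper names chosen for illustration. The snippet falls back to CPU when no GPU is present. In that case the round trip is a no-op, but the gradient check still holds.

```python
import torch

# Fall back to CPU so the sketch runs anywhere; with a GPU the round trip is real.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

saved_devices = []

def record_device(t):
    # Pack hook: called for every tensor autograd saves for backward.
    saved_devices.append(t.device)
    return t

def identity(t):
    # Unpack hook: hand the saved tensor back unchanged.
    return t

x = torch.randn(4, 3, device=device, requires_grad=True)

with torch.autograd.graph.saved_tensors_hooks(record_device, identity):
    h = torch.sin(x)              # sin() saves its input x for backward
    h_cpu = h.to("cpu")           # to() is recorded in the graph
    loss = h_cpu.to(device).sum() # move back before computing the loss

# Gradient flows back through the device round trip (a no-op on CPU).
loss.backward()

# d/dx sum(sin(x)) = cos(x)
print(torch.allclose(x.grad, torch.cos(x.detach())))  # True
# The tensor saved by sin() still lives on the original device.
print(saved_devices)
```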
To save memory via CPU offloading you might want to use these hooks.
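This sketch assumes "these hooks" refers to PyTorch's saved-tensor hooks (`torch.autograd.graph.saved_tensors_hooks`). Under that assumption, CPU offloading can be written as a pack/unpack pair. The pack hook stores a CPU copy of each tensor autograd saves. The unpack hook moves the tensor back to its original device when backward needs it. The model, sizes, and the helper names `pack_to_cpu` and `unpack_from_cpu` are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def pack_to_cpu(t):
    # Called when autograd saves a tensor for backward: keep only a CPU copy
    # plus the device it came from.
    return (t.device, t.to("cpu"))

def unpack_from_cpu(packed):
    # Called when backward needs the tensor: move it back to its original device.
    orig_device, t = packed
    return t.to(orig_device)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
inputs = torch.randn(64, 512, device=device)
targets = torch.randint(0, 10, (64,), device=device)

# Tensors saved for backward during this forward pass are offloaded to CPU.
with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = F.cross_entropy(model(inputs), targets)

# Saved tensors are unpacked back onto `device` as backward consumes them.
loss.backward()
print(model[0].weight.grad.shape)  # torch.Size([512, 512])
```

PyTorch also ships `torch.autograd.graph.save_on_cpu(pin_memory=...)`, a context manager that applies essentially this pack/unpack pair. It can optionally use pinned host memory for faster transfers, so you may not need to write the hooks yourself. The trade-off is extra host-device traffic during backward in exchange for lower GPU memory use.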
