Can PyTorch move a tensor along with its computational graph from GPU to CPU, and then move it back to GPU for backpropagation?

Can PyTorch move a tensor along with its computational graph from GPU to CPU, and then move it back to GPU for backpropagation? For instance, a is originally on GPU 0, and computing with b yields c. I then take c.sum() and move c to the CPU to free up memory. Next, I move d from the CPU to GPU 0 and continue computing on GPU 0, combining c.sum() with d to get e. Backpropagation starts from e.sum(); when it propagates back to c, I move d back to the CPU to free up space, then move c back to GPU 0 to continue backpropagation. Can PyTorch do this? It would be a workaround for memory constraints.
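For concreteness, a minimal sketch of the workflow I have in mind (shapes and device names are just illustrative):

```python
import torch

a = torch.randn(512, 512, device="cuda:0", requires_grad=True)
b = torch.randn(512, 512, device="cuda:0")

c = a @ b              # intermediate on GPU 0, grad_fn attached
c_cpu = c.to("cpu")    # .to() is differentiable: c_cpu stays in the graph
del c                  # drop the GPU reference to c itself

d = torch.randn(512, 512, device="cuda:0")
e = c_cpu.to("cuda:0").sum() + d.sum()
e.backward()           # gradients flow back through both .to() copies
print(a.grad.device)   # cuda:0 -- gradients arrive on the original device
```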

The to() operation is differentiable, but it won't move the intermediate tensors that autograd has already saved for the backward pass to the target device.
To save memory via CPU offloading, you might want to use the saved-tensors hooks (torch.autograd.graph.saved_tensors_hooks, or the built-in torch.autograd.graph.save_on_cpu helper).
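For example, save_on_cpu is a context manager that packs every tensor saved for backward onto the CPU and copies it back to the GPU on demand during backward. A minimal sketch (sizes assumed for illustration):

```python
import torch
from torch.autograd.graph import save_on_cpu

a = torch.randn(512, 512, device="cuda:0", requires_grad=True)
b = torch.randn(512, 512, device="cuda:0")

# Tensors saved for backward inside this block are offloaded to the CPU;
# pin_memory=True keeps them in pinned memory for faster copies back.
with save_on_cpu(pin_memory=True):
    c = torch.sigmoid(a @ b)   # sigmoid saves its output for backward
    e = c.sum()

e.backward()                   # saved tensors are restored to the GPU as needed
print(a.grad.device)           # cuda:0
```

This covers exactly the case a plain .to() misses: the tensors autograd stashes for backward, not just the tensors you hold references to.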


Thank you, this looks very useful; I will apply it to my code.