Can PyTorch move a tensor along with its computational graph from GPU to CPU, and then move it back to GPU for backpropagation?

ptrblck · April 3, 2024, 7:33pm

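The to() operation is differentiable, so gradients will flow back across the device transfer. However, it only moves the tensor you call it on; it won’t move the intermediates that autograd saved for the backward pass to the target device.
To save memory via CPU offloading, you might want to use these hooks.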
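
Here is a minimal sketch of both points. It assumes the hooks in question are the saved-tensor hooks in `torch.autograd.graph`, and uses `save_on_cpu` as one example of them. The tensor names and shapes are made up for illustration, and the snippet falls back to CPU if no GPU is available, in which case it still runs but nothing is actually offloaded.

```python
import torch
from torch.autograd.graph import save_on_cpu

# Fall back to CPU so the sketch also runs on machines without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) to() is differentiable: gradients flow back across the device hops.
x = torch.randn(4, 3, device=device, requires_grad=True)
y = (x * 2).to("cpu")   # only this result is copied to the CPU
z = y.to(device)        # copy it back; the graph still links z to x
z.sum().backward()      # x.grad ends up on x's device

# 2) Offload the tensors autograd saves for backward to CPU memory.
a = torch.randn(4, 3, device=device, requires_grad=True)
with save_on_cpu(pin_memory=torch.cuda.is_available()):
    # Tensors saved for backward inside this block are kept on the CPU
    # and copied back to their original device when backward needs them.
    b = (a * a).sum()
b.backward()            # a.grad is computed as usual
```

The trade-off is extra host/device copies during forward and backward in exchange for lower GPU memory use, so it is worth profiling before relying on it.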