Move activations to CPU

I need to move the activations to the CPU after the computation graph has been built. For instance, suppose two computation graphs are generated on the GPU as follows:

input1 = torch.randn(128, 3, 32, 32, device="cuda")
input2 = torch.randn(128, 3, 32, 32, device="cuda")
output1 = model(input1)
output2 = model(input2)

This needs about 2M of GPU memory, where M is the memory held by the activations of one computation graph. Now I want to move the activations associated with output1 to the CPU, so that GPU memory usage falls to about M. This seems like it should be possible, but I don’t know how to implement it. Thanks!

You can move output1 to the CPU by calling output1.cpu(). Or are you asking how to move the activations that were saved by the PyTorch autograd engine to the CPU, to avoid OOM-ing?
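For clarity, a call like that only copies the output tensor itself to host memory:

```python
output1_cpu = output1.cpu()  # copies output1's data to the CPU
# The computation graph, and the activations autograd saved for
# backward, still live on the GPU, so GPU memory barely changes.
```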

Yes. I am trying to move all the activations associated with output1, not just the output tensor itself.

I don’t think there’s an easy way to do that in PyTorch today. One way you could save memory is to use torch.utils.checkpoint, which drops the intermediate activations during the forward pass and re-computes them during the backward pass.
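A minimal sketch of what that can look like (the model here is a made-up stand-in, since the thread doesn’t show the real one; any module whose forward you can wrap works the same way):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Stand-in model; replace with your own nn.Module.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 10),
)

input1 = torch.randn(128, 3, 32, 32, requires_grad=True)

# checkpoint() does not keep model's intermediate activations alive
# after the forward pass; it re-runs the forward during backward,
# trading extra compute for lower memory.
# use_reentrant=False needs a recent PyTorch; drop it on older versions.
output1 = checkpoint(model, input1, use_reentrant=False)
output1.sum().backward()
```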

I think it would make sense for there to be a checkpointing option in PyTorch that saves activations on the CPU to avoid using too much GPU memory. Please feel free to open a feature request at Issues · pytorch/pytorch · GitHub.

If you do want to go down the route of implementing this manually yourself, one (very tedious) way is to write your model’s layers as custom autograd Functions that move the tensors they save for backward to the CPU in forward and back to the GPU in backward.
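Here is a rough sketch of that idea for a single layer. OffloadedLinear is a hypothetical name, it only implements y = x @ w.t(), and you would have to repeat this for every op in the model:

```python
import torch

class OffloadedLinear(torch.autograd.Function):
    # Computes y = x @ w.t(), but stores the tensors needed for
    # backward on the CPU instead of keeping them on the GPU.

    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x.cpu(), w.cpu())  # offload to host memory
        return x.matmul(w.t())

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        # Bring the saved tensors back to the GPU for the gradient math.
        x = x.to(grad_out.device)
        w = w.to(grad_out.device)
        return grad_out.matmul(w), grad_out.t().matmul(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(128, 64, device=device, requires_grad=True)
w = torch.randn(32, 64, device=device, requires_grad=True)
y = OffloadedLinear.apply(x, w)
y.sum().backward()  # saved tensors come back to the GPU only here
```

Note that the synchronous .cpu()/.to() copies stall the GPU; a real implementation would likely use pinned memory and non_blocking=True transfers to overlap the copies with compute.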