I need a way to move activations to the CPU after the computation graph has been built. For instance, suppose two computation graphs are created on the GPU as follows:
input1, input2 = torch.randn(128, 3, 32, 32), torch.randn(128, 3, 32, 32)
output1 = model(input1)
output2 = model(input2)
which needs about 2M of GPU memory, since two computation graphs are generated. Now I want to move the activations associated with output1 to the CPU, so that GPU memory usage drops to about M. This seems like it should be possible, but I don’t know how to implement it. Thanks!
You can move output1 to the CPU with output1.cpu(). Or are you asking how to move the activations saved by the PyTorch autograd engine to the CPU, to avoid OOM-ing?
Yes. I am trying to move all the activations linked to output1, not just the tensor itself.
I don’t think there’s an easy way to do that in PyTorch today. One way you could save memory is to use torch.utils.checkpoint, which discards intermediate activations during the forward pass and re-computes them during the backward pass.
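As a minimal sketch of that approach (the model and shapes here are placeholders, not from your code): wrapping the forward call in torch.utils.checkpoint.checkpoint means the intermediate activations inside the wrapped region are not kept around, at the cost of an extra forward computation during backward.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A toy model standing in for your real one.
model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
x = torch.randn(8, 32, requires_grad=True)

# Activations inside the checkpointed region are discarded after the
# forward pass and re-computed on demand during backward.
# use_reentrant=False selects the non-reentrant implementation
# (recommended on recent PyTorch versions).
out = checkpoint(model, x, use_reentrant=False)
out.sum().backward()
```

This trades compute for memory rather than moving activations to the CPU, but it is the built-in option available today.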
I think it would make sense for there to be a checkpointing option in PyTorch that saves activations on the CPU to avoid using too much GPU memory. Please feel free to open a feature request at Issues · pytorch/pytorch · GitHub.
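(Editor’s note for later readers: PyTorch 1.10+ ships torch.autograd.graph.save_on_cpu, a context manager that implements exactly this pattern, packing saved activations onto the CPU during the forward pass and copying them back to the original device when backward needs them. A minimal sketch, with a placeholder model:)

```python
import torch
import torch.nn as nn

# A toy model standing in for your real one.
model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
x = torch.randn(8, 32, requires_grad=True)

# Activations saved for backward inside this context live on the CPU;
# they are moved back to their original device on demand during backward.
# (Pass pin_memory=True on a CUDA machine for faster host-device copies.)
with torch.autograd.graph.save_on_cpu():
    out = model(x)
out.sum().backward()
```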
If you do want to go down the route of manually implementing this by yourself, one (very tedious) way is to