I need a way to move activations to the CPU after the computation graph has been built. For instance, suppose two computation graphs are created on the GPU as follows:
input1, input2 = torch.randn(128, 3, 32, 32), torch.randn(128, 3, 32, 32)
output1 = model(input1)
output2 = model(input2)
which needs about 2M of GPU memory, since two computation graphs are generated. Now I want to move the activations associated with output1 to the CPU, so that GPU memory usage drops to about M. This seems like it should be possible, but I don’t know how to implement it. Thanks!
You can move output1 to the CPU with output1.cpu(). Or are you asking how to move the activations saved by the PyTorch autograd engine to the CPU, to avoid OOM-ing?
Yes. I am trying to move all the activations linked to output1, not just the tensor itself.
I don’t think there’s an easy way to do that in PyTorch today. One way you could save memory is to use torch.utils.checkpoint, which discards intermediate activations during the forward pass and re-computes them during the backward pass.
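As a minimal sketch of that approach (the model and shapes here are placeholders, not from your code): wrapping the forward call in torch.utils.checkpoint.checkpoint means the intermediate activations inside the wrapped region are not kept around, at the cost of an extra forward computation during backward.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A toy model standing in for your real one.
model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
x = torch.randn(8, 32, requires_grad=True)

# Activations inside the checkpointed region are discarded after the
# forward pass and re-computed on demand during backward.
# use_reentrant=False selects the non-reentrant implementation
# (recommended on recent PyTorch versions).
out = checkpoint(model, x, use_reentrant=False)
out.sum().backward()
```

This trades compute for memory rather than moving activations to the CPU, but it is the built-in option available today.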
I think it would make sense for there to be a checkpointing option in PyTorch that saves activations on the CPU to avoid using too much GPU memory. Please feel free to open a feature request at Issues · pytorch/pytorch · GitHub.
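(Editor’s note for later readers: PyTorch 1.10+ ships torch.autograd.graph.save_on_cpu, a context manager that implements exactly this pattern, packing saved activations onto the CPU during the forward pass and copying them back to the original device when backward needs them. A minimal sketch, with a placeholder model:)

```python
import torch
import torch.nn as nn

# A toy model standing in for your real one.
model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
x = torch.randn(8, 32, requires_grad=True)

# Activations saved for backward inside this context live on the CPU;
# they are moved back to their original device on demand during backward.
# (Pass pin_memory=True on a CUDA machine for faster host-device copies.)
with torch.autograd.graph.save_on_cpu():
    out = model(x)
out.sum().backward()
```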
If you do want to go down the route of manually implementing this by yourself, one (very tedious) way is to