I am trying a distillation-like setup where the predictions from one model (say model X) are used as targets for another model (say model Y). Since model X is pre-trained, I put it in eval mode (`model.eval()`), compute its logits, and train model Y against them.

In this case, would PyTorch keep the computation graph (and intermediate activations) for model X around, or free it as soon as model X's forward pass completes? Is there a way to explicitly free the memory held by model X's computation graph?
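For concreteness, here is a minimal sketch of what I mean, with tiny linear layers standing in for models X and Y (the actual models are larger). This also shows the `torch.no_grad()` approach I am considering, since my understanding is that `eval()` alone does not stop autograd from recording the forward pass:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tiny stand-ins for model X (teacher) and model Y (student).
teacher = nn.Linear(8, 4)
student = nn.Linear(8, 4)

# eval() changes layer behavior (dropout, batch norm), but does NOT
# by itself prevent autograd from building a graph for the teacher.
teacher.eval()

x = torch.randn(16, 8)

# torch.no_grad() stops autograd from recording the teacher's forward
# pass, so no graph or intermediate activations are kept for model X.
with torch.no_grad():
    teacher_logits = teacher(x)

# The teacher's output carries no graph.
assert not teacher_logits.requires_grad

# Train the student against the teacher's soft targets (KL divergence).
student_logits = student(x)
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
loss.backward()  # only the student's graph exists and is freed here
```

An alternative I have seen is calling `teacher(x).detach()` without `no_grad()`, but my impression is that this still builds the teacher's graph during the forward pass before detaching the output, so `no_grad()` seems preferable memory-wise.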