How to free up model activations

I am trying a distillation-like setup where predictions from one model (say model X) are used as targets for another model (say model Y). Since model X is pre-trained, I compute its logits with model X in eval mode (model_X.eval()) and train model Y on them.

In this case, would PyTorch keep the computation graph (and intermediate activations) for model X, or free that graph as soon as model X's forward pass is complete? Is there a way to free the memory held by model X's computation graph?
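
Roughly, my current setup looks like this (a simplified sketch; model_X, model_Y, inp, and the loss choice are just placeholders):

import torch
import torch.nn.functional as F

model_X.eval()                     # pre-trained teacher, only used to produce targets
teacher_logits = model_X(inp)      # this forward still records a graph for model X
student_logits = model_Y(inp)

loss = F.mse_loss(student_logits, teacher_logits)
loss.backward()                    # also backpropagates through model X's graph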

If you don’t want any autograd-related operations to happen while you run the forward pass of model X, you can run it in a no_grad block:

with torch.no_grad():
    out = model_X(inp)
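
For example, a full distillation step could look roughly like this (just a sketch; model_Y, optimizer, the loss, and the temperature value are placeholders, not something prescribed by PyTorch):

import torch
import torch.nn.functional as F

T = 2.0                                 # softening temperature (illustrative value)

model_X.eval()
with torch.no_grad():                   # no graph or activations are kept for model X
    teacher_logits = model_X(inp)

student_logits = model_Y(inp)           # a graph is built only for model Y

# soft-target distillation loss (one common choice, not the only one)
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)

optimizer.zero_grad()
loss.backward()                         # gradients flow only into model Y
optimizer.step()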

Thanks for the prompt reply. I am wondering how to avoid storing the intermediate activations for model X, not just the .grad attributes.

Running in no-grad mode disables autograd entirely, so no graph will be created and no intermediate activations will be saved.
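
You can verify this on the output itself, for example (sketch, reusing the snippet above):

with torch.no_grad():
    out = model_X(inp)

print(out.requires_grad)   # False
print(out.grad_fn)         # None -> no graph was recorded, no activations saved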


Perfect! I will try it out, and if everything works, I will come back to mark it as the accepted answer 🙂