retain_graph=True out-of-memory issue

I’ve tried to train intermediate layers independently, but when I run my code I get an out-of-memory error.

I would like to know whether retain_graph=True is the problem, and what I have to do to solve it.

I am using the Hugging Face BERT model, and this is my code:

Thank you.


import torch
import torch.nn.functional as F

t_outputs = t_model(**inputs)
s_outputs = s_model(**inputs)
encoder_layers = args.encoder_layers

loss = torch.nn.KLDivLoss(reduction='batchmean')
batch, row, col = s_outputs[3][1].size()

for i, k in enumerate(encoder_layers):
    output = loss(F.log_softmax(s_outputs[3][i+1], dim=2).view(batch*row, 1, col),
                  F.softmax(t_outputs[3][k+1], dim=2).view(batch*row, 1, col))
    # to freeze layers
    for name, p in s_model.named_parameters():
        if "layer." + str(i-1) in name:
            p.requires_grad = False
        output.backward(retain_graph=True)

retain_graph=True might be the cause of the OOM issue, as it will force Autograd to keep the computation graph alive (and will thus use more memory than in the default use case).

To clear the intermediate tensors, the last backward call should use retain_graph=False.

Note that you are also currently accumulating gradients, since you call output.backward() once per parameter inside the inner loop, while only setting the requires_grad attribute of a single parameter to False.
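A common pattern that avoids both issues is to sum the per-layer losses and call backward() once, so no retain_graph=True is needed and the graph is freed after the single call. Here is a minimal sketch using toy random tensors in place of the BERT hidden states (s_hidden, t_hidden, and the shapes are stand-ins for s_outputs[3] / t_outputs[3] in your code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the student/teacher hidden states
# (hypothetical shapes; in the real code these come from the model outputs)
batch, row, col = 2, 4, 8
num_layers = 3
s_hidden = [torch.randn(batch, row, col, requires_grad=True) for _ in range(num_layers)]
t_hidden = [torch.randn(batch, row, col) for _ in range(num_layers)]

kl = torch.nn.KLDivLoss(reduction='batchmean')

# Accumulate the per-layer KL losses into a single scalar
total_loss = 0.0
for i in range(num_layers):
    total_loss = total_loss + kl(
        F.log_softmax(s_hidden[i], dim=2).view(batch * row, 1, col),
        F.softmax(t_hidden[i], dim=2).view(batch * row, 1, col),
    )

# One backward call: Autograd frees the intermediate buffers afterwards,
# so retain_graph=True is not needed
total_loss.backward()
```

If you do need separate backward calls (e.g. one per layer), use retain_graph=True on all calls except the last one, so the final call releases the graph.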