I have my loss function as loss = loss1 + loss2.
After each forward pass, I calculate the gradients using loss.backward() and update my weights. Here loss1(w,b), loss2(w,b), and loss(w,b) are functions of the network parameters.
Now, in every iteration of gradient descent, I also need the gradients of loss1 and loss2 with respect to the network parameters. So I use loss1.backward(retain_graph=True) and loss2.backward(retain_graph=True). Is this the right approach?
Also, I would be grateful if you could explain what actually happens with retain_graph=True versus retain_graph=False.
retain_graph=True causes autograd NOT to aggressively free up the saved tensors required for grad computation after the backward call.
Without it, you will not be able to call .backward() more than once through the same graph, as the intermediate tensors required for the backward pass will already have been freed.
For your case, using retain_graph=True should help if you aren’t running into any errors and everything is working as expected. Otherwise, feel free to post the error along with an executable code snippet.
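To make the behaviour described above concrete, here is a minimal sketch. The toy tensor `w`, the shared intermediate `hidden`, and the helper `shared_graph_losses` are hypothetical names chosen for illustration, not taken from the original question. It shows a second backward call failing once the shared buffers have been freed, and succeeding when the first call uses retain_graph=True.

```python
import torch

def shared_graph_losses():
    # Hypothetical toy setup: two losses that share one intermediate tensor.
    w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    hidden = w * w                 # this multiplication saves w for its backward
    loss1 = hidden.sum()           # sum(w^2)
    loss2 = (hidden ** 2).sum()    # sum(w^4)
    return w, loss1, loss2

# Case 1: the default (retain_graph=False) frees the shared buffers
# after the first backward call, so the second call raises.
w, loss1, loss2 = shared_graph_losses()
loss1.backward()
try:
    loss2.backward()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# Case 2: retain_graph=True on the first call keeps the buffers alive.
w, loss1, loss2 = shared_graph_losses()
loss1.backward(retain_graph=True)
grad_loss1 = w.grad.clone()        # gradient of loss1 only: 2 * w
loss2.backward()                   # last call, so the default is fine
grad_sum = w.grad.clone()          # .grad accumulates: 2 * w + 4 * w^3
```

Note that .grad accumulates across backward calls, so after the second call `w.grad` holds the sum of both gradients rather than the gradient of loss2 alone.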
Thanks for your answer,
I do not get any error, but as you mentioned, “I cannot call .backward more than once …”. That explains the error I was receiving when I did not set retain_graph=True while calculating the gradients of loss1 and loss2 in each step of my gradient descent, as they use the same tensors for computation.
Again, many thanks for the clarification.
A small addition: make sure to use retain_graph=False (or just drop this argument, as it’s the default) in the last backward call to allow PyTorch to free the intermediates.
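One possible way to put this together for the original question is sketched below. It assumes a toy nn.Linear model, random data, and two illustrative loss terms, none of which come from the thread. Instead of calling .backward() on each loss (which accumulates into .grad), it uses torch.autograd.grad to read the per-loss gradients separately, and only the final loss.backward() omits retain_graph so the intermediates can be freed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

out = model(x)
loss1 = nn.functional.mse_loss(out, y)   # first loss term (assumed)
loss2 = out.abs().mean()                 # second loss term (assumed)
loss = loss1 + loss2

params = list(model.parameters())

# Per-loss gradients, returned as tuples without touching p.grad.
# The graph is still needed afterwards, so retain it here.
grads1 = torch.autograd.grad(loss1, params, retain_graph=True)
grads2 = torch.autograd.grad(loss2, params, retain_graph=True)

# Last backward call: default retain_graph=False frees the intermediates.
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After loss.backward(), each parameter's .grad should equal the sum of the corresponding entries in `grads1` and `grads2`, since the gradient of a sum is the sum of the gradients.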