Suppose my loss is computed with a variable which was the output of my model, i.e.
loss = A (will constantly change in subsequent iterations depending on the gradient) + B (an output of the model, but fixed after that)
How do I call loss.backward()?
I keep getting
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
The problem seems to be that I can’t keep B fixed: since it was an output of the model, PyTorch somehow thinks it still needs some of B’s buffers.
I do not want to retain any buffers (there shouldn’t be any reason I need to, should there? B is supposed to stay fixed throughout, just passing through the model once). I just want a fresh backpropagation each iteration, with A updated based on the gradient and B held fixed.
As far as I understand, you should at least retain the graph of B, since PyTorch saves intermediate tensors in that graph, not inside the nn.Module’s components.
If your loss really has the form A + B, you can call A.backward() and B.backward(retain_graph=True) separately.
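A minimal sketch of that suggestion (the Linear model and random inputs here are stand-ins, not the original setup): B’s graph is built once and kept alive with retain_graph=True, while A is rebuilt fresh each iteration and its graph can be freed normally.

```python
import torch

model = torch.nn.Linear(4, 1)
x_fixed = torch.randn(2, 4)
B = model(x_fixed).sum()          # computed once; its graph must survive the loop

for _ in range(3):
    model.zero_grad()
    A = model(torch.randn(2, 4)).sum()  # recomputed every iteration
    A.backward()                        # A's fresh graph may be freed normally
    B.backward(retain_graph=True)       # keep B's graph alive for the next iteration
```

Since gradients accumulate, the two backward calls together produce the same parameter gradients as (A + B).backward().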
feature_target = model(input1)
feature_identity = model(input2)
for i, data in enumerate(loader):
    feature_adversarial = model(data)
    distance_target = torch.sum((feature_adversarial - feature_target.clone()) ** 2)
    distance_identity = torch.sum((feature_adversarial - feature_identity.clone()) ** 2)
    loss = distance_target + 2 * distance_identity
    loss.backward()
In that case you should call loss.backward(retain_graph=True) instead of loss.backward() to retain the saved intermediate tensors that are necessary to compute gradients. Otherwise you would lose the information needed for gradient computation in later iterations.
If that’s not the case, please explain in detail what you mean by “but they are fixed thereafter.” Do you mean you want feature_target and feature_identity not to affect the model update?
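Applied to the loop above, the change looks like this (a runnable sketch; the Linear model, random inputs, and list-based loader are placeholders for the poster’s actual model and DataLoader):

```python
import torch

model = torch.nn.Linear(4, 8)                    # stand-in for the real model
input1, input2 = torch.randn(1, 4), torch.randn(1, 4)
loader = [torch.randn(1, 4) for _ in range(3)]   # stand-in for the DataLoader

feature_target = model(input1)       # fixed features, computed once
feature_identity = model(input2)

for i, data in enumerate(loader):
    model.zero_grad()
    feature_adversarial = model(data)
    distance_target = torch.sum((feature_adversarial - feature_target) ** 2)
    distance_identity = torch.sum((feature_adversarial - feature_identity) ** 2)
    loss = distance_target + 2 * distance_identity
    loss.backward(retain_graph=True)  # keep the fixed features' graphs alive
```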
Yup, just like the code you posted. So feature_target and feature_identity are fixed in the loop.
Just to understand retain_graph better: why should I retain it? Which information from the (i-1)-th iteration does the i-th iteration need? As far as I know, there should be none; feature_target and feature_identity should be treated as constants. What am I missing here?
feature_target = model(input1)
feature_identity = model(input2)
feature_adversarial = model(data)
for i in range(100):
    distance_target = torch.sum((feature_adversarial - feature_target.clone()) ** 2)
    distance_identity = torch.sum((feature_adversarial - feature_identity.clone()) ** 2)
    loss = distance_target + 2 * distance_identity
    loss.backward()
    feature_adversarial = update(feature_adversarial)
    model(feature_adversarial)
Basically, the retain_graph option retains the intermediate tensors created while constructing the computation graph. Without it, the backward call destroys the whole connected computation graph.
The picture in the question above may be helpful.
In this case, what we want to retain are the intermediate tensors created from the feature_target and feature_identity parts, which are needed to compute gradients again in later iterations.
If you call loss.backward() without the retain_graph option, you lose the information not only for the i-th iteration’s feature_adversarial, but also for feature_target and feature_identity, since the loss’s computation graph contains their graphs as well.
Of course, with loss.backward(retain_graph=True) the graph of feature_adversarial is retained too, which is wasteful.
But in my personal opinion, PyTorch will probably free the i-th iteration’s feature_adversarial graph once the new feature_adversarial is constructed in the (i+1)-th iteration, since nothing references the old one anymore.
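The failure mode described above can be reproduced in a minimal sketch (the Linear model and random inputs are placeholders): the first backward frees the fixed feature’s graph, so the second iteration raises exactly the RuntimeError from the original question.

```python
import torch

model = torch.nn.Linear(4, 8)
feature_fixed = model(torch.randn(1, 4))   # graph built once, before the loop

errors = []
for i in range(2):
    out = model(torch.randn(1, 4))
    loss = ((out - feature_fixed) ** 2).sum()
    try:
        loss.backward()           # iteration 0 succeeds but frees feature_fixed's graph
    except RuntimeError:
        errors.append(i)          # iteration 1 hits "backward through the graph a second time"
print(errors)                     # prints [1]
```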