How to call loss.backward() a second time after the buffers have been freed (without retaining the graph)?

Suppose my loss is computed from a variable that was an output of my model, i.e.

loss = A (changes constantly in subsequent iterations, depending on the gradient) + B (an output of the model, but fixed after that)

How do I call loss.backward()?

I keep getting

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

The problem seems to be that I can’t treat B as fixed: since it was an output of the model, autograd apparently still needs some of B’s buffers.

I do not want to retain any buffers (there shouldn’t be any reason to, should there? B is supposed to stay fixed throughout, having passed through the model only once). I just want a fresh backpropagation each iteration, with A updated based on the gradient and B fixed.

As far as I understand, you should at least retain the graph of B, since PyTorch saves intermediate tensors in that graph, not in the nn.Module’s components.
If your loss really has the form A + B, you can call A.backward() and B.backward(retain_graph=True) separately.
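
For example, assuming A and B are both scalar tensors (toy model and shapes below, just a sketch, not your actual code):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)

B = model(torch.randn(1, 4)).sum()      # fixed term, computed from the model once

for step in range(3):
    A = model(torch.randn(1, 4)).sum()  # term that changes every iteration
    A.backward()                        # A's graph is rebuilt next iteration, so it may be freed
    B.backward(retain_graph=True)       # B's graph is kept so it can be backpropagated again
    # (gradients accumulate in the model parameters; zero them as needed in real code)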

Ermm, I can’t possibly call A.backward(); it is not a scalar. To be exact, my loss calculation is as follows:

distance_target = torch.sum((feature_adversarial - feature_target.clone()) ** 2)
distance_identity = torch.sum((feature_adversarial - feature_identity.clone()) ** 2)
loss = distance_target + 2 * distance_identity
loss.backward()

The features are vectors of size (1, 512).

feature_target and feature_identity are outputs of my model as well, but they are fixed thereafter.

The only part that changes is feature_adversarial.

Are you suggesting that I instantiate two separate instances of the same model?

So your code has a form like this?

feature_target = model(input1)
feature_identity = model(input2)
for i, data in enumerate(loader):
    feature_adversarial = model(data)
    distance_target = torch.sum((feature_adversarial - feature_target.clone()) ** 2)
    distance_identity = torch.sum((feature_adversarial - feature_identity.clone()) ** 2)
    loss = distance_target + 2 * distance_identity
    loss.backward()

In that case you should call loss.backward(retain_graph=True) instead of loss.backward() to retain the saved intermediate tensors that are necessary to compute gradients. Otherwise you lose the information needed for gradient computation.
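
Concretely, only the backward call in the loop above changes:

for i, data in enumerate(loader):
    feature_adversarial = model(data)
    distance_target = torch.sum((feature_adversarial - feature_target.clone()) ** 2)
    distance_identity = torch.sum((feature_adversarial - feature_identity.clone()) ** 2)
    loss = distance_target + 2 * distance_identity
    loss.backward(retain_graph=True)  # keeps the graphs of feature_target / feature_identity alive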

If that’s not the case, please tell me in detail what you mean by “but they are fixed thereafter”. Do you mean you want feature_target and feature_identity not to affect the model update?

Yup, just like the code you posted. So feature_target and feature_identity are fixed in the loop.

Just to understand retain_graph better: why should I retain it? What information from the (i-1)-th iteration does the i-th iteration need? As far as I know, there should be none; feature_target and feature_identity should be treated as constants. What am I missing here?

feature_target = model(input1)
feature_identity = model(input2)
feature_adversarial = model(data)
for i in range(100):
    distance_target = torch.sum((feature_adversarial - feature_target.clone()) ** 2)
    distance_identity = torch.sum((feature_adversarial - feature_identity.clone()) ** 2)
    loss = distance_target + 2 * distance_identity
    loss.backward()
    feature_adversarial = update(feature_adversarial)
    model(feature_adversarial)

My code is something more like this.

Basically, the retain_graph option retains the intermediate tensors created while the computation graph is constructed. Without it, the backward call frees the entire connected computation graph.
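
A minimal standalone illustration of this (a toy tensor, unrelated to your model):

import torch

w = torch.randn(3, requires_grad=True)
y = (w * w).sum()              # the multiplication node saves w as an intermediate tensor

y.backward(retain_graph=True)  # saved tensors are kept, so backward can run again
y.backward()                   # runs fine, then frees the saved tensors
# y.backward()                 # a further call would raise the RuntimeError quoted above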


The picture in the question above could be helpful.

In this case, what we want to retain are the intermediate tensors created in the

feature_target = model(input1)
feature_identity = model(input2)

part, since they are needed to compute gradients later.

If you call loss.backward() without the retain_graph option, you lose the graph information not only for feature_adversarial of the i-th iteration, but also for feature_target and feature_identity, since the loss’s computation graph contains their computation graphs as well.

Of course, the information for feature_adversarial is also retained if you use loss.backward(retain_graph=True), which is wasteful.
But in my opinion, PyTorch will probably destroy the computation graph of the i-th iteration’s feature_adversarial once the new feature_adversarial is constructed in the (i+1)-th iteration.

I don’t quite agree with you that we need any of the intermediate tensors created by

feature_target = model(input1)
feature_identity = model(input2)

The only tensors required for the loss computation are feature_target and feature_identity themselves, not their intermediate tensors.
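
To make my point concrete, here is a rough sketch with a toy model, treating the two fixed outputs as plain constants via detach() (not sure this is the intended way):

import torch
import torch.nn as nn

model = nn.Linear(8, 512)
input1, input2, data = torch.randn(3, 1, 8).unbind(0)

feature_target = model(input1).detach()    # keep only the values, drop the graph behind them
feature_identity = model(input2).detach()

feature_adversarial = model(data)
distance_target = torch.sum((feature_adversarial - feature_target) ** 2)
distance_identity = torch.sum((feature_adversarial - feature_identity) ** 2)
loss = distance_target + 2 * distance_identity
loss.backward()   # only feature_adversarial's graph is traversed and freed here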

I will use retain_graph for now, but it doesn’t feel right to me. Hopefully someone from the dev team can help explain the best way to do this. 🙂