Hi everyone,

I have a model consisting of three parts. Part 1 is an encoder; parts 2 and 3 are classifiers that both receive the output of the encoder (part 1) as input. I have two optimizers: the first holds the parameters of parts 1 and 2 (named `optimizer12`), the second only the parameters of part 3 (`optimizer3`). I calculate a loss for the output of part 2 (named `loss2`) and one for the output of part 3 (`loss3`). Now I want to update part 3 and parts 1&2 alternately. I used the code below in my training method:

```
loss2 = criterion2(output2, target2)
loss3 = criterion3(output3, target3)
loss3.backward(retain_graph=True)
optimizer3.step()
loss12 = loss2 + some_value*loss3
loss12.backward()
optimizer12.step()
```

However, this gives me the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 2000]] is at version 2; expected version 1 instead.

I think the variable mentioned in the error message is the output of part 3 (at least it has the same dimensions). Based on this answer, I assume that the error is caused by the multiplication of `loss3` with `some_value`. However, I cannot get rid of this multiplication, so I need to find a way around it.

Can you please help me?

Thank you!

I think you might be running into this error because `optimizer3.step()` updates the parameters, which could have been used to calculate `loss2`, and `loss12.backward()` would then try to compute the gradients using stale intermediate forward activations (since the corresponding parameters were already updated, as described in the linked post).

Thank you for your fast answer! `optimizer3` only updates the parameters of part 3, so the new parameters should not affect `loss2` (it only depends on the parameters of parts 1 and 2).

If I understood the post you mentioned correctly, the solution would be to set `retain_graph=False`? This won’t work for me, since it results in the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed).

In this case, I think autograd is trying to backpropagate through part 3 a second time (which is what I want it to do). Even if I calculate `loss3` again after the gradient step of `optimizer3` (to make the loss depend on the new parameters), I get this error when calling `loss12.backward()`.

Am I misunderstanding an important concept here or where is my problem?

Thank you!

Thanks for the update and yes, you are right: based on the description, `loss3` should be causing the issue, not `loss2`.

That is unexpected. Could you post an executable code snippet to reproduce this issue?

```
import torch
import torch.nn as nn
import torch.optim as optim


class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 200)
        self.fc2 = nn.Linear(200, 2, bias=False)
        self.fc3 = nn.Linear(200, 3, bias=False)

    def forward(self, x):
        h = self.fc1(x)
        out2 = self.fc2(h)
        out3 = self.fc3(h)
        return out2, out3


if __name__ == '__main__':
    torch.autograd.set_detect_anomaly(True)
    some_value = 0.5
    model = myModel()
    optimizer12 = optim.Adam(list(model.fc1.parameters()) + list(model.fc2.parameters()), lr=1e-3, weight_decay=1e-4)
    optimizer3 = optim.Adam(model.fc3.parameters(), lr=1e-3, weight_decay=1e-4)
    criterion2 = nn.CrossEntropyLoss()
    criterion3 = nn.CrossEntropyLoss()
    input = torch.rand((1, 100))
    label2 = torch.randint(0, 2, (1,))
    label3 = torch.randint(0, 3, (1,))
    output2, output3 = model(input)
    loss2 = criterion2(output2, label2)
    loss3 = criterion3(output3, label3)
    # loss3.backward()  # produces RuntimeError: Trying to backward through the graph a second time
    loss3.backward(retain_graph=True)  # produces RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
    optimizer3.step()
    loss3 = criterion3(output3, label3)
    loss12 = loss2 + some_value * loss3
    loss12.backward()
    optimizer12.step()
```

Thanks for the code snippet.

The same root cause can be seen in your code, as `output2` and `output3` share `self.fc1`, which is also the reason why you need `loss3.backward(retain_graph=True)`.

`optimizer3.step()` will update the parameters of `fc3`, which would then compute a wrong gradient during the backpropagation to `fc1` (autograd detects the in-place parameter update and raises the error instead).
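The version-counter mechanism behind this error can be demonstrated in isolation. The sketch below is an assumption-laden toy example (plain SGD instead of Adam, arbitrary layer sizes, names `fc1`/`fc3` chosen to mirror the thread): backpropagating to the first layer needs the weight of the later layer that was saved during the forward pass, and an `optimizer.step()` in between modifies that weight in place.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
fc1 = nn.Linear(4, 3)   # stands in for the shared encoder layer
fc3 = nn.Linear(3, 2)   # classifier head that is updated first
opt3 = optim.SGD(fc3.parameters(), lr=0.1)

x = torch.randn(1, 4)
loss = fc3(fc1(x)).sum()

loss.backward(retain_graph=True)
opt3.step()  # in-place update of fc3.weight bumps its version counter

raised = False
try:
    # backprop to fc1 needs the fc3.weight saved during forward -> error
    loss.backward()
except RuntimeError:
    raised = True
print("second backward raised RuntimeError:", raised)
```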

Thank you very much!

So if I got that right, the solution would be as follows:

```
output2, output3 = model(input)
loss3 = criterion3(output3, label3)
loss3.backward()
optimizer3.step()
optimizer12.zero_grad()
output2, output3 = model(input)
loss2 = criterion2(output2, label2)
loss3 = criterion3(output3, label3)
loss12 = loss2 + some_value*loss3
loss12.backward()
optimizer12.step()
```

Which means one has to forward the input through the network a second time, in order to get the “new” activations for the shared layer.

I am not sure about the `optimizer12.zero_grad()`. I think I will have to zero the gradients before the second forward pass, at least for `fc2`, because if I did not, the gradients would be summed up over both passes. Is that right?

Again, thanks a lot for your help!

Yes, executing another forward pass should work. Another approach would be to compute the gradients for both losses first and call `optimizerX.step()` afterwards, but it depends on your actual use case whether that’s possible.
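A minimal sketch of that second approach, under assumed simplifications (standalone layers instead of the full model, SGD, toy sizes): run both backward passes before either step, so no parameter is modified in place while its graph is still needed. Note that `fc3` then accumulates gradient contributions from both `loss3` and the weighted `loss3` term inside `loss12`, which may or may not match the intended alternating scheme.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
fc1 = nn.Linear(4, 3)
fc2 = nn.Linear(3, 2)
fc3 = nn.Linear(3, 3)
opt12 = optim.SGD(list(fc1.parameters()) + list(fc2.parameters()), lr=0.1)
opt3 = optim.SGD(fc3.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(1, 4)
label2 = torch.randint(0, 2, (1,))
label3 = torch.randint(0, 3, (1,))
some_value = 0.5

h = fc1(x)
loss2 = criterion(fc2(h), label2)
loss3 = criterion(fc3(h), label3)

# accumulate all gradients first ...
loss3.backward(retain_graph=True)
(loss2 + some_value * loss3).backward()
# ... then step: nothing was modified in place while the graph was alive
opt3.step()
opt12.step()
```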

Zeroing out the gradients of `optimizer12` looks valid, but note that the forward pass will not create any gradients; those are computed in the `backward` call.
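The accumulation behaviour discussed above can be checked directly in a toy example (arbitrary layer size assumed): without zeroing in between, a second backward pass adds to `.grad` rather than replacing it.

```python
import torch
import torch.nn as nn

lin = nn.Linear(2, 1, bias=False)
x = torch.ones(1, 2)

lin(x).sum().backward()
g1 = lin.weight.grad.clone()

lin(x).sum().backward()  # second pass without zeroing the gradients
# .grad now holds the sum of both backward passes
assert torch.allclose(lin.weight.grad, 2 * g1)
```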