Two networks: "one of the variables needed for gradient computation has been modified by an inplace operation"

I am training two networks at a time.
network 1: 3 Linear layers: fc1, fc2, fc3
network 2: the same 3 Linear layers + 2 additional matrices.

requires_grad = True for all the parameters of network 1.
network 2 copies all the parameters from network 1 using load_state_dict(…, strict=False). All the parameters (weights and biases) of network 2 don't require grad; only the 2 additional matrices require grad.
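
A minimal sketch of this setup (the class names, layer sizes, and the way the two matrices A and B are stored are my assumptions; the post doesn't describe how they enter the forward pass):

import torch
import torch.nn as nn

class QNet(nn.Module):
    # network 1: three Linear layers, all parameters trainable
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 4)

    def forward(self, x):
        return self.fc3(torch.relu(self.fc2(torch.relu(self.fc1(x)))))

class SymNet(QNet):
    # network 2: the same layers plus two extra trainable matrices
    # (how A and B are used in the forward pass isn't described in the post)
    def __init__(self):
        super().__init__()
        self.A = nn.Parameter(torch.randn(64, 64))
        self.B = nn.Parameter(torch.randn(64, 64))

network1 = QNet()
network2 = SymNet()

# copy the shared layers; strict=False ignores the keys A and B,
# which are missing from network1's state_dict
network2.load_state_dict(network1.state_dict(), strict=False)

# freeze everything in network2 except the two matrices
for name, p in network2.named_parameters():
    p.requires_grad = name in ("A", "B")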

While training:

  1. As soon as I update network 1 (i.e. take a backpropagation step followed by an optimizer step), the version of network1.fc3.weight changes to 2 (network1.fc3.weight._version == 2).
  2. I copy these parameters to network 2; the version remains 2. When I then take a backpropagation step in network 2, this version mismatch raises the error "is at version 2; expected version 1" (see the snippet after this list).

Do you see the same error if network1 is trained alone, without using network2?
If not, could you post a minimal code snippet showing your training approach?

network1 alone works fine, since no copying/loading of parameters needs to be done.

Minimal Code:


# Training loop
network1.optimizer.zero_grad()
network2.optimizer.zero_grad()

# network2 loads all the weights and biases from network1;
# requires_grad = False for all network2 parameters except the two matrices
network2.load_state_dict(network1.state_dict(), strict=False)

# training data
states, actions, rewards, states_, terminated, truncated = ...

q_pred = network1(states)
s_pred = network2(states)

q_target = target  # computed elsewhere

loss_dqn = network1.loss(q_target, q_pred)
loss_dqn.backward()
network1.optimizer.step()

loss_sym = network2.loss(q_pred, s_pred)
loss_sym.backward()
network2.optimizer.step()

I get this runtime error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad().

and after adding retain_graph=True, it throws the
"one of the variables needed for gradient computation has been modified by an inplace operation" error.

As soon as I run
network2.load_state_dict(network1.state_dict(), strict=False)
all the weight and bias parameters of network2 change from version 1 to version 2, while the versions of the network1 parameters remain at 1.
This version 2 conflicts with
loss_sym.backward()
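
Indeed, load_state_dict copies the values into the existing parameters in-place (via param.copy_()), which bumps their version counters. A quick check, reusing the sketch networks from above:

v2 = network2.fc1.weight._version
v1 = network1.fc1.weight._version

network2.load_state_dict(network1.state_dict(), strict=False)

print(network2.fc1.weight._version > v2)   # True: bumped by the in-place copy
print(network1.fc1.weight._version == v1)  # True: network1 is untouched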

I tried to create an executable code snippet from your pseudo-code, but it works for me:

import torch
from torchvision import models

network1 = models.resnet18()
network2 = models.resnet18()

optimizer1 = torch.optim.Adam(network1.parameters())
optimizer2 = torch.optim.Adam(network2.parameters())

for _ in range(10):
    #Training Loop
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    
    # network2 loads all the weights and biases from network1;
    # requires_grad = False for all network2 parameters except the two matrices
    network2.load_state_dict(network1.state_dict(), strict=True) 
    
    x = torch.randn(1, 3, 224, 224)
    
    q_pred = network1(x)
    s_pred = network2(x)
    
    loss_dqn = q_pred.mean()
    loss_dqn.backward()
    optimizer1.step()
    
    loss_sym = s_pred.mean()
    loss_sym.backward()
    optimizer2.step()

Could you check what the missing differences are?

Hi @ptrblck,
thank you for the prompt help.
If I make one output depend on the other, I can reproduce the error:

import torch
import torch.nn as nn
from torchvision import models

network1 = models.resnet18()
network2 = models.resnet18()

optimizer1 = torch.optim.Adam(network1.parameters())
optimizer2 = torch.optim.Adam(network2.parameters())

loss1 = nn.MSELoss()
loss2 = nn.MSELoss()

for _ in range(10):
    #Training Loop
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    
    # network2 loads all the weights and biases from network1;
    # requires_grad = False for all network2 parameters except the two matrices
    network2.load_state_dict(network1.state_dict(), strict=True) 
    
    x = torch.randn(1, 3, 224, 224)
    
    q_pred = network1(x)
    s_pred = network2(x)
    q_pred_1 = torch.zeros_like(q_pred)

    loss_dqn = loss1(q_pred, q_pred_1)
    loss_dqn.backward()
    optimizer1.step()
    
    loss_sym = loss2(q_pred, s_pred)
    loss_sym.backward()
    optimizer2.step()

These are two different losses on two different networks, yet load_state_dict messes with the version.

Your code will raise this error directly:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).

since you are trying to call backward twice using q_pred.
Using loss_dqn.backward(retain_graph=True) fixes this error but will raise the expected:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation...

error since you are using stale forward activations in network1. This is caused by the optimizer1.step() call which will update the parameters and will thus make the stored intermediate forward activations created in q_pred = network1(x) stale.
To fix this you can move the optimizer1.step() call down, below both backward() calls, which will work (see the sketch below).
Note that this behavior does not change whether load_state_dict is removed or kept, as the error is just caused by the invalid parameter update order.
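
Applied to the repro above, a sketch of the reordered loop:

q_pred = network1(x)
s_pred = network2(x)
q_pred_1 = torch.zeros_like(q_pred)

loss_dqn = loss1(q_pred, q_pred_1)
loss_dqn.backward(retain_graph=True)  # the graph is needed again for loss_sym

loss_sym = loss2(q_pred, s_pred)
loss_sym.backward()

# step only after all backward calls, so the activations saved during the
# forward passes are never stale when autograd revisits them
optimizer1.step()
optimizer2.step()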

Thank you a lot, I appreciate your help!
It clears up a lot of things.

One additional query:
why does this work? Shouldn't the call to optimizer2.step() make q_pred stale?

loss_sym = loss2(q_pred, s_pred)
loss_sym.backward(retain_graph=True)
optimizer2.step()

loss_dqn = loss1(q_pred, q_pred_1)
loss_dqn.backward()
optimizer1.step()

No, since q_pred was created by network1 while optimizer2.step() updates the parameters of network2.
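
One way to convince yourself, again via the internal _version counter (a sketch built on the repro above):

# optimizer2 holds references only to network2's parameters, so stepping it
# leaves network1's parameters, and hence the activations saved in q_pred's
# graph, untouched
versions = [p._version for p in network1.parameters()]

loss_sym = loss2(q_pred, s_pred)
loss_sym.backward(retain_graph=True)
optimizer2.step()

assert versions == [p._version for p in network1.parameters()]  # no in-place change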