Two networks: when I train them together I get the error "one of the variables needed for gradient computation has been modified by an inplace operation".

network1 alone works fine, since in that case no copying/loading of parameters needs to be done.

Minimal Code:


# Training loop
network1.optimizer.zero_grad()
network2.optimizer.zero_grad()

# network2 loads all the weights and biases from network1;
# requires_grad = False for all network2 parameters except two matrices
network2.load_state_dict(network1.state_dict(), strict=False)

# training data (sampled batch; sampling code omitted)
states, actions, rewards, states_, terminated, truncated

q_pred = network1.forward(states)
s_pred = network2.forward(states)

q_target = target  # target values (computation omitted)

# DQN loss and parameter update for network1
loss_dqn = network1.loss(q_target, q_pred)
loss_dqn.backward()
network1.optimizer.step()


# loss between the two networks' predictions
# (q_pred comes from network1's graph and is not detached)
loss_sym = network2.loss(q_pred, s_pred)
loss_sym.backward()
network2.optimizer.step()
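
For context, the freezing mentioned in the comment above is done roughly like this. This is only a sketch: the layer sizes and the names sym_a / sym_b for the two trainable matrices are placeholders, not the actual names or architecture used in my code.

import torch
import torch.nn as nn

class Network2(nn.Module):
    def __init__(self, input_dim, hidden_dim, n_actions):
        super().__init__()
        # layers that mirror network1's names, so load_state_dict(strict=False)
        # copies them over; they are kept frozen
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, n_actions)
        # the two extra matrices that stay trainable (placeholder names)
        self.sym_a = nn.Parameter(torch.randn(hidden_dim, hidden_dim))
        self.sym_b = nn.Parameter(torch.randn(n_actions, n_actions))

        self.loss = nn.MSELoss()
        # freeze everything except the two matrices
        for name, param in self.named_parameters():
            param.requires_grad = name in ("sym_a", "sym_b")
        # optimizer only sees the trainable parameters
        self.optimizer = torch.optim.Adam(
            [p for p in self.parameters() if p.requires_grad], lr=1e-3
        )

    def forward(self, states):
        h = torch.relu(self.fc1(states) @ self.sym_a)
        return self.fc2(h) @ self.sym_b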