I am training two networks at the same time.
network 1: 3 Linear layers: fc1, fc2, fc3
network 2: 3 Linear layers + 2 additional matrices.

requires_grad = True for all the parameters of network 1.
network 2 copies all the parameters from network 1 using load_state_dict(…, strict=False). None of the copied parameters (weights and biases) of network 2 require grad; only the 2 additional matrices require grad.
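For readers following along, the setup described above could look roughly like the sketch below. The layer sizes, the class names Network1 and Network2, and the parameter names mat_a and mat_b are assumptions for illustration only; the post does not show how the two additional matrices are defined or used.

```python
import torch
import torch.nn as nn


class Network1(nn.Module):
    """Three Linear layers; all parameters require grad by default."""

    def __init__(self, in_dim=4, hidden=8, out_dim=2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        return self.fc3(torch.relu(self.fc2(torch.relu(self.fc1(x)))))


class Network2(Network1):
    """Same three layers plus two extra matrices (hypothetical names and shapes)."""

    def __init__(self, in_dim=4, hidden=8, out_dim=2):
        super().__init__(in_dim, hidden, out_dim)
        self.mat_a = nn.Parameter(torch.eye(out_dim))
        self.mat_b = nn.Parameter(torch.eye(out_dim))
        # Freeze everything except the two extra matrices.
        for name, param in self.named_parameters():
            param.requires_grad_(name in ("mat_a", "mat_b"))


network1 = Network1()
network2 = Network2()

# strict=False tolerates the keys that network1 does not provide.
result = network2.load_state_dict(network1.state_dict(), strict=False)
trainable = [n for n, p in network2.named_parameters() if p.requires_grad]
print("missing keys:", result.missing_keys)
print("trainable in network2:", trainable)
```

The missing keys reported by load_state_dict are the two extra matrices, which is why strict=False is needed here.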

While training:

As soon as I update network 1 (take a backpropagation step), network1.fc3.weight is at version 2.
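A small sketch of how the version counter behaves may help here. It reads the private attribute _version of a tensor and assumes a plain SGD optimizer on a single Linear layer, neither of which is taken from the original code. The absolute numbers can differ between setups, so only the changes are compared.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

v_start = layer.weight._version

# backward() only accumulates into .grad; the weight itself is untouched.
layer(torch.randn(3, 4)).sum().backward()
v_after_backward = layer.weight._version

# step() updates the weight in place, which bumps its version counter.
optimizer.step()
v_after_step = layer.weight._version

print(v_start, v_after_backward, v_after_step)
```

In this sketch it is the optimizer step, not the backward call, that increases the version.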

I make a copy of these parameters in network 2. The version remains 2.
When I take a backpropagation step in network 2, this version causes the error:
"is at version 2; expected version 1"

Are you seeing the same error if network1 is trained alone, without network2?
If not, could you post a minimal code snippet showing your training approach?

I get this runtime error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad().

and after setting retain_graph=True, it throws this error:
one of the variables needed for gradient computation has been modified by an inplace operation

As soon as I run network2.load_state_dict(network1.state_dict(), strict=False),
the versions of all the weight and bias parameters of network2 change from 1 to 2, while the versions of the network1 parameters remain at 1.
This version 2 conflicts with loss_sym.backward().
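The observation about load_state_dict can be checked in isolation with a sketch like the one below, which uses two standalone Linear layers (src and dst are hypothetical names) instead of the full networks. load_state_dict copies into the destination parameters in place, so only the destination's version counter moves.

```python
import torch
import torch.nn as nn

src = nn.Linear(4, 2)
dst = nn.Linear(4, 2)

src_before = src.weight._version
dst_before = dst.weight._version

# Copies the values of src into the existing parameters of dst, in place.
dst.load_state_dict(src.state_dict())

src_after = src.weight._version
dst_after = dst.weight._version

print("src:", src_before, "->", src_after)
print("dst:", dst_before, "->", dst_after)
```

A version bump on its own is harmless. It only becomes an error when a graph that saved the tensor at an older version is backpropagated afterwards.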

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).

This error is raised because you are calling backward twice using q_pred.
Using loss_dqn.backward(retain_graph=True) fixes this error but will raise the expected:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation...

error, since you are using stale forward activations in network1. This is caused by the optimizer1.step() call, which updates the parameters in place and thus makes the intermediate forward activations stored by q_pred = network1(x) stale.
To fix this, you can move the optimizer1.step() call down, after the remaining backward call, which will work.
Note that this behavior does not change whether the load_state_dict call is removed or kept, as it's just caused by an invalid parameter update.
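Since the original training loop was not posted in full, here is a minimal sketch of both orderings under stated assumptions: network1 is replaced by an nn.Sequential stand-in, a single parameter named extra stands in for the additional matrices of network2, and both losses are made-up functions of q_pred. Only the position of optimizer1.step() differs between the two calls.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for network1 (three Linear layers).
network1 = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2)
)
# Hypothetical stand-in for the trainable extra matrices of network2.
extra = nn.Parameter(torch.eye(2))

optimizer1 = torch.optim.SGD(network1.parameters(), lr=0.1)
optimizer2 = torch.optim.SGD([extra], lr=0.1)
x = torch.randn(16, 4)


def train_step(step_before_second_backward):
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    q_pred = network1(x)
    loss_dqn = q_pred.pow(2).mean()            # made-up loss
    loss_sym = (q_pred @ extra).pow(2).mean()  # made-up loss reusing q_pred
    loss_dqn.backward(retain_graph=True)       # keep the graph for loss_sym
    if step_before_second_backward:
        optimizer1.step()  # in-place update: saved weights become stale
    loss_sym.backward()
    if not step_before_second_backward:
        optimizer1.step()  # safe: no backward through this graph remains
    optimizer2.step()


# Original ordering: step() between the two backward calls.
try:
    train_step(step_before_second_backward=True)
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", str(err)[:90])

# Reordered: step() after both backward calls.
train_step(step_before_second_backward=False)
print("reordered step finished without error")
```

Note that in the reordered version the gradients of both losses accumulate in the parameters of network1 before optimizer1.step() runs. If loss_sym should not influence network1, one option is to compute it from q_pred.detach(), or from a fresh forward pass, instead.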