How can I compute the gradient w.r.t. weights different from the ones used to compute the loss?

To clarify my problem: I have a neural network model with weights W1. I train the model on some data, so the weights change and are now W2. Then I compute the loss L using W2 on a new set of data. Now I would like to update the weights W1 using the gradient of L (which was computed using W2) w.r.t. W1.
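
To make the setup concrete, here is a minimal sketch of one reading of it: W2 is treated as a differentiable function of W1, so the gradient of L w.r.t. W1 is obtained by backpropagating through the training step. Everything here is an assumption for illustration: the toy `nn.Linear` model, the random data, a single plain SGD step standing in for the training, and the step size `inner_lr`. It relies on `torch.func.functional_call`, which requires PyTorch 2.0 or later.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = nn.Linear(3, 1)
x_train, y_train = torch.randn(8, 3), torch.randn(8, 1)
x_new, y_new = torch.randn(8, 3), torch.randn(8, 1)
inner_lr = 0.1  # assumed step size for the training step

# W1: the current parameters of the model.
w1 = dict(model.named_parameters())

# One training step kept inside the autograd graph (create_graph=True),
# so that W2 is a differentiable function of W1.
train_loss = nn.functional.mse_loss(functional_call(model, w1, (x_train,)), y_train)
grads = torch.autograd.grad(train_loss, list(w1.values()), create_graph=True)
w2 = {name: p - inner_lr * g for (name, p), g in zip(w1.items(), grads)}

# Loss L on the new data, evaluated with W2 instead of the module's own weights.
loss = nn.functional.mse_loss(functional_call(model, w2, (x_new,)), y_new)

# dL/dW1, obtained by backpropagating through the training step.
grads_w1 = torch.autograd.grad(loss, list(w1.values()))
```

The entries of `grads_w1` line up with `model.parameters()`, so they could be assigned to each parameter's `.grad` and applied with an ordinary optimizer. Training with several steps would need every step to be written this way so that the whole path from W1 to W2 stays in the graph.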

I tried to reload the weights W1 using load_state_dict between computing the loss and calling loss.backward(), but it does not work. I think the problem is that load_state_dict changes some internal states and variables in addition to the weights.
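
One likely explanation is that load_state_dict copies the saved values into the existing parameter tensors in place, and autograd generally refuses to run backward through tensors that were modified in place after the forward pass. A possible workaround, sketched below under assumptions, is to call loss.backward() while the model still holds W2, and only then restore W1 and take the optimizer step. Note that this is a first-order approximation (in the spirit of first-order MAML): it applies the gradient evaluated at W2 to W1 and ignores how W2 depends on W1. The toy model, random data, learning rates, and number of training steps are all hypothetical.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = nn.Linear(3, 1)
x_train, y_train = torch.randn(8, 3), torch.randn(8, 1)
x_new, y_new = torch.randn(8, 3), torch.randn(8, 1)
outer_lr = 0.01  # assumed learning rate for the update of W1

# Snapshot W1. state_dict() returns references to the live tensors,
# so a deep copy is needed to keep the old values.
w1_state = copy.deepcopy(model.state_dict())

# Ordinary training: the model's weights go from W1 to W2.
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(5):
    inner_opt.zero_grad()
    nn.functional.mse_loss(model(x_train), y_train).backward()
    inner_opt.step()

# Loss L on the new data with W2; call backward BEFORE restoring W1.
model.zero_grad()
loss = nn.functional.mse_loss(model(x_new), y_new)
loss.backward()  # each p.grad now holds dL/dW2

# Kept only to inspect the result afterwards.
grad_at_w2 = [p.grad.clone() for p in model.parameters()]

# Restore W1. The values are copied into the parameters; .grad is left as is.
model.load_state_dict(w1_state)

# Apply the gradient computed at W2 to the restored weights W1.
outer_opt = torch.optim.SGD(model.parameters(), lr=outer_lr)
outer_opt.step()
```

After the last line, each parameter equals its W1 value minus `outer_lr` times the gradient of L evaluated at W2.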

How can I solve my problem?

Thank you