PyTorch 1.5.0, RuntimeError when loading state dict

Hi!

I have something like the following in a loss function (this is code from an exercise, so there’s no direct practical value):

import copy

old_state = copy.deepcopy(model.state_dict())  # snapshot of the current parameters
logits = model(x)                              # forward pass uses the current weights
loss = objective(logits, y)
model.load_state_dict(old_state)               # restores the snapshot (in-place copy into the parameters)
return loss                                    # loss.backward() is called later by the caller

This leads to an error in the backward step of the loss:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 10]], which is output 0 of TBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The anomaly detection isn’t helpful in this case. Commenting out the model.load_state_dict statement removes the error message.

Any ideas on how I could save and load the state_dict without running into an error?

Hi,

Loading the state dict is not tracked as a differentiable op by the autograd.
However, it does modify the existing parameters in place with the new values. Those values were used during the forward pass and are therefore needed to compute the backward pass, hence the error you see.
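For illustration, a minimal standalone snippet (not your exercise code; the layer size and input are arbitrary, only chosen to match the shape in your error message) reproduces the same failure mode:

import copy
import torch
import torch.nn as nn

model = nn.Linear(300, 10)
x = torch.randn(4, 300, requires_grad=True)   # input requires grad, so the weight is needed for the backward

old_state = copy.deepcopy(model.state_dict())
loss = model(x).sum()
model.load_state_dict(old_state)              # copies into model.weight / model.bias in place
loss.backward()                               # RuntimeError: ... modified by an inplace operation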

Why do you need to do this? Can you give a bit more context on what you’re trying to achieve here?

As I wrote, it's an exercise: we have to do a second-order-style optimization of the learning rate by backtracking on the loss function (Armijo condition, line search). During the backtracking the loss has to be evaluated for different weights, so the network itself changes; that's why the state_dict is saved and the original weights are reloaded afterwards (that's my interpretation, I'm not sure, as I'm not the author of the code).
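Roughly, the structure is something like the following. This is only my own sketch of a generic backtracking line search, not the exercise code; the closure, the descent direction (negative gradient) and the constants are my assumptions:

import copy
import torch

def armijo_step_size(model, closure, lr0=1.0, beta=0.5, c=1e-4, max_trials=20):
    # Sketch: backtracking along the negative gradient with the Armijo
    # sufficient-decrease condition. closure() runs a forward pass and returns
    # the loss; gradients are assumed to be already populated by loss.backward().
    params = [p for p in model.parameters() if p.grad is not None]
    grad_sq = sum((p.grad ** 2).sum().item() for p in params)   # ||grad||^2
    old_state = copy.deepcopy(model.state_dict())
    with torch.no_grad():
        base_loss = closure().item()
        lr = lr0
        for _ in range(max_trials):
            for p in params:                                    # trial point: theta - lr * grad
                p.add_(p.grad, alpha=-lr)
            trial_loss = closure().item()
            model.load_state_dict(old_state)                    # restore the original weights
            if trial_loss <= base_loss - c * lr * grad_sq:      # sufficient decrease reached
                return lr
            lr *= beta                                          # shrink the step and retry
    return lr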

Is it in some way possible to disable PyTorch's integrity checks? I know this usually doesn't make sense, but in this case it would be easier.

Or is there a PyTorch version that does no such checks? I assume there must be one, otherwise this exercise would never have been executable. Or maybe I misunderstood something in the code, which is certainly possible.

I can't provide more code, as there's a strict copyright on it and I do not want to run into legal issues.

"I know this usually doesn't make sense, but in this case it would be easier."

Well, if you disable these checks, all the gradients that the autograd computes in this case will be wrong. So that won't really help you.

You might want to modify the code to move the load_state_dict() call after the backward, so that you don't overwrite values that are still required; a sketch follows below.
Otherwise, you can change your model's forward to .clone() the parameters before using them, so that the values modified in place by load_state_dict() are not the ones needed to compute the backward.
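For the first option, the idea would be something like this (reusing the names from your snippet; where the backward actually happens in your exercise is an assumption on my side):

old_state = copy.deepcopy(model.state_dict())
logits = model(x)
loss = objective(logits, y)
loss.backward()                    # backward first, while the weights used in the forward are unchanged
model.load_state_dict(old_state)   # restoring the snapshot is now safe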

Hi @albanD,

a solution to the problem is simply to make a copy of the whole model, not only of the state_dict, and to operate on the copied model:

model_copy = copy.deepcopy(model)
logits = model_copy(x)
return objective(logits, y)

To be honest, this was not my idea.
Anyway, thanks for your help!

Jürgen