What caused this in-place modification error?

I am having this in-place modification error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [128, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

So I went back to my code and narrowed the cause down to this piece of code:

import torch
import torch.nn as nn

#my_layer = nn.Sequential(torch.nn.Linear(200, 1), torch.nn.Sigmoid())
my_layer = nn.Sequential(torch.nn.Linear(200, 128), torch.nn.Linear(128, 1), torch.nn.Sigmoid())
my_layer.double()
my_layer.train()
my_optim = torch.optim.Adam(
    my_layer.parameters(),
    lr=0.001,
    betas=(0.9, 0.999),
    weight_decay=1e-4)

The error disappeared when I used only a single linear layer in my network. But if I insert an additional linear layer, the error comes back.

Could someone please provide some guidance on what the root cause of such errors is? Thanks.

Perhaps calling .double() counts as an in-place operation? Also, run your code under the torch.autograd.set_detect_anomaly context manager to pinpoint the operation that fails; docs here: Automatic differentiation package - torch.autograd — PyTorch 2.4 documentation
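
Something like this (an untested sketch; the input and loss below are placeholders for your own forward pass and loss):

with torch.autograd.set_detect_anomaly(True):
    x = torch.randn(32, 200, dtype=torch.double)   # placeholder input
    out = my_layer(x)                               # your forward pass
    loss = out.mean()                               # placeholder loss
    loss.backward()   # anomaly mode prints a traceback pointing at the forward op whose backward failed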

Thank you for your response.
I converted all the doubles back to float and removed the my_layer.double() call.
The error is still there:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [128, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Fatal Python error: gilstate_tss_set: failed to set current tstate (TSS)

I added with torch.autograd.set_detect_anomaly(True): as you suggested.

Hi Jim!

It is most likely the case that the .weight of your Linear (128, 1)
is being inappropriately modified inplace. You are probably doing
something like:

some_loss.backward(retain_graph=True)
my_optim.step()        # counts as an inplace modification of my_layer.parameters()
...
some_loss.backward()   # triggers the error

You could verify this by printing out my_layer[1].weight._version
before and after you call my_optim.step() and seeing whether ._version
changes from 1 to 2.
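
For example (just a sketch wrapped around your existing optimizer step):

print('before step:', my_layer[1].weight._version)
my_optim.step()
print('after step: ', my_layer[1].weight._version)   # increases if step() modified .weight inplace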

One fix could be pytorch's allow_mutation_on_saved_tensors() context
manager (torch.autograd.graph.allow_mutation_on_saved_tensors). You could
also call torch.nn.functional.linear() with clones of the Linear (128, 1)'s
.weight and .bias parameters.
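
Here is a rough sketch of both approaches (untested; x stands in for your double-precision input and the loss is a placeholder):

import torch.nn.functional as F

# Approach 1: allow mutation of tensors saved for backward.
# Saved tensors are cloned on mutation, so the second backward still works.
with torch.autograd.graph.allow_mutation_on_saved_tensors():
    loss = my_layer(x).mean()
    loss.backward(retain_graph=True)
    my_optim.step()
    loss.backward()

# Approach 2: have autograd save clones of the second Linear's parameters.
# Gradients still flow to the real parameters through clone()'s backward,
# but step()'s inplace update no longer touches what backward needs.
h = my_layer[0](x)
out = torch.sigmoid(F.linear(h, my_layer[1].weight.clone(), my_layer[1].bias.clone()))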

With your two-Linear Sequential, pytorch needs the unmodified
.weight of the second Linear to compute gradients with respect to
the first Linear. When you use just a one-Linear Sequential, you
are still modifying that Linear’s parameters inplace, but those
parameters are not needed in order to compute any gradients, so
the inplace-modification is not an error.
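
Here is a minimal, self-contained script (my own sketch, with a made-up input and loss) that shows the pattern:

import torch
import torch.nn as nn

my_layer = nn.Sequential(nn.Linear(200, 128), nn.Linear(128, 1), nn.Sigmoid())
my_optim = torch.optim.Adam(my_layer.parameters(), lr=0.001)

x = torch.randn(8, 200)
loss = my_layer(x).mean()

loss.backward(retain_graph=True)
my_optim.step()       # inplace update bumps my_layer[1].weight._version
loss.backward()       # RuntimeError: ... modified by an inplace operation

With the one-Linear version, nn.Sequential(nn.Linear(200, 1), nn.Sigmoid()), the same sequence runs without error, because no remaining gradient computation needs the parameters that step() modified.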

As an aside, the two Linears in a row in your Sequential (without
any intervening non-linear “activation”) collapse, in effect, into a
single Linear (200, 1). Therefore, you would probably want to
add something like a ReLU() or Sigmoid() in between them.
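
For example, something like:

my_layer = nn.Sequential(nn.Linear(200, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())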

Best.

K. Frank