What caused this in-place modification error?

I am having this in-place modification error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [128, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

So I went back to my code and narrowed the cause down to this piece of code:

import torch
import torch.nn as nn

#my_layer = nn.Sequential(torch.nn.Linear(200, 1), torch.nn.Sigmoid())
my_layer = nn.Sequential(torch.nn.Linear(200, 128), torch.nn.Linear(128, 1), torch.nn.Sigmoid())
my_layer.double()
my_layer.train()
my_optim = torch.optim.Adam(
    my_layer.parameters(),
    lr=0.001,
    betas=(0.9, 0.999),
    weight_decay=1e-4)

The error disappeared when I used only a single linear layer in my network. But if I insert an additional linear layer, the error comes back.

Could someone please provide some guidance on what the root cause of such errors is? Thanks.

Perhaps calling .double() counts as an in-place operation? Also, run your code under the torch.autograd.set_detect_anomaly context manager to pinpoint the operation that fails; docs here: Automatic differentiation package - torch.autograd — PyTorch 2.4 documentation
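
Something like this (an untested sketch; the input and loss below are placeholders for your own forward pass and loss):

with torch.autograd.set_detect_anomaly(True):
    x = torch.randn(32, 200, dtype=torch.double)   # placeholder input
    out = my_layer(x)                               # your forward pass
    loss = out.mean()                               # placeholder loss
    loss.backward()   # anomaly mode prints a traceback pointing at the forward op whose backward failed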

Thank you for your response.
I converted all the doubles back to float and removed the my_layer.double() call.
The error is still there:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [128, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Fatal Python error: gilstate_tss_set: failed to set current tstate (TSS)

I added with torch.autograd.set_detect_anomaly(True): as you suggested.

Hi Jim!

It is most likely the case that the .weight of your Linear (128, 1)
is being inappropriately modified inplace. You are probably doing
something like:

some_loss.backward(retain_graph=True)
my_optim.step()        # counts as an inplace modification of my_layer.parameters()
...
some_loss.backward()   # triggers the error

You could verify this by printing out my_layer[1].weight._version
before and after you call my_optim.step() and seeing whether ._version
changes from 1 to 2.
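
For example (just a sketch wrapped around your existing optimizer step):

print('before step:', my_layer[1].weight._version)
my_optim.step()
print('after step: ', my_layer[1].weight._version)   # increases if step() modified .weight inplace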

One fix could be pytorch's allow_mutation_on_saved_tensors() context
manager (torch.autograd.graph.allow_mutation_on_saved_tensors). You could
also call torch.nn.functional.linear() with clones of the Linear (128, 1)'s
.weight and .bias parameters.
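
Here is a rough sketch of both approaches (untested; x stands in for your double-precision input and the loss is a placeholder):

import torch.nn.functional as F

# Approach 1: allow mutation of tensors saved for backward.
# Saved tensors are cloned on mutation, so the second backward still works.
with torch.autograd.graph.allow_mutation_on_saved_tensors():
    loss = my_layer(x).mean()
    loss.backward(retain_graph=True)
    my_optim.step()
    loss.backward()

# Approach 2: have autograd save clones of the second Linear's parameters.
# Gradients still flow to the real parameters through clone()'s backward,
# but step()'s inplace update no longer touches what backward needs.
h = my_layer[0](x)
out = torch.sigmoid(F.linear(h, my_layer[1].weight.clone(), my_layer[1].bias.clone()))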

With your two-Linear Sequential, pytorch needs the unmodified
.weight of the second Linear to compute gradients with respect to
the first Linear. When you use just a one-Linear Sequential, you
are still modifying that Linear’s parameters inplace, but those
parameters are not needed in order to compute any gradients, so
the inplace-modification is not an error.
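
Here is a minimal, self-contained script (my own sketch, with a made-up input and loss) that shows the pattern:

import torch
import torch.nn as nn

my_layer = nn.Sequential(nn.Linear(200, 128), nn.Linear(128, 1), nn.Sigmoid())
my_optim = torch.optim.Adam(my_layer.parameters(), lr=0.001)

x = torch.randn(8, 200)
loss = my_layer(x).mean()

loss.backward(retain_graph=True)
my_optim.step()       # inplace update bumps my_layer[1].weight._version
loss.backward()       # RuntimeError: ... modified by an inplace operation

With the one-Linear version, nn.Sequential(nn.Linear(200, 1), nn.Sigmoid()), the same sequence runs without error, because no remaining gradient computation needs the parameters that step() modified.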

As an aside, the two Linears in a row in your Sequential (without
any intervening non-linear “activation”) collapse, in effect, into a
single Linear (200, 1). Therefore, you would probably want to
add something like a ReLU() or Sigmoid() in between them.
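
For example, something like:

my_layer = nn.Sequential(nn.Linear(200, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())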

Best.

K. Frank