RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [128, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

So I went back to my code and narrowed the cause down to this piece of code:

Thank you for your response.
I converted all the doubles back to floats and removed the my_layer.double() call.
The error is still there:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [128, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Fatal Python error: gilstate_tss_set: failed to set current tstate (TSS)

I added with torch.autograd.set_detect_anomaly(True): as you suggested.
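For reference, anomaly detection is typically enabled by wrapping the forward and backward passes; a minimal sketch (the tensor shapes here are illustrative, not from the thread):

```python
import torch

# Hypothetical minimal example: with anomaly detection on, a failing
# backward() reports the forward operation that produced the bad tensor.
x = torch.randn(4, 3, requires_grad=True)
with torch.autograd.set_detect_anomaly(True):
    y = (x * 2).sum()   # simple differentiable forward pass
    y.backward()        # succeeds here; a real in-place bug would raise
```

Anomaly mode adds noticeable overhead, so it is usually enabled only while debugging.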

With your two-Linear Sequential, PyTorch needs the unmodified .weight of the second Linear in order to compute gradients with respect to the first Linear. When you use just a one-Linear Sequential, you are still modifying that Linear's parameters in place, but those parameters are not needed to compute any gradients, so the in-place modification is not an error.
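This difference can be reproduced in a few lines; a minimal sketch, with the layer sizes (200, 128, 1) assumed from the discussion:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 200)

# Two Linears: backward for the first Linear needs the second Linear's
# weight, so updating it in place after the forward pass raises an error.
net2 = nn.Sequential(nn.Linear(200, 128), nn.Linear(128, 1))
out2 = net2(x).sum()
with torch.no_grad():
    net2[1].weight += 1.0   # in-place update bumps the version counter
try:
    out2.backward()
    two_linear_failed = False
except RuntimeError:
    two_linear_failed = True

# One Linear: its weight is not saved for any gradient computation
# (only the input x is), so the same in-place update is harmless.
net1 = nn.Sequential(nn.Linear(200, 1))
out1 = net1(x).sum()
with torch.no_grad():
    net1[0].weight += 1.0
out1.backward()             # no error
```

The same mechanism explains why calling optimizer.step() between forward() and backward() triggers this error: step() updates the parameters in place.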

As an aside, the two Linears in a row in your Sequential (without
any intervening non-linear “activation”) collapse, in effect, into a
single Linear (200, 1). Therefore, you would probably want to
add something like a ReLU() or Sigmoid() in between them.
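The collapse can be checked numerically: two stacked affine maps equal a single one with weight W2 @ W1 and bias W2 @ b1 + b2. A minimal sketch, again assuming the (200, 128, 1) sizes from the thread:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 200)

# Two Linears with no activation in between...
stacked = nn.Sequential(nn.Linear(200, 128), nn.Linear(128, 1))
w1, b1 = stacked[0].weight, stacked[0].bias
w2, b2 = stacked[1].weight, stacked[1].bias

# ...are equivalent to one Linear(200, 1) with composed parameters.
collapsed = nn.Linear(200, 1)
with torch.no_grad():
    collapsed.weight.copy_(w2 @ w1)
    collapsed.bias.copy_(w2 @ b1 + b2)

same = torch.allclose(stacked(x), collapsed(x), atol=1e-4)

# Inserting a non-linearity breaks the collapse and adds real capacity:
mlp = nn.Sequential(nn.Linear(200, 128), nn.ReLU(), nn.Linear(128, 1))
```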