RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [28]] is at version 56; expected version 28 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

What does "version 56" mean?
It happens when I try to do something like this:

The error is caused by `self.x = attention`. The exact reason depends on the complete code, but the main idea is this: suppose 'a' depends on 'x' and 'y' for its gradient computation. When you then do `self.x = attention`, you change the value of x that a depends on, so there is no longer any way to compute the gradient for 'a'.
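A minimal sketch of that dummy scenario (the names `x`, `y`, `a` are the made-up variables from the explanation, not your actual code): the multiplication saves `y` in order to compute x's gradient, so modifying `y` in place afterwards breaks the backward pass.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = torch.ones(3)

a = x * y    # autograd saves y, because grad_x = grad_a * y
y.add_(1)    # in-place update: y is no longer the value that was saved

try:
    a.sum().backward()
except RuntimeError as e:
    print(e)  # "... has been modified by an inplace operation ..."
```

Replacing `y.add_(1)` with an out-of-place `y = y + 1` makes the error go away, because the original tensor that autograd saved is left untouched.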

Those are made-up variables, as it is not possible to tell the exact problem from partial code. I was just trying to construct a dummy scenario in which this error would occur.

Doing `self.x[0] = something` is an in-place operation, i.e. no copy of the object is made. So you are modifying the original object, but that original object is also needed for the backprop computation. As for "version 56": every tensor carries a version counter that is incremented on each in-place modification. Autograd records the counter value when it saves a tensor for backward (here 28) and checks it again during backward; finding version 56 means the saved tensor was modified in place 28 more times after being saved, so its saved value can no longer be trusted.
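You can watch the version counter tick with the internal `_version` attribute (an implementation detail, so don't rely on it in real code), and fix the error by cloning before the in-place write:

```python
import torch

x = torch.randn(3, requires_grad=True)

a = torch.exp(x)   # exp() saves its output 'a' to compute the gradient
print(a._version)  # 0: no in-place ops yet
a[0] = 0.0         # in-place write, no copy: version goes 0 -> 1
print(a._version)  # 1: autograd expected 0, so backward will fail

try:
    a.sum().backward()
except RuntimeError as e:
    print(e)       # "... is at version 1; expected version 0 instead"

# Fix: clone first, then modify the copy; the saved tensor stays intact
b = torch.exp(x)
c = b.clone()
c[0] = 0.0         # modifies the clone only
c.sum().backward() # works
```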