RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [181, 128]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detect

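You are running into this error because a tensor that autograd saved during the forward pass was modified in place before backward() used it. Here the [181, 128] tensor is at version 2 while the graph expects version 1. Computing the backward pass from these stale forward activations would give wrong gradients, so autograd raises an error instead. A common cause is calling optimizer.step(), which updates the parameters in place, between the forward pass and a backward() call that still depends on that forward pass.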
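A minimal sketch that reproduces this error and shows one way to avoid it. It assumes a toy two-layer model, two losses sharing a single forward pass, and a manual in-place SGD update standing in for optimizer.step(). The names `train_step` and `sgd_update` are hypothetical and not from the original post. Updating the weights before the second backward() raises the same kind of RuntimeError. Updating only after every backward() that uses the graph works.

```python
import torch
import torch.nn as nn


def sgd_update(net: nn.Module, lr: float = 0.1) -> None:
    """Hypothetical stand-in for optimizer.step(): updates weights in place."""
    with torch.no_grad():
        for p in net.parameters():
            p.sub_(lr * p.grad)  # in-place op bumps the tensor's version counter


def train_step(update_before_second_backward: bool) -> str:
    """Run two backward passes over one forward pass; return "ok" or the error text."""
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
    out = net(torch.randn(16, 4))  # autograd saves the last layer's weight for backward
    loss_a = out.mean()
    loss_b = (out ** 2).mean()

    loss_a.backward(retain_graph=True)  # keep the graph alive for the second backward
    if update_before_second_backward:
        sgd_update(net)  # bug: the saved weight is now stale (its version changed)
    try:
        loss_b.backward()  # needs the weight as it was during the forward pass
    except RuntimeError as err:
        return str(err)
    if not update_before_second_backward:
        sgd_update(net)  # fix: update only after every backward that uses this graph
    return "ok"


print(train_step(update_before_second_backward=True))
print(train_step(update_before_second_backward=False))  # ok
```

Other options are to sum the losses and call backward() once before stepping, or to rerun the forward pass after the update so the new graph is built from the current weights.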