Transformer model training

Hi,
I am trying to train a Transformer model. One line of my code
raises the following runtime error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2048, 768]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)

Thank you.

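This error is most likely caused by the use of retain_graph=True. Could you explain why you are using it?
With a retained graph, a later backward() call can walk through parts of the graph that still reference parameters which optimizer.step() has already updated in place, and that produces exactly this version mismatch.
If it was added as a workaround to mask another error (for example "Trying to backward through the graph a second time"), try to fix the original error first.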
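
Since your code is not shown, here is a minimal sketch of how this usually happens, not a reconstruction of your model. The function run, the small nn.Linear "cell", and the carried-over state tensor are all hypothetical names made up for illustration. The assumption is that a tensor from a previous iteration is reused, so the new graph stays attached to the old one:

```python
import torch
import torch.nn as nn


def run(detach_state: bool) -> str:
    """Train a toy recurrent-style loop (hypothetical stand-in for the real model).

    detach_state=False: keep the old graph attached and use retain_graph=True.
    detach_state=True:  cut the graph each step, so retain_graph is not needed.
    """
    torch.manual_seed(0)
    cell = nn.Linear(8, 8)
    opt = torch.optim.SGD(cell.parameters(), lr=0.1)
    state = torch.zeros(1, 8)
    try:
        for _ in range(4):
            if detach_state:
                # Drop the history so backward() only sees the current step.
                state = state.detach()
            state = torch.tanh(cell(state))
            loss = state.pow(2).sum()
            opt.zero_grad()
            # Without detach, backward() also walks the graphs of earlier steps.
            loss.backward(retain_graph=not detach_state)
            # In-place parameter update: bumps the version of cell.weight.
            opt.step()
    except RuntimeError as err:
        if "inplace operation" in str(err):
            return "inplace-error"
        raise
    return "ok"


if __name__ == "__main__":
    print("retain_graph=True, no detach:", run(detach_state=False))
    print("detach each step:", run(detach_state=True))
```

In this sketch the first variant should hit the same "modified by an inplace operation" error after a few iterations, because the retained graph of an earlier step saved the weight before opt.step() changed it. The second variant detaches the carried-over tensor, so each backward() only covers the current forward pass. Whether detaching is right for your model depends on whether you really need gradients to flow across iterations.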

Thank you very much for your comments. I will do it.