it looks like this usually occurs because some tensor is getting stored and used for the backward pass again
Yes, if you are backwarding multiple times over the same graph, and use the same
However, I have looked up this behavior and confirmed that I do not want to do this.
It sounds like you want to make sure that the two backward passes operate on two disjoint graphs?
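To make the shared-graph versus disjoint-graph distinction concrete, here is a minimal sketch. The tensors `w` and `x` and the two losses are made up for illustration and are not taken from the original poster's model. The first half reuses one intermediate `h` for two losses, so the second `backward()` hits buffers that the first one already freed. The second half recomputes the forward for each loss, so each `backward()` walks its own graph.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # hypothetical parameter
x = torch.randn(3)                      # hypothetical input

# Shared graph: both losses depend on the same intermediate `h`.
h = (w * x).exp()            # exp saves its output for the backward pass
loss_a = h.sum()
loss_b = (h * 2).sum()
loss_a.backward()            # frees the saved tensors along h's graph
try:
    loss_b.backward()        # needs those freed tensors again
    shared_graph_failed = False
except RuntimeError:
    shared_graph_failed = True

# Disjoint graphs: rerun the forward so each loss owns its graph.
w.grad = None
loss_a = (w * x).exp().sum()
loss_a.backward()
loss_b = ((w * x).exp() * 2).sum()
loss_b.backward()            # succeeds; gradients accumulate into w.grad
```

If the two losses really should share a forward pass, summing them and calling `backward()` once, or passing `retain_graph=True` to the first call, are the usual alternatives. Which one fits depends on the training loop.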
How can I track down the tensor that is causing problems?
You can use TORCH_LOGS="+autograd" to log the backward pass, or enable anomaly detection (see "Automatic differentiation package - torch.autograd" in the PyTorch 2.5 documentation), which can give you a stack trace showing where in the forward pass the error from backward originated.
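As a rough illustration of the second suggestion, the sketch below wraps a small, made-up computation in `torch.autograd.detect_anomaly()`. The toy tensor `w` and the deliberate double `backward()` are assumptions chosen only to trigger an error. With anomaly detection on, PyTorch also records the forward traceback of each operation and reports it alongside the backward error, which should point at the line that created the offending node.

```python
import torch

w = torch.randn(3, requires_grad=True)  # hypothetical parameter

failed = False
message = ""
# Anomaly detection slows execution down, so enable it only while debugging.
with torch.autograd.detect_anomaly():
    h = w.exp()                 # forward op whose saved output is reused below
    h.sum().backward()          # first backward frees h's saved tensors
    try:
        (h * 2).sum().backward()  # second backward through the same node
    except RuntimeError as err:
        failed = True
        message = str(err)

print(message)
```

For the logging route, the environment variable goes on the command line that launches the script, for example `TORCH_LOGS="+autograd" python train.py`, where `train.py` is a placeholder for your own entry point.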