Autograd with 2 branches

Using `retain_graph=True` is usually wrong and is often added as a workaround for another issue.
Based on your code snippet, I guess you might be running into the issue where `backward` is called a second time through the computation graph from the first iteration (whose forward activations are stale by now, since the parameters were updated in-place) as well as through the current second iteration's graph.
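
For context, here's a minimal sketch of the failure mode I'm guessing at (the model, data, and losses below are made up, not taken from your code). Calling `backward` through the retained graph again after `optimizer.step()` has modified the parameters in-place raises the in-place modification error:

```python
import torch
import torch.nn as nn

# Hypothetical toy setup, just to reproduce the suspected error.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 10)

# Iteration 1: the forward pass saves the current parameter values in the graph.
loss1 = model(x).mean()
loss1.backward(retain_graph=True)  # retain_graph keeps iteration 1's graph alive
optimizer.step()                   # updates the parameters in-place

# Iteration 2: this backward walks the fresh graph *and* the retained, stale
# graph from iteration 1, whose saved parameters were just modified in-place.
loss2 = model(x).mean()
(loss1 + loss2).backward()
# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation
```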
Could you explain why you've used this argument and whether it's really needed?