Updating two models at the same time

retain_graph=True is often used as a workaround, but unfortunately it is usually the wrong fix.
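I don't know how the outputs are calculated, but my guess is that you are running into this error, which is raised when the backward pass tries to use stale forward activations while calculating the gradients, i.e. tensors saved during the forward pass that have since been modified in-place, for example by an optimizer.step() call.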
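Since the original code isn't shown, here is a minimal sketch that reproduces the issue under assumed conditions. `model_a`, `model_b`, the two placeholder losses and the helper functions are all hypothetical. `model_b` consumes the output of `model_a`, `model_b` is updated first, and `loss_a` is then backpropagated through the same (now stale) graph:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: model_b consumes the output of model_a.
model_a = nn.Linear(4, 4)
model_b = nn.Linear(4, 1)
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)
x = torch.randn(8, 4)


def losses():
    """One forward pass returning two placeholder losses that share a graph."""
    out = model_b(model_a(x))
    loss_b = out.pow(2).mean()        # used to update model_b
    loss_a = (out - 1).pow(2).mean()  # used to update model_a, backprops through model_b
    return loss_a, loss_b


def stale_update():
    """The retain_graph=True workaround: fails on the second backward."""
    loss_a, loss_b = losses()
    opt_b.zero_grad()
    loss_b.backward(retain_graph=True)
    opt_b.step()       # updates model_b's weight in-place (bumps its version counter)
    opt_a.zero_grad()
    loss_a.backward()  # needs the old model_b weight saved in the graph -> RuntimeError
    opt_a.step()


def fixed_update():
    """Recompute the forward pass after model_b was updated."""
    _, loss_b = losses()
    opt_b.zero_grad()
    loss_b.backward()  # no retain_graph needed
    opt_b.step()
    loss_a, _ = losses()  # fresh graph built with the updated model_b
    opt_a.zero_grad()
    loss_a.backward()
    opt_a.step()
```

Recomputing the forward pass costs an extra forward. Alternatively, both backward() calls can be made before either optimizer.step(); in that case retain_graph=True on the first call is legitimate, because nothing saved in the graph has been modified yet.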