Understanding backward of Variables for complex operations


My real network is a bit complicated, so let me use a toy example.
Say I have three networks, NetA, NetB, NetC, with three criterions CritA, CritB, CritC, and an input image I. I am doing the following.

oA = NetA(I)
oB = NetB(oA)
oC = NetC(oA)

lossAB = CritA(oA, targetA) + CritB(oB, targetB)
lossAB.backward() # first backward

lossC = CritC(oC, targetC)
lossC.backward() # second backward where error is raised.

When I do this, an error is raised: RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.

I believe this is because the graph through variable oA is already backpropagated (and freed) during the first backward, so oA causes the issue when the second backward is called.

How can I solve this kind of problem? Suppose we cannot simply use

loss_overall = lossAB + lossC

because my real situation is more complicated and I really do need to separate the backward calls.
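For reference, here is a minimal self-contained repro of the error above. The small Linear layers and MSELoss standing in for NetA/NetB/NetC and the criterions are assumptions; any modules would behave the same way:

```python
import torch
import torch.nn as nn

# Stand-ins for NetA, NetB, NetC (assumptions; any modules work)
netA, netB, netC = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 2)
crit = nn.MSELoss()

I = torch.randn(1, 4)
targetA, targetB, targetC = torch.randn(1, 4), torch.randn(1, 2), torch.randn(1, 2)

oA = netA(I)
oB, oC = netB(oA), netC(oA)

lossAB = crit(oA, targetA) + crit(oB, targetB)
lossAB.backward()  # first backward frees the graph through netA

lossC = crit(oC, targetC)
err = None
try:
    lossC.backward()  # needs the (already freed) graph through netA
except RuntimeError as e:
    err = e
print(err)
```

The second backward fails because oC depends on oA, whose part of the graph was freed by the first backward.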

torch.autograd.backward([lossAB, lossC], [gradAB, gradC])

You can use this too.
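In full, the suggestion looks something like the sketch below. The Linear/MSELoss stand-ins are assumptions; since the losses are scalars, ones-tensors are the natural choice for gradAB and gradC:

```python
import torch
import torch.nn as nn

# Stand-ins for the real networks and criterions (assumptions)
netA, netB, netC = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 2)
crit = nn.MSELoss()

I = torch.randn(1, 4)
targetA, targetB, targetC = torch.randn(1, 4), torch.randn(1, 2), torch.randn(1, 2)

oA = netA(I)
oB, oC = netB(oA), netC(oA)

lossAB = crit(oA, targetA) + crit(oB, targetB)
lossC = crit(oC, targetC)

# One backward over both losses: the shared graph through netA is
# traversed a single time, so no "buffers already freed" error.
torch.autograd.backward(
    [lossAB, lossC],
    [torch.ones_like(lossAB), torch.ones_like(lossC)],
)
```

This accumulates gradients for all three networks in a single pass.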

Thanks for your prompt reply, Soumith!
I mean: what if the two backwards are not called simultaneously, but in two separate functions?
Is that still possible?
Would lossAB.backward(retain_variables=True) help? Even if it does, it is not a good option, since it retains all intermediate variables in NetA and NetB.

Yes, you can use retain_variables (and that's your only other option if you want to call the two backwards in two separate functions), but it does hold onto the intermediate variables built in NetA and NetB.
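A sketch of that option with the backwards split across two functions, again using toy Linear/MSELoss stand-ins (assumptions). Note that in current PyTorch the keyword is retain_graph=True (retain_variables is the older name):

```python
import torch
import torch.nn as nn

# Stand-ins for the real networks and criterions (assumptions)
netA, netB, netC = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 2)
crit = nn.MSELoss()

def backward_AB(oA, oB, targetA, targetB):
    lossAB = crit(oA, targetA) + crit(oB, targetB)
    # Keep the graph alive so a later backward can reuse the
    # shared part through netA (this is the memory cost discussed above).
    lossAB.backward(retain_graph=True)

def backward_C(oC, targetC):
    # Last backward; the graph may be freed now.
    crit(oC, targetC).backward()

I = torch.randn(1, 4)
oA = netA(I)
oB, oC = netB(oA), netC(oA)

backward_AB(oA, oB, torch.randn(1, 4), torch.randn(1, 2))
backward_C(oC, torch.randn(1, 2))
```

The two calls can now live in entirely different functions, at the price of retaining the intermediate buffers of NetA and NetB between them.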
