Understanding backward of Variables for complex operations


My real network is a bit complicated, so let me use a toy example.
Say I have three networks, NetA, NetB, NetC, with three criterions CritA, CritB, CritC, and an input image I. I am doing the following.

oA = NetA(I)
oB = NetB(oA)
oC = NetC(oA)

lossAB = CritA(oA, targetA) + CritB(oB, targetB)
lossAB.backward() # first backward

lossC = CritC(oC, targetC)
lossC.backward() # second backward where error is raised.

When I do this, an error is raised: RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.

I believe this is because the graph through variable oA is already backpropagated (and freed) during the first backward, so oA causes the issue when the second backward is called.

How can I solve this kind of problem? Suppose we cannot simply use

loss_overall = lossAB + lossC

because my real situation is more complicated and I really do need to separate the backward calls.
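For reference, here is a minimal self-contained repro of the error above. The small Linear layers and MSELoss standing in for NetA/NetB/NetC and the criterions are assumptions; any modules would behave the same way:

```python
import torch
import torch.nn as nn

# Stand-ins for NetA, NetB, NetC (assumptions; any modules work)
netA, netB, netC = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 2)
crit = nn.MSELoss()

I = torch.randn(1, 4)
targetA, targetB, targetC = torch.randn(1, 4), torch.randn(1, 2), torch.randn(1, 2)

oA = netA(I)
oB, oC = netB(oA), netC(oA)

lossAB = crit(oA, targetA) + crit(oB, targetB)
lossAB.backward()  # first backward frees the graph through netA

lossC = crit(oC, targetC)
err = None
try:
    lossC.backward()  # needs the (already freed) graph through netA
except RuntimeError as e:
    err = e
print(err)
```

The second backward fails because oC depends on oA, whose part of the graph was freed by the first backward.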

torch.autograd.backward([lossAB, lossC], [gradAB, gradC])

You can use this too.
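In full, the suggestion looks something like the sketch below. The Linear/MSELoss stand-ins are assumptions; since the losses are scalars, ones-tensors are the natural choice for gradAB and gradC:

```python
import torch
import torch.nn as nn

# Stand-ins for the real networks and criterions (assumptions)
netA, netB, netC = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 2)
crit = nn.MSELoss()

I = torch.randn(1, 4)
targetA, targetB, targetC = torch.randn(1, 4), torch.randn(1, 2), torch.randn(1, 2)

oA = netA(I)
oB, oC = netB(oA), netC(oA)

lossAB = crit(oA, targetA) + crit(oB, targetB)
lossC = crit(oC, targetC)

# One backward over both losses: the shared graph through netA is
# traversed a single time, so no "buffers already freed" error.
torch.autograd.backward(
    [lossAB, lossC],
    [torch.ones_like(lossAB), torch.ones_like(lossC)],
)
```

This accumulates gradients for all three networks in a single pass.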

Thanks for your prompt reply, Soumith!
I mean: what if the two backwards are not called simultaneously, but in two separate functions?
Is that still possible?
Would lossAB.backward(retain_variables=True) help? Even if it does, it is not a good option, since it retains all intermediate variables in NetA and NetB.

Yes, you can use retain_variables (and that's your only other option if you want to call the two backwards in two separate functions), but it does hold onto the intermediate variables built in NetA and NetB.
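A sketch of that option with the backwards split across two functions, again using toy Linear/MSELoss stand-ins (assumptions). Note that in current PyTorch the keyword is retain_graph=True (retain_variables is the older name):

```python
import torch
import torch.nn as nn

# Stand-ins for the real networks and criterions (assumptions)
netA, netB, netC = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 2)
crit = nn.MSELoss()

def backward_AB(oA, oB, targetA, targetB):
    lossAB = crit(oA, targetA) + crit(oB, targetB)
    # Keep the graph alive so a later backward can reuse the
    # shared part through netA (this is the memory cost discussed above).
    lossAB.backward(retain_graph=True)

def backward_C(oC, targetC):
    # Last backward; the graph may be freed now.
    crit(oC, targetC).backward()

I = torch.randn(1, 4)
oA = netA(I)
oB, oC = netB(oA), netC(oA)

backward_AB(oA, oB, torch.randn(1, 4), torch.randn(1, 2))
backward_C(oC, torch.randn(1, 2))
```

The two calls can now live in entirely different functions, at the price of retaining the intermediate buffers of NetA and NetB between them.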
