I was training CycleGAN. When I train the discriminators, I can either call backward() on each loss separately (case 1):

loss_D_A.backward()
loss_D_B.backward()

or sum the losses and call backward() once (case 2):

loss_D = loss_D_A + loss_D_B
loss_D.backward()

Setting aside the retain_graph=True issue, and assuming neither version raises an error, do both cases have the same effect?

My question is: in case 2, does the loss get accumulated twice (not just loss A, but loss A + loss B for discriminator A, and likewise for discriminator B), so that the gradients are updated by twice as much as they should be? Or does each loss automatically affect only its related tensors, even when the losses are added together as in case 2?
This will work properly. loss_D_A.backward() accumulates the gradient of loss_D_A into whatever parameters loss_D_A depends on. loss_D_B.backward() then accumulates the gradient of loss_D_B into its relevant parameters. If loss_D_A and loss_D_B both depend on some of the same parameters, those parameters will have the sum of the two gradients accumulated into their .grad properties.
But gradients are linear – that is, the gradient of the sum is the sum of
the gradients. So accumulating the gradient of
loss_D_A + loss_D_B
into the relevant parameters does, indeed, accumulate the sum of the
two gradients into the parameters. (But by doing so in one backward() call, it will be somewhat cheaper.)
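As a quick illustration of that linearity (a minimal toy sketch, not the CycleGAN code; the tensors and losses here are made up):

```python
# Toy demonstration: backward() on a sum of losses accumulates the same
# gradients as separate backward() calls on each loss.
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Two toy "losses" that both depend on w.
loss_a = (w ** 2).sum()   # d/dw = 2 * w
loss_b = (3 * w).sum()    # d/dw = 3

# Case 2: one backward() call on the sum.
(loss_a + loss_b).backward()
grad_sum = w.grad.clone()

# Case 1: separate backward() calls accumulate into w.grad.
w.grad = None
loss_a = (w ** 2).sum()
loss_b = (3 * w).sum()
loss_a.backward()
loss_b.backward()
grad_separate = w.grad.clone()

print(torch.allclose(grad_sum, grad_separate))  # True
```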
Just to be clear, in case 1, if loss_D_A and loss_D_B are part of the same computation graph, you will have to use loss_D_A.backward (retain_graph = True) in order for loss_D_B.backward() to work. retain_graph = True won't be needed for case 2 because there is only one .backward() call.
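For example, here is a minimal sketch (hypothetical tensors, not the CycleGAN code) where the two losses share an intermediate result, so case 1 needs retain_graph = True:

```python
# When two losses share part of the computation graph, the first
# backward() frees that graph unless retain_graph=True is passed, and
# the second backward() would then raise a RuntimeError.
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
shared = w * 2              # intermediate node used by BOTH losses
loss_a = shared.sum()       # d/dw = 2
loss_b = (shared ** 2).sum()  # d/dw = 8 * w

loss_a.backward(retain_graph=True)
loss_b.backward()           # works because the graph was retained

print(w.grad)               # sum of both gradients: 2 + 8 * w
```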
Thank you for your fast and kind reply!
So do you mean that, also in case 2, it will accumulate the gradient of loss_D_A only into its relevant parameters and the gradient of loss_D_B only into its relevant parameters, even though we call .backward() just once on the sum of the two losses (loss_D_A + loss_D_B) rather than calling .backward() separately on each loss?
Yes, this is correct (with the proviso that you might need to use
retain_graph = True in case 1).
You can easily test that cases 1 and 2 produce the same gradients.
Run case 1 and save the .grads of all of the parameters that affect loss_D_A and loss_D_B for future comparison. Then repeat the loss computations from scratch (using the same data, of course) and run case 2. You can now compare the .grads from case 2 with those you saved from case 1 and you will see that they are equal (up to numerical round-off).
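That test might be sketched like this (the toy model and loss functions here are hypothetical stand-ins for the discriminators and their losses):

```python
# Compare the gradients produced by case 1 (separate backward() calls)
# and case 2 (one backward() on the summed loss) on a toy model.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x_a, x_b = torch.randn(3, 4), torch.randn(3, 4)

def losses():
    # Two made-up losses sharing the same parameters.
    return model(x_a).pow(2).mean(), model(x_b).pow(2).mean()

# Case 1: separate backward() calls; save the .grads.
model.zero_grad()
loss_a, loss_b = losses()
loss_a.backward()
loss_b.backward()
grads_case1 = [p.grad.clone() for p in model.parameters()]

# Case 2: recompute the losses from scratch (same data) and call
# backward() once on the sum.
model.zero_grad()
loss_a, loss_b = losses()
(loss_a + loss_b).backward()
grads_case2 = [p.grad.clone() for p in model.parameters()]

# Equal up to numerical round-off.
same = all(torch.allclose(g1, g2) for g1, g2 in zip(grads_case1, grads_case2))
print(same)
```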
Thank you so much.
Such a kind and clear solution!
Have a nice day and always good luck Frank!