The input is fed to the Gen network to generate a fake image (fakeA). I use an L1 loss to compute the difference between the input and fakeA; I call it lossGen.
fakeA is then fed to the segmentation network to create predA. I use cross entropy to compute the loss between the label and predA; I call it lossSeg.
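For illustration, here is a minimal sketch of this setup (the tiny placeholder networks, shapes, and class count are just stand-ins, not my real models):

import torch
import torch.nn as nn

# Stand-in networks only to make the sketch runnable; the real GenA/SegA are different.
GenA = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # "generator"
SegA = nn.Conv2d(3, 5, kernel_size=3, padding=1)   # "segmentation net", 5 classes assumed

l1 = nn.L1Loss()
ce = nn.CrossEntropyLoss()

inputA = torch.randn(2, 3, 64, 64)                 # input images
label = torch.randint(0, 5, (2, 64, 64))           # segmentation labels

fakeA = GenA(inputA)                               # fake image
lossGen = l1(fakeA, inputA)                        # L1 between input and fakeA

predA = SegA(fakeA)                                # prediction on the fake image
lossSeg = ce(predA, label)                         # cross entropy between label and predA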
For training, I have two ways. The first is to sum the losses and call backward once:
lossGen=...
lossSeg=...
loss = lossGen + lossSeg
loss.backward()
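The second way is to call backward on each loss separately, something like:

lossGen=...
lossSeg=...
lossGen.backward()
lossSeg.backward()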
Both approaches should compute the same gradients.
In the second approach you would need to call lossGen.backward(retain_graph=True), otherwise the intermediate values will be cleared and you’ll get an error calling lossSeg.backward().
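For example, a minimal sketch of the second approach with that fix:

lossGen.backward(retain_graph=True)  # keep the graph alive for the second backward pass
lossSeg.backward()                   # works now, since the intermediate values were retained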
However, currently you are using lossSeg to calculate gradients in both models, GenA and SegA.
Is this what you would like to do?
If you only want to calculate the gradients of lossSeg w.r.t. the parameters in SegA, you should .detach() the output of GenA before passing it to SegA.
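E.g. a quick sketch, reusing the names from your snippet:

import torch.nn.functional as F

predA = SegA(fakeA.detach())             # detach cuts the graph, so nothing flows back into GenA
lossSeg = F.cross_entropy(predA, label)
lossSeg.backward()                       # gradients end up only in SegA's parameters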
Does it really make any difference?
What is the difference if we call backward on each loss separately (i.e. lossGen.backward() and lossSeg.backward()) compared to the case where we just call loss.backward() on the sum?
No, it won’t make any difference. It might just be my coding style, but I would prefer to handle both losses separately if they are independent of each other.
Otherwise I would try to figure out why the author of the code is summing them before calling .backward().
As far as I understand, in papers people put all the losses together.
However, in implementations they do it separately.
Similarly, the original GAN formulates all the losses as a single min-max objective, but in training people first do backward for the discriminator and update the discriminator's weights, and then do backward for the generator and update the generator's weights.
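Roughly like this, as a toy sketch of what I mean (the tiny linear G/D, optimizers, and shapes are just placeholders to show the alternating updates):

import torch
import torch.nn as nn

# Toy generator/discriminator, only to illustrate the alternating updates.
G = nn.Linear(16, 32)
D = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())
optG = torch.optim.Adam(G.parameters(), lr=2e-4)
optD = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(8, 32)                  # a batch of "real" samples
noise = torch.randn(8, 16)
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# 1) discriminator step: backward and update D first
optD.zero_grad()
fake = G(noise)
lossD = bce(D(real), ones) + bce(D(fake.detach()), zeros)
lossD.backward()
optD.step()

# 2) generator step: then backward and update G
optG.zero_grad()
lossG = bce(D(fake), ones)
lossG.backward()
optG.step()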
Also, based on my experience, it does not really matter whether you do backward for each loss separately or sum them and then call backward on the sum.