Combining losses: two backward calls or one?

Hello all, I have an architecture like this:

[architecture diagram: input → Gen → fakeA → Seg → predA]

The input is fed to the Gen network to generate a fake image (fakeA). I use an L1 loss to compute the difference between the input and fakeA; I call this lossGen.

fakeA is then fed to the segmentation network to create predA. I use cross entropy to compute the loss between the label and predA; I call this lossSeg.

For training, I have two ways:

lossGen = ...
lossSeg = ...
loss = lossGen + lossSeg
loss.backward()

Or

lossGen = ...
lossGen.backward()
lossSeg = ...
lossSeg.backward()

Which way should I use? Note that the output of GenA is used as the input to the Seg network. Thanks!

Both approaches should compute the same gradients.
In the second approach you would need to call lossGen.backward(retain_graph=True), otherwise the intermediate values will be cleared and you’ll get an error calling lossSeg.backward().
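For example, the second approach could look something like this (just a sketch reusing the names from your post; GenA, SegA, L1, cross_entropy, real_A and label are assumed to be defined as in your setup):

fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
lossGen.backward(retain_graph=True)  # keep the graph alive, since lossSeg also backprops through GenA

pred_A = SegA(fake_A)
lossSeg = cross_entropy(pred_A, label)
lossSeg.backward()  # gradients from both losses accumulate in the .grad attributes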

However, currently you are using lossSeg to calculate gradients in both models, GenA and SegA.
Is this what you would like to do?
If you only want to calculate the gradients of lossSeg w.r.t. the parameters in SegA, you should .detach() the output of GenA before passing it to SegA.


Thanks ptrblck. Yes, I only want to use the gradient of lossSeg to update the parameters of SegA. So, following your suggestion, my code will be:

fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
lossGen.backward()

# Feed fake_A to SegA
pred_A = SegA(fake_A.detach())
lossSeg = cross_entropy(pred_A, label)
lossSeg.backward()

Is it correct?

I assume you want to pass pred_A to cross_entropy.
Besides that, it looks good.

For other people who want to look at my issue, let's implement the first point from ptrblck:

currently you are using lossSeg to calculate gradients in both models, GenA and SegA

So the code will be:

fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
# Feed fake_A to SegA
pred_A = SegA(fake_A.detach())
lossSeg = cross_entropy(pred_A, label)
loss = lossSeg + lossGen
loss.backward()

Is it correct?

I would just call backward on both losses separately, since they are independent now.
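Something like this (again just a sketch with your names; optGen and optSeg are assumed optimizers for GenA and SegA, respectively):

optGen.zero_grad()
optSeg.zero_grad()

fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
lossGen.backward()              # only GenA receives gradients

pred_A = SegA(fake_A.detach())  # detach() cuts the graph, so lossSeg cannot reach GenA
lossSeg = cross_entropy(pred_A, label)
lossSeg.backward()              # only SegA receives gradients

optGen.step()
optSeg.step()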

Sorry, I was missing one point. If we want the seg loss to also update the generator, the code should be (with detach() removed):

fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
# Feed fake_A to SegA
pred_A = SegA(fake_A)
lossSeg = cross_entropy(pred_A, label)
loss = lossSeg + lossGen
loss.backward()

Does it really make any difference?
What is the difference between calling backward on each loss separately (i.e. lossGen.backward() and lossSeg.backward()) compared to the case where we just do loss.backward()?

No, it won't make any difference; it might just be my coding style, but I would prefer to handle both losses separately if they are independent of each other.
Otherwise I would try to figure out why the author of the code is summing them before calling .backward(). :wink:
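If you want to convince yourself, here is a small self-contained check (a toy linear model with two made-up losses, not your networks) showing that summing the losses and calling backward once produces the same gradients as two separate backward calls:

import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 4)
x = torch.randn(8, 4)

def two_losses(out):
    loss_a = out.pow(2).mean()       # stand-in for lossGen
    loss_b = (out - 1).abs().mean()  # stand-in for lossSeg
    return loss_a, loss_b

# 1) sum first, single backward
model.zero_grad()
loss_a, loss_b = two_losses(model(x))
(loss_a + loss_b).backward()
grads_sum = [p.grad.clone() for p in model.parameters()]

# 2) two separate backward calls (gradients accumulate in .grad)
model.zero_grad()
loss_a, loss_b = two_losses(model(x))
loss_a.backward(retain_graph=True)   # the forward graph is shared, so retain it
loss_b.backward()
grads_sep = [p.grad.clone() for p in model.parameters()]

print(all(torch.allclose(a, b) for a, b in zip(grads_sum, grads_sep)))  # True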


Because of this paper I read: they mention that the total loss is a combination of losses (Eq. 6).

And I guessed it should be combined into one loss_total (loss) and backward called one time. Am I right?

As far as I understand, in papers people put all the losses together.
However, in the implementation they do it separately.
It's similar for the original GAN: they write all the losses as one min-max loss, but in training they first do backward for the discriminator and update its weights, and then do it for the generator and update its weights.
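For instance, a standard GAN training step looks roughly like the following (a toy, self-contained sketch with made-up networks and data, not the code from that paper): one min-max objective on paper, two separate backward()/step() calls in practice.

import torch
import torch.nn as nn

G = nn.Linear(16, 2)                              # toy generator
D = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())  # toy discriminator
optG = torch.optim.Adam(G.parameters(), lr=1e-3)
optD = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(32, 2)   # stand-in for a batch of real samples
z = torch.randn(32, 16)     # noise

# Discriminator step: detach the fake so no gradients flow into G
optD.zero_grad()
fake = G(z)
lossD = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
lossD.backward()
optD.step()

# Generator step: gradients flow through D into G, but only G's optimizer steps
optG.zero_grad()
lossG = bce(D(fake), torch.ones(32, 1))
lossG.backward()
optG.step()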

Also, based on my experience, it does not really matter whether you call backward on each loss separately or sum them and then call backward on the total.
