# Combination losses: two backward or one backward?

Hello all, I have an architecture like this: the input is fed to a Gen network to generate a fake image (fakeA). I use an L1 loss to compute the difference between the input and fakeA; I call it `lossGen`.

fakeA is then fed to the segmentation network to create predA. I use cross entropy to compute the loss between the label and predA; I call it `lossSeg`.

For training, I see two ways:

```python
lossGen = ...
lossSeg = ...
loss = lossGen + lossSeg
loss.backward()
```

Or

```python
lossGen = ...
lossGen.backward()
lossSeg = ...
lossSeg.backward()
```

Which way should I use? Note that the output of GenA is used as the input of the Seg network. Thanks

Both approaches should compute the same gradients.
In the second approach you would need to call `lossGen.backward(retain_graph=True)`, otherwise the intermediate values will be cleared and you’ll get an error calling `lossSeg.backward()`.
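To see why `retain_graph=True` is needed, here is a minimal sketch with a toy tensor standing in for the shared intermediate output (the equivalent of fakeA): the first `backward()` frees the graph by default, so a second `backward()` through the same nodes fails without it.

```python
import torch

# Two losses sharing one intermediate node of the graph
w = torch.randn(3, requires_grad=True)
hidden = w * 2                      # shared intermediate (like fakeA)
loss1 = hidden.sum()                # first loss
loss2 = (hidden ** 2).sum()         # second loss, same graph

loss1.backward(retain_graph=True)   # keep the graph alive for the next call
loss2.backward()                    # works; without retain_graph above, this
                                    # raises "Trying to backward through the
                                    # graph a second time"
print(w.grad)                       # gradients from both losses accumulate
```

Both calls accumulate into `w.grad`, which is exactly what summing the losses and calling `backward()` once would produce.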

However, currently you are using `lossSeg` to calculate gradients in both models, `GenA` and `SegA`.
Is this what you would like to do?
If you only want to calculate the gradients of `lossSeg` w.r.t. the parameters in `SegA`, you should `.detach()` the output of `GenA` before passing it to `SegA`.


Thanks ptrblck. Yes, I only want to use the gradient of `lossSeg` to update the parameters of `SegA`. Following your suggestion, my code would be:

```python
fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
lossGen.backward()

# Feed fake_A to SegA
pred_A = SegA(fake_A.detach())
lossSeg = cross_entropy(pred_A, label)
lossSeg.backward()
```

Is it correct?

I assume you want to pass `pred_A` to `cross_entropy`.
Besides that, it looks good.

For other people who want to look at this issue: let me implement the first point ptrblck mentioned:

> currently you are using `lossSeg` to calculate gradients in both models, `GenA` and `SegA`

So the code would be:

```python
fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
# Feed fake_A to SegA
pred_A = SegA(fake_A.detach())
lossSeg = cross_entropy(pred_A, label)
loss = lossSeg + lossGen
loss.backward()
```

Is it correct?

I would just call `backward` on both losses separately, since they are independent now.

Sorry, I was missing one point. If we want the seg loss to also update the generator, the code should be (with `detach()` removed):

```python
fake_A = GenA(real_A)
lossGen = L1(real_A, fake_A)
# Feed fake_A to SegA
pred_A = SegA(fake_A)
lossSeg = cross_entropy(pred_A, label)
loss = lossSeg + lossGen
loss.backward()
```

Does it really make any difference?
What is the difference between calling backward on each loss separately
(i.e. `lossGen.backward()` and `lossSeg.backward()`) and just calling `loss.backward()`?

No, it won’t make any difference; it might just be my coding style, but I would prefer to handle both losses separately if they are independent from each other.
Otherwise I would try to figure out why the author of the code is summing them before calling `.backward()`.
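A quick numerical check of the "it won't make any difference" claim, using tiny linear layers as hypothetical stand-ins for `GenA` and `SegA`: summing the losses and calling `backward()` once produces the same accumulated gradients as two separate `backward()` calls.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Tiny stand-ins for GenA and SegA (hypothetical, just for the check)
gen = torch.nn.Linear(4, 4)
seg = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
label = torch.randint(0, 2, (8,))

def grads(separate):
    gen.zero_grad()
    seg.zero_grad()
    fake = gen(x)
    lossGen = F.l1_loss(fake, x)
    lossSeg = F.cross_entropy(seg(fake), label)
    if separate:
        lossGen.backward(retain_graph=True)  # keep graph for the second call
        lossSeg.backward()                   # gradients accumulate into .grad
    else:
        (lossGen + lossSeg).backward()
    return [p.grad.clone() for p in list(gen.parameters()) + list(seg.parameters())]

g_sep = grads(separate=True)
g_sum = grads(separate=False)
print(all(torch.allclose(a, b) for a, b in zip(g_sep, g_sum)))  # True
```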

Because I read this paper, where they mentioned that the total loss is a combination loss (Eq. 6).

And I guess it should be combined into one total loss (`loss`) with backward called once. Am I right?

As far as I understand, in papers people put all the losses together.
However, in implementations they do it separately.
It is similar for the original GAN: the paper states all the losses as one min-max objective, but in training they first do backward for the discriminator and update its weights, then do backward for the generator and update its weights.
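The alternating GAN update described above can be sketched as follows. Everything here (the tiny `G`/`D` modules, optimizers, and data) is a hypothetical placeholder; the point is the two separate backward/step phases and the `detach()` in the discriminator step.

```python
import torch

G = torch.nn.Linear(2, 2)   # toy generator
D = torch.nn.Linear(2, 1)   # toy discriminator
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)
bce = torch.nn.BCEWithLogitsLoss()

real = torch.randn(4, 2)
noise = torch.randn(4, 2)

# 1) Discriminator step: detach the fake so no gradient flows into G here
fake = G(noise)
loss_D = bce(D(real), torch.ones(4, 1)) + bce(D(fake.detach()), torch.zeros(4, 1))
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# 2) Generator step: backward flows through D into G, but only G is updated
loss_G = bce(D(G(noise)), torch.ones(4, 1))
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
```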

Also, based on my experience it does not really matter whether you call backward on each loss separately or sum them and call backward once on the total.
