# Confused about summing losses that act on different parts of a network

Hello there, complete PyTorch beginner here (started 5 months ago).

I have to build an autoencoder that has 3 different parts. 2 of those are encoders that are fed the same image. The third part is the decoder.

Now I am really confused about summing up the losses, because I have 3 losses and only 1 of them should act on one of the encoder sections at a time.

I use the following rough way of doing this:

```python
import torch.nn as nn

mse = nn.MSELoss()
l1 = nn.L1Loss()

# out1_tm1 / out2_tm1 are the outputs from step t-1
# (hyphens like out1_t-1 are not valid in Python names)
def lossFunction_3(iterationIdx, out1_t, out2_t, out1_tm1, out2_tm1,
                   reconst, image, optimizer):
    # loss modules are instantiated once and then called on (input, target)
    reconstLoss = mse(reconst, image)
    outLoss1 = l1(out1_t, out1_tm1)
    outLoss2 = l1(out2_t, out2_tm1)
    subLoss = [reconstLoss.item(), outLoss1.item(), outLoss2.item()]

    optimizer.zero_grad()
    if iterationIdx % 2 == 0:
        (reconstLoss + outLoss1).backward()
    else:
        (reconstLoss + outLoss2).backward()
    optimizer.step()

    return sum(subLoss), subLoss

def trainLoop():
    out1_t, out2_t, reconstruction = net(image)
    loss, subLoss = lossFunction_3(iterationIdx, out1_t, out2_t,
                                   out1_tm1, out2_tm1,
                                   reconstruction, image, optimizer)
```

So my question is:
does it make any difference if I use `(reconstLoss + outLoss.item()).backward()`? Does autograd still remember where `out1` and `out2` came from and still update only one of the 2 encoder parts? Or does `outLoss` just get added to the reconstruction loss as a number and get propagated through all 3 parts?
I want `outLoss` to be applied only to one of the 2 encoder parts (depending on `iterationIdx`), while the reconstruction loss updates all 3 parts.
(using PyTorch 0.4.1)

```python
# WRONG!
(reconstLoss + outLoss.item()).backward()

# Correct
(reconstLoss + outLoss).backward()
```

`x.item()` returns the value as a plain Python number, which means any operation on the return of `x.item()` is no longer visible to PyTorch's autograd. It's pretty much like doing `(reconstLoss + 50).backward()`, i.e. adding a constant.
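A minimal sketch of the difference (toy scalars, not your model): with the tensor kept in the sum, both inputs get gradients; with `.item()`, the second loss is just a constant and its input gets nothing.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)

lossA = (x * x).sum()  # depends on x
lossB = (y * y).sum()  # depends on y

# Keeping the tensor: both paths receive gradients.
(lossA + lossB).backward(retain_graph=True)
print(x.grad, y.grad)  # x.grad = 2x = 4, y.grad = 2y = 6

x.grad.zero_()
y.grad.zero_()

# Using .item(): lossB becomes the constant 9.0, so y gets no gradient.
(lossA + lossB.item()).backward()
print(x.grad, y.grad)  # x.grad = 4 again, y.grad stays 0
```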

Now that we’ve gotten that out of the way, to answer your second question:

`(reconstLoss + outLoss1).backward()` will backprop `reconstLoss` gradients through everything that created it, and `outLoss1` through everything that created it.
They get independently backpropped, because `d/dx (f(x) + g(y)) = f'(x)` and `d/dy (f(x) + g(y)) = g'(y)`, i.e. the gradient paths through `x` and `y` don't pick up any cross terms, because the addition operation routes each gradient through unchanged.
On the other hand, if you did `x * y`, then the gradients going back through the path of `x` will have a scaling factor of `y`, and the gradients going back through the path of `y` will have a scaling factor of `x`.
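You can check this with two toy leaf tensors (again, illustrative values, not your model):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([5.0], requires_grad=True)

# Sum: each input sees only its own gradient, no cross terms.
(x + y).sum().backward()
print(x.grad, y.grad)  # both are 1

x.grad.zero_()
y.grad.zero_()

# Product: each path is scaled by the other factor.
(x * y).sum().backward()
print(x.grad, y.grad)  # x.grad = y = 5, y.grad = x = 2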
