Confused about summing losses that act on different parts of the model

Hello there, complete PyTorch beginner here (started 5 months ago).

I got the following task:
I have to build an autoencoder that has 3 different parts. 2 of those are encoders that are fed with the same image. The third part is the decoder.

Now I am really confused about how to sum up the losses when I have 3 losses and only 1 of them should act on a given encoder section.

I use the following rough way of doing this:

import torch.nn.functional as F

def lossFunction_3_(iterationIdx, out1_t, out2_t, out1_t_prev, out2_t_prev,
                    reconst, image, optimizer):
    subLoss = [0, 0, 0]

    reconstLoss = F.mse_loss(reconst, image)
    subLoss[0] = reconstLoss.item()

    outLoss1 = F.l1_loss(out1_t, out1_t_prev)
    subLoss[1] = outLoss1.item()

    outLoss2 = F.l1_loss(out2_t, out2_t_prev)
    subLoss[2] = outLoss2.item()

    if iterationIdx % 2 == 0:
        (reconstLoss + outLoss1).backward()
    else:
        (reconstLoss + outLoss2).backward()

    optimizer.step()
    optimizer.zero_grad()

    return sum(subLoss), subLoss

def trainLoop():
    out1_t, out2_t, reconst = net.run()
    totalLoss, subLoss = lossFunction_3_(iterationIdx, out1_t, out2_t,
                                         out1_t_prev, out2_t_prev,
                                         reconst, image, optimizer)

So my question is:
Does it make any difference if I use (reconstLoss + outLoss.item()).backward()? Does autograd still remember where out1 and out2 came from and still update only one of the 2 encoder parts? Or does the value just get added to the reconstruction loss and get propagated through all 3 parts?
I want outLoss to be applied only to one of the 2 encoder parts (depending on iterationIdx), while the reconstruction loss updates all 3 parts.
(using PyTorch 0.4.1)

First issue in your question:

# WRONG!
(reconstLoss + outLoss.item()).backward()

# Correct
(reconstLoss + outLoss).backward()

x.item() returns the value as a plain Python number, which means that any operations on the return of x.item() are no longer visible to PyTorch's autograd. It's pretty much like doing (reconstLoss + 50).backward(), i.e. adding a constant.
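To see this concretely, here is a minimal sketch (w, loss_a and loss_b are made-up names just for illustration) showing that a value obtained via .item() contributes nothing to the gradients:

import torch

w = torch.ones(1, requires_grad=True)    # stands in for any parameter
loss_a = (3 * w).sum()                   # depends on w, d(loss_a)/dw = 3
loss_b = (5 * w).sum()                   # depends on w, d(loss_b)/dw = 5

# Keeping both loss tensors: gradients from both reach w
(loss_a + loss_b).backward(retain_graph=True)
print(w.grad)        # tensor([8.])

w.grad.zero_()

# Using .item(): loss_b is reduced to a plain float, i.e. a constant
(loss_a + loss_b.item()).backward()
print(w.grad)        # tensor([3.]) -- loss_b contributed nothing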

Now that we’ve gotten that out of the way, to answer your second question:

(reconstLoss + outLoss1).backward() will backprop reconstLoss gradients through everything that created it, and outLoss1 through everything that created it.
They get independently backpropped: if L = f(x) + g(y), then dL/dx = df(x)/dx and dL/dy = dg(y)/dy, i.e. the gradient flowing back to x has no term that depends on y (and vice versa), because of the addition operation.
On the other hand, if you did x * y, then the gradients going back through the path of x will have a scaling factor of y, and the gradients going back through the path of y will have a scaling factor of x.
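Applied to your setup, here is a rough sketch (with hypothetical tiny nn.Linear modules standing in for your two encoders and the decoder, and a zero tensor standing in for the out1_t-1 target): reconstLoss backprops through all three parts, while outLoss1 only reaches the encoder that produced out1.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tiny modules standing in for the two encoders and the decoder
enc1, enc2 = nn.Linear(4, 2), nn.Linear(4, 2)
dec = nn.Linear(4, 4)

image = torch.randn(1, 4)
out1, out2 = enc1(image), enc2(image)
reconst = dec(torch.cat([out1, out2], dim=1))

reconstLoss = F.mse_loss(reconst, image)             # graph reaches enc1, enc2 and dec
outLoss1 = F.l1_loss(out1, torch.zeros_like(out1))   # graph reaches enc1 only
                                                     # (zeros stand in for out1_t-1)

(reconstLoss + outLoss1).backward()

# enc1 gets gradients from both terms; enc2 and dec only see reconstLoss,
# because nothing in outLoss1's graph passes through them.
print(enc1.weight.grad.norm(), enc2.weight.grad.norm(), dec.weight.grad.norm())

So the iterationIdx switch in your loss function already does what you want, as long as you add the loss tensors themselves and not their .item() values.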
