Hey guys,

I have two questions about PyTorch's backprop behaviour when using multiple losses and different optimization strategies, and I am sure that clarification would help me (and maybe others) greatly in gaining a solid understanding of it.

Question 1: Imagine we have an auto-encoder with encoder E and decoder D, and mini-batches of, let's say, size 4 containing the data (x_1, x_2, x_3, x_4). Apart from the reconstruction error (L2), I want to add an L1 loss between the low-dimensional codes E(x_i) and given codes, but only for a variable subset of the indices i (anywhere between 0 and 4 of them). Would something like this be correct?

```
GivenCodes = net.extractGivenCodes(randomData)  # this utilizes encoder E

optimizer.zero_grad()
output, lowDimCodes = net(mini_batch_images)
lowDimLoss = torch.zeros(1, device='cuda')  # accumulator (Variable is no longer needed)
amountI = 0
for i in range(batchSize):
    if l1_should_be_computed(i):  # placeholder for your per-sample criterion
        # out-of-place add, so autograd tracks the sum correctly
        lowDimLoss = lowDimLoss + l1_loss(lowDimCodes[i, :], GivenCodes[i, :])
        amountI += 1
if amountI > 0:
    lowDimLoss = lowDimLoss / amountI
L2Loss = l2_loss(output, mini_batch_images)
compLoss = L2Loss + lowDimLoss
compLoss.backward()
optimizer.step()
```
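For what it's worth, the per-sample loop could probably be replaced by boolean-mask indexing, which keeps everything in a single graph and also averages over exactly the selected samples. A minimal self-contained sketch (the mask, shapes, and tensors here are invented stand-ins for the encoder outputs):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: a batch of 4 codes of dimension 8
lowDimCodes = torch.randn(4, 8, requires_grad=True)
GivenCodes = torch.randn(4, 8)

# Boolean mask marking which samples get the L1 term (placeholder criterion)
mask = torch.tensor([True, False, True, True])

if mask.any():
    # mean over the selected samples only, same as the normalized loop sum
    lowDimLoss = F.l1_loss(lowDimCodes[mask], GivenCodes[mask])
else:
    lowDimLoss = lowDimCodes.sum() * 0.0  # keeps the graph alive with zero loss

lowDimLoss.backward()
# masked-out sample 1 receives no gradient from this term
assert torch.all(lowDimCodes.grad[1] == 0)
```

Since boolean indexing is differentiable, the gradient flows only into the rows the mask selects.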

Question 2: As above, we have an auto-encoder, but now a mini-batch contains data of two different types, A and B. For A, I want to backprop only the reconstruction error (L2), whereas for B I want to backprop **only** the L1 loss on its low-dimensional codes E(x_i) (and not the reconstruction loss as well). Would I be better off using 2 optimizers, or does something like this work as well:

```
optimizer.zero_grad()
Loss = torch.zeros(1, device='cuda')
for data in mini_batch:
    if is_type_A(data):  # placeholder for your type check
        output = net(data)
        Loss = Loss + L2_Loss(output, data)
    else:
        output_codes = net.forwardE(data)
        Loss = Loss + L1_Loss(output_codes, given_codes)
# (do loss normalization here)
Loss.backward()
optimizer.step()
```
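As a sanity check on the single-optimizer idea, here is a self-contained toy sketch (a linear encoder/decoder with invented shapes, not the actual net): autograd sends each loss term's gradient only through the modules that produced it, so an L1 term on E(x) alone never touches D's parameters, and summing the two terms before one `backward()` keeps each path separate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in autoencoder; names and shapes are invented for illustration
encoder = nn.Linear(10, 3)
decoder = nn.Linear(3, 10)
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.1
)

x_a = torch.randn(2, 10)       # type-A data: reconstruction loss
x_b = torch.randn(2, 10)       # type-B data: code loss only
given_codes = torch.randn(2, 3)

# Backprop only the L1 term: the decoder never appears in this graph,
# so its parameters receive no gradient at all.
optimizer.zero_grad()
F.l1_loss(encoder(x_b), given_codes).backward()
assert decoder.weight.grad is None        # D untouched by the L1 path
assert encoder.weight.grad is not None    # E got its gradient

# Combined step: one optimizer, one backward; each term flows only
# through the modules that produced it.
optimizer.zero_grad()
total = F.mse_loss(decoder(encoder(x_a)), x_a) \
      + F.l1_loss(encoder(x_b), given_codes)
total.backward()
optimizer.step()
```

So a second optimizer seems unnecessary here as long as both terms are summed into one scalar before `backward()`.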

Thanks very much in advance!