I have two questions concerning PyTorch's backprop behaviour when using multiple losses and different optimization strategies, and I am sure that clarification would help me (and maybe others) greatly in gaining a deeper understanding of it.
Question 1: Imagine we have an auto-encoder with encoder E and decoder D, and mini-batches of, let's say, size 4 containing the data (x_1, x_2, x_3, x_4). Apart from the reconstruction error (L2), I want to add an L1 loss between the low-dimensional codes E(x_i) and given codes, but only for a variable number of the samples (anywhere from 0 to all 4 of them). Would something like this be correct?
```python
GivenCodes = net.extractGivenCodes(randomData)  # this utilizes encoder E

optimizer.zero_grad()
output, lowDimCodes = net(mini_batch_images)

lowDimLoss = torch.zeros(1, device='cuda')  # accumulator for the code loss
amountI = 0
for i in range(batchSize):
    if code_loss_needed(i):  # pseudocode: should the L1 loss be computed for E(x_i)?
        lowDimLoss = lowDimLoss + l1_loss(lowDimCodes[i, :], GivenCodes[i, :])
        amountI += 1
if amountI > 0:
    lowDimLoss = lowDimLoss / amountI

L2Loss = l2_loss(output, mini_batch_images)
compLoss = L2Loss + lowDimLoss
compLoss.backward()
optimizer.step()
```
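For what it's worth, the per-sample loop above can also be written with a boolean mask, which avoids the Python loop and makes the "zero gradient for unselected samples" behaviour easy to check. This is only a sketch on toy tensors (the shapes, the mask, and the tensor names are assumptions for illustration, not my real model):

```python
import torch
import torch.nn as nn

# Toy stand-ins: 4 codes of dimension 8 (shapes chosen arbitrarily).
lowDimCodes = torch.randn(4, 8, requires_grad=True)
GivenCodes = torch.randn(4, 8)

# Boolean mask marking which samples get the code loss (arbitrary here).
mask = torch.tensor([True, False, True, True])

if mask.any():
    # reduction='mean' averages over the selected elements, which matches
    # the 1/amountI normalisation of the loop version.
    lowDimLoss = nn.functional.l1_loss(lowDimCodes[mask], GivenCodes[mask])
else:
    lowDimLoss = torch.zeros(())

lowDimLoss.backward()
# Samples outside the mask receive exactly zero gradient.
```

With this formulation, `lowDimCodes.grad` is nonzero only in the masked rows, so backprop behaves exactly as in the loop version.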
Question 2: As above, we have an auto-encoder, but now a mini-batch contains data of two different types, A and B. For A, I want to backprop only the reconstruction error (L2), whereas for B, I only want to backprop the L1 loss on its low-dimensional codes E(x_i) (and not the reconstruction loss as well). Would I be better off using 2 optimizers, or does something like this work as well:
```python
optimizer.zero_grad()
Loss = torch.zeros(1, device='cuda')
for data in mini_batch:
    if is_type_A(data):  # pseudocode: data is of type A
        output = net(data)
        Loss = Loss + L2_Loss(output, data)
    else:
        output_codes = net.forwardE(data)  # forward through encoder E only
        Loss = Loss + L1_Loss(output_codes, given_codes)
# (do loss normalization here)
Loss.backward()
optimizer.step()
```
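To make the single-optimizer idea concrete, here is a minimal self-contained sketch with toy `nn.Linear` stand-ins for E and D (the layer sizes, the `is_A` split, and `given_codes` are all made up for illustration). Since the type-B samples never pass through D, D only receives gradients from the type-A reconstruction term, while E receives gradients from both terms:

```python
import torch
import torch.nn as nn

# Toy encoder E and decoder D (hypothetical shapes, illustration only).
E = nn.Linear(6, 2)
D = nn.Linear(2, 6)
optimizer = torch.optim.SGD(list(E.parameters()) + list(D.parameters()), lr=0.1)

batch = torch.randn(4, 6)
is_A = torch.tensor([True, True, False, False])  # which samples are of type A
given_codes = torch.randn(4, 2)                  # targets for the type-B codes

optimizer.zero_grad()
codes = E(batch)
loss = torch.zeros(())
if is_A.any():
    # type A: reconstruction loss only (gradients flow through D and E)
    loss = loss + nn.functional.mse_loss(D(codes[is_A]), batch[is_A])
if (~is_A).any():
    # type B: L1 loss on the codes only; D never sees these samples
    loss = loss + nn.functional.l1_loss(codes[~is_A], given_codes[~is_A])
loss.backward()
optimizer.step()
```

A single `backward()` on the summed loss is enough here; whether two optimizers are preferable is exactly what I'd like clarified.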
Thanks very much in advance!