Hey guys,

I have two questions about PyTorch's backprop behaviour when using multiple losses and different optimization strategies, and I am sure that clarification would help me (and maybe others) greatly in gaining a solid understanding of it.

Question 1: Imagine we have an auto-encoder with encoder E and decoder D, and mini-batches of, let's say, size 4 containing the data (x_1, x_2, x_3, x_4). Apart from the reconstruction error (L2), I want to add an L1 loss between the low-dimensional codes E(x_i) and given codes, but only for a variable subset of the indices i (anywhere between 0 and 4 of them). Would something like this be correct?

```
GivenCodes = net.extractGivenCodes(randomData)  # this utilizes encoder E

optimizer.zero_grad()
output, lowDimCodes = net(mini_batch_images)
lowDimLoss = torch.zeros(1, device='cuda')  # accumulator (Variable is no longer needed)
amountI = 0
for i in range(batchSize):
    if l1_should_be_computed(i):  # placeholder for your per-sample criterion
        # out-of-place add, so autograd tracks the sum correctly
        lowDimLoss = lowDimLoss + l1_loss(lowDimCodes[i, :], GivenCodes[i, :])
        amountI += 1
if amountI > 0:
    lowDimLoss = lowDimLoss / amountI
L2Loss = l2_loss(output, mini_batch_images)
compLoss = L2Loss + lowDimLoss
compLoss.backward()
optimizer.step()
```
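For what it's worth, the per-sample loop could probably be replaced by boolean-mask indexing, which keeps everything in a single graph and also averages over exactly the selected samples. A minimal self-contained sketch (the mask, shapes, and tensors here are invented stand-ins for the encoder outputs):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: a batch of 4 codes of dimension 8
lowDimCodes = torch.randn(4, 8, requires_grad=True)
GivenCodes = torch.randn(4, 8)

# Boolean mask marking which samples get the L1 term (placeholder criterion)
mask = torch.tensor([True, False, True, True])

if mask.any():
    # mean over the selected samples only, same as the normalized loop sum
    lowDimLoss = F.l1_loss(lowDimCodes[mask], GivenCodes[mask])
else:
    lowDimLoss = lowDimCodes.sum() * 0.0  # keeps the graph alive with zero loss

lowDimLoss.backward()
# masked-out sample 1 receives no gradient from this term
assert torch.all(lowDimCodes.grad[1] == 0)
```

Since boolean indexing is differentiable, the gradient flows only into the rows the mask selects.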

Question 2: As above, we have an auto-encoder, but now a mini-batch contains data of two different types, A and B. For A, I want to backprop only the reconstruction error (L2), whereas for B I want to backprop **only** the L1 loss on its low-dimensional codes E(x_i) (and not the reconstruction loss as well). Would I be better off using 2 optimizers, or does something like this work as well:

```
optimizer.zero_grad()
Loss = torch.zeros(1, device='cuda')
for data in mini_batch:
    if is_type_A(data):  # placeholder for your type check
        output = net(data)
        Loss = Loss + L2_Loss(output, data)
    else:
        output_codes = net.forwardE(data)
        Loss = Loss + L1_Loss(output_codes, given_codes)
# (do loss normalization here)
Loss.backward()
optimizer.step()
```
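As a sanity check on the single-optimizer idea, here is a self-contained toy sketch (a linear encoder/decoder with invented shapes, not the actual net): autograd sends each loss term's gradient only through the modules that produced it, so an L1 term on E(x) alone never touches D's parameters, and summing the two terms before one `backward()` keeps each path separate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in autoencoder; names and shapes are invented for illustration
encoder = nn.Linear(10, 3)
decoder = nn.Linear(3, 10)
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.1
)

x_a = torch.randn(2, 10)       # type-A data: reconstruction loss
x_b = torch.randn(2, 10)       # type-B data: code loss only
given_codes = torch.randn(2, 3)

# Backprop only the L1 term: the decoder never appears in this graph,
# so its parameters receive no gradient at all.
optimizer.zero_grad()
F.l1_loss(encoder(x_b), given_codes).backward()
assert decoder.weight.grad is None        # D untouched by the L1 path
assert encoder.weight.grad is not None    # E got its gradient

# Combined step: one optimizer, one backward; each term flows only
# through the modules that produced it.
optimizer.zero_grad()
total = F.mse_loss(decoder(encoder(x_a)), x_a) \
      + F.l1_loss(encoder(x_b), given_codes)
total.backward()
optimizer.step()
```

So a second optimizer seems unnecessary here as long as both terms are summed into one scalar before `backward()`.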

Thanks very much in advance!