I have a CNN that returns three different outputs, each with its own loss. Before calling backward, should I take the mean of the three losses or just sum them? Something like this:

```
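# One weighted criterion per output head; class weights go in the `weight` argument
criterion1 = nn.CrossEntropyLoss(weight=weights_label_0)
criterion2 = nn.CrossEntropyLoss(weight=weights_label_1)
criterion3 = nn.CrossEntropyLoss(weight=weights_label_2)
loss_1 = criterion1(output[0], label_0)
loss_2 = criterion2(output[1], label_1)
loss_3 = criterion3(output[2], label_2)
# Option 1: sum the losses
loss = loss_1 + loss_2 + loss_3
loss.backward()
# Option 2: average the losses (only one of the two options would be used)
loss = (loss_1 + loss_2 + loss_3) / 3
loss.backward()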
```
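
To make the comparison concrete, here is a minimal self-contained sketch of both options on a toy setup. `ThreeHeadNet`, `shared_grad`, the layer sizes, and the random data are all hypothetical stand-ins for the real CNN and dataset, and the criteria are unweighted for brevity. It checks that the gradient from the summed loss is exactly 3 times the gradient from the averaged loss, since dividing by a constant only rescales the gradient:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ThreeHeadNet(nn.Module):
    """Hypothetical toy model: one shared layer feeding three classification heads."""

    def __init__(self, in_features=8, num_classes=(3, 4, 5)):
        super().__init__()
        self.shared = nn.Linear(in_features, 16)
        self.heads = nn.ModuleList([nn.Linear(16, c) for c in num_classes])

    def forward(self, x):
        h = torch.relu(self.shared(x))
        return [head(h) for head in self.heads]


def shared_grad(model, x, labels, reduce):
    """Return the gradient of the shared layer for the summed or averaged loss."""
    model.zero_grad()
    outputs = model(x)
    criterion = nn.CrossEntropyLoss()  # unweighted here, to keep the sketch short
    losses = [criterion(o, y) for o, y in zip(outputs, labels)]
    total = sum(losses)
    if reduce == "mean":
        total = total / len(losses)
    total.backward()
    return model.shared.weight.grad.clone()


model = ThreeHeadNet()
x = torch.randn(6, 8)
labels = [torch.randint(0, c, (6,)) for c in (3, 4, 5)]

grad_sum = shared_grad(model, x, labels, "sum")
grad_mean = shared_grad(model, x, labels, "mean")

# The two gradients should differ only by the constant factor 3.
ratio_ok = torch.allclose(grad_sum, 3 * grad_mean, atol=1e-6)
print(ratio_ok)
```

Under this sketch's assumptions, the two options point in the same direction and differ only in scale, which with plain SGD would be equivalent to changing the learning rate by a factor of 3. Adaptive optimizers may behave differently with respect to that scale.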

Which of the two would be correct? Also, I am slightly confused about how to calculate the class weights for the labels: should they be computed per batch or over the whole dataset?
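
For reference, this is a minimal sketch of what computing the weights over the whole dataset could look like, assuming inverse-frequency weighting (one common choice, not the only one). The helper `inverse_frequency_weights` and the tiny label tensor `all_labels_0` are hypothetical and only illustrate the idea for the first output:

```python
import torch
import torch.nn as nn


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * class_count)."""
    counts = torch.bincount(labels, minlength=num_classes).float()
    total = counts.sum()
    # Avoid division by zero for classes that never appear (an assumption of this sketch).
    counts = counts.clamp(min=1)
    return total / (num_classes * counts)


# Hypothetical labels for the first output, collected once over the full training set.
all_labels_0 = torch.tensor([0, 0, 0, 0, 1, 1, 2, 2])

weights_label_0 = inverse_frequency_weights(all_labels_0, num_classes=3)
criterion1 = nn.CrossEntropyLoss(weight=weights_label_0)
print(weights_label_0)
```

Computed this way, the weights stay fixed for the whole training run. A per-batch variant would call the same helper on each batch's labels instead, which makes the weights fluctuate from step to step.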