Should I take the mean of the losses before the backward pass for multiple tasks?

I have a CNN that returns three different outputs, one per task. Before running the backward pass, should I take the mean of the three losses, or just sum them up? Something like this:

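import torch.nn as nn

# One weighted criterion per task head; weights_label_* are per-class weight tensors
criterion1 = nn.CrossEntropyLoss(weight=weights_label_0)
criterion2 = nn.CrossEntropyLoss(weight=weights_label_1)
criterion3 = nn.CrossEntropyLoss(weight=weights_label_2)

# output is the sequence of three task outputs returned by the network
loss_1 = criterion1(output[0], label_0)
loss_2 = criterion2(output[1], label_1)
loss_3 = criterion3(output[2], label_2)

# Option 1: sum the task losses
loss = loss_1 + loss_2 + loss_3
loss.backward()

# or

# Option 2: average the task losses
loss = (loss_1 + loss_2 + loss_3) / 3
loss.backward()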

Which of the two would be correct? Also, I am slightly confused about how to calculate the class weights for the labels. Should the weights be calculated per batch or per dataset?

I would opt for the mean version.

As for your second question, the weights should be determined per dataset.
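
To make the per-dataset suggestion concrete, here is a minimal sketch that computes class weights once from all training labels of one task. The inverse-frequency formula, the helper name `inverse_frequency_weights`, and the label list are assumptions for illustration, not something taken from the question; other weighting schemes are equally valid.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return one weight per class, larger for rarer classes.

    Uses weight_c = N / (num_classes * count_c), which is one common
    choice (an assumption here, not the only option).
    """
    counts = Counter(labels)
    n = len(labels)
    # A class that never appears gets weight 0.0 instead of a division by zero.
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Hypothetical labels for one task, collected over the WHOLE training set.
all_train_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
weights = inverse_frequency_weights(all_train_labels, num_classes=3)
print(weights)
```

The resulting list could then be wrapped as `torch.tensor(weights)` and passed as the `weight` argument of `nn.CrossEntropyLoss`. Computing it per batch would make the weights fluctuate from step to step, and a class missing from a batch would have no meaningful count.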

If you are always training with these 3 tasks, it doesn't matter which one you use. Summing the losses is equivalent to averaging them and training with a learning rate that is 3x larger (exactly so for plain SGD).

If you plan on comparing to single task learning or switching up your tasks, you should use the mean version, which should keep the gradients more similar across conditions.
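
The learning-rate equivalence can be checked with a toy calculation. This is only a sketch: three quadratic losses on one shared parameter `w` stand in for the three task losses, and the targets, learning rates, and helper names are made up for illustration. It assumes a plain SGD update with no weight decay.

```python
# Toy stand-in for three task losses sharing one parameter w:
# loss_i(w) = (w - target_i) ** 2, so d(loss_i)/dw = 2 * (w - target_i).
TARGETS = [1.0, 2.0, 6.0]  # hypothetical per-task targets


def task_grads(w):
    """Gradient of each task loss with respect to w."""
    return [2.0 * (w - t) for t in TARGETS]


def sgd_step(w, lr, reduce):
    """One plain SGD step on either the summed or the averaged loss."""
    grads = task_grads(w)
    total = sum(grads)
    grad = total if reduce == "sum" else total / len(grads)
    return w - lr * grad


w0 = 0.0
w_sum = sgd_step(w0, lr=0.01, reduce="sum")    # summed loss, base learning rate
w_mean = sgd_step(w0, lr=0.03, reduce="mean")  # mean loss, 3x learning rate
print(w_sum, w_mean)  # the two updates agree up to floating-point rounding
```

The gradient of the sum is exactly 3 times the gradient of the mean, so the constant factor can be absorbed into the learning rate. With a different number of tasks the factor changes, which is why the mean keeps gradient magnitudes more comparable across setups.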
