Adding weights to the losses


I have a model with 5 different losses as shown below:

cls_loss_function = HeatmapLoss(reduction='mean')  # Custom Loss Function
txty_loss_function = nn.BCEWithLogitsLoss(reduction='none')
twth_loss_function = nn.SmoothL1Loss(reduction='none')
iou_loss_function = nn.SmoothL1Loss(reduction='none')
dep_loss_function = nn.SmoothL1Loss(reduction='none')

The losses are calculated and then the total_loss is returned as shown below.

total_loss = cls_loss + txty_loss + twth_loss + iou_loss + dep_loss
total_loss.backward()

However, txty_loss dominates total_loss. For example, after 50 epochs (batch size 3, Adam with lr=1e-4, GroupNorm, and a ResNet18 encoder with frozen BN layers), every loss except txty_loss is 0, while txty_loss ranges between 10 and 17. I think the model gives bad predictions because total_loss is dominated by txty_loss. Could this be right?

Note: I have tried using SGD and Adam with different learning rates but the behaviour remains the same.

If so, how do I add weights to these losses, i.e. assign a coefficient to each loss while keeping the sum of the coefficients equal to 1? The concept should be similar to weighted BCE, I think.

Feel free to correct if the concepts are wrong.

Thank You.

Wouldn’t this mean that the model training is respecting the other losses primarily (as they are reduced to 0), while txty_loss might increase?

Your approach sounds valid and you could assign coefficients to each loss term (and normalize them so that their sum equals 1).

total_loss = 0.2 * cls_loss + 0.2 * txty_loss + 0.2 * twth_loss + 0.2 * iou_loss + 0.2 * dep_loss 

(of course you might want to change the values)
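A minimal sketch of the normalization step, assuming each loss has already been reduced to a scalar (the losses created with reduction='none' would need a .mean() or .sum() first). The helper name combine_losses and the numeric values are made up for illustration and are not tuned for this model:

```python
import torch

def combine_losses(losses, raw_weights):
    """Weighted sum of scalar losses; the weights are rescaled to sum to 1."""
    total_weight = sum(raw_weights.values())
    weights = {name: w / total_weight for name, w in raw_weights.items()}
    total = sum(weights[name] * loss for name, loss in losses.items())
    return total, weights

# Stand-in scalar losses (hypothetical values, one per loss term in the thread).
losses = {
    "cls": torch.tensor(1.0, requires_grad=True),
    "txty": torch.tensor(10.0, requires_grad=True),
    "twth": torch.tensor(2.0, requires_grad=True),
    "iou": torch.tensor(3.0, requires_grad=True),
    "dep": torch.tensor(4.0, requires_grad=True),
}

# Unnormalized coefficients; only their ratios matter after normalization.
raw_weights = {"cls": 1.0, "txty": 4.0, "twth": 1.0, "iou": 1.0, "dep": 1.0}

total_loss, weights = combine_losses(losses, raw_weights)
total_loss.backward()  # each loss receives a gradient equal to its normalized weight
```

Keeping the raw coefficients in a dict makes it easy to try different ratios without having to re-balance them by hand each time.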

@ptrblck , Thank you for responding.

What does this mean exactly? txty_loss does decrease at first, but then stays roughly constant. If the model indeed respects the other losses more than txty_loss, why is that?

Regarding the values: so txty_loss should have the highest weight?

I don’t know, but maybe the losses try to “pull” the model parameters into different directions?
E.g. what would happen if the 4 losses expect positive output values of the model to reduce their loss while txty_loss expects negative outputs? This is of course a very simplified view of this use case, but could something similar happen in your training?
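One way to probe this is to compare the gradients that two losses produce on the same shared parameters: a cosine similarity near -1 means they pull in opposite directions. The snippet below is only a toy sketch under stated assumptions; the zero-initialized linear layer stands in for a shared encoder, and the two targets are chosen so that the losses deliberately disagree. The helper grad_vector is hypothetical, not part of PyTorch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_vector(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into a single 1-D tensor."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    flat = [
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ]
    return torch.cat(flat)

torch.manual_seed(0)
shared = nn.Linear(4, 1)          # toy stand-in for the shared encoder
nn.init.zeros_(shared.weight)     # zero init so the toy outputs start at exactly 0
nn.init.zeros_(shared.bias)

x = torch.randn(8, 4)
out = shared(x)

# Two losses on the same output with contradicting targets.
loss_pos = F.smooth_l1_loss(out, torch.full_like(out, 0.5))   # prefers positive outputs
loss_neg = F.smooth_l1_loss(out, torch.full_like(out, -0.5))  # prefers negative outputs

params = list(shared.parameters())
g_pos = grad_vector(loss_pos, params)
g_neg = grad_vector(loss_neg, params)

# In this toy setup the two gradients point in opposite directions.
cosine = F.cosine_similarity(g_pos, g_neg, dim=0).item()
print(f"cosine similarity between the two gradients: {cosine:.3f}")
```

In a real model the same check could be run between txty_loss and each of the other losses on the encoder parameters, keeping in mind that the values will be noisy with a batch size of 2 or 3.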

@ptrblck ,

Hmm, interesting thought. I hadn’t thought about this. But I think it can also be because of my data and/or using a batch size of 2 or 3 with GroupNorm. Also, my data is extremely complicated. Could that also be the reason?
I am also training using only 2735 images. My belief is that I need more data.

It’s hard to tell what’s exactly causing the different loss values, but I would guess that the losses converging to zero might be easier to reduce for the model than the other (high) one.
Yes, the number of samples seems to be quite limited, but it also might depend on the model used, the complexity of the use case, etc.

@ptrblck Sorry for the late reply. I was trying out a few things to check whether the loss would decrease. I tried making the model more complex (increasing the number of parameters), but the loss did not go down.

Regarding the above quote,
The loss with the highest value should have the lowest coefficient, and vice versa, right?

I think this wouldn’t change the “behavior” of the training and the already high loss would be further “ignored”, no?


Oh yeah!!! So the loss with the highest value should have the highest coefficient. This way the model would understand that this particular loss is the issue and should be worked on further, right?