Using multiple losses for a model

I am training a multi-label autoencoder with two losses. Each loss targets a different segment of the model's attention module. Initially, I was thinking of combining the two losses into a single loss, like:

loss = loss1 - coef * loss2  # coef gives less weight to loss2

But in this case, the total loss goes negative and keeps growing more negative after some epochs. So now I am thinking of updating both losses separately for the model, like this:
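A minimal sketch of what I have in mind (model, optimizer, and the two criteria are placeholders, not my actual code):

    optimizer.zero_grad()
    out = model(x)
    loss1 = criterion1(out, target)
    loss2 = criterion2(out, target)
    loss1.backward(retain_graph=True)  # keep the graph alive for the second pass
    loss2.backward()                   # gradients accumulate on top of the first pass
    optimizer.step()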


I am looking for suggestions, and wondering whether I am missing something in my first approach (the combined loss).

Clarifying question: Should both losses be minimized in your problem?

If so, then it seems to me that you would want something like loss = loss1 + coef * loss2. Your coefficient setup is perfectly reasonable; it will no doubt need to be tuned to produce the right focus for your model, though.

Mathematically speaking, I think hitting both individual losses with backward() is equivalent to the combined loss loss = loss1 + loss2, since the second backward pass simply accumulates its gradients on top of the first.
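A quick toy check of that claim (not from your setup; w stands in for a model parameter and h for a shared forward pass):

    import torch

    w = torch.randn(3, requires_grad=True)
    x = torch.randn(3)

    # two separate backward passes: gradients accumulate in w.grad
    h = w * x
    loss1 = h.sum() ** 2
    loss2 = (h - 1.0).pow(2).sum()
    loss1.backward(retain_graph=True)
    loss2.backward()
    g_separate = w.grad.clone()

    # one backward pass on the summed loss
    w.grad = None
    h = w * x
    loss1 = h.sum() ** 2
    loss2 = (h - 1.0).pow(2).sum()
    (loss1 + loss2).backward()

    print(torch.allclose(g_separate, w.grad))  # True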

What surprised me is that the model's accuracy increased a lot even though the overall loss went so far below zero. I tested the model, and given how well it performs, I am struggling to justify the loss going negative, as I have not seen any examples of this so far.

@_joker in your original example you show subtracting one loss function from another. Most loss functions are defined to be minimized, not maximized. If you subtract a loss function, you are in effect asking the optimizer to make that loss as large as it can. This would explain your large negative loss values.

True! The reason to subtract the losses is:
loss1 – computed on the encoded representations of the i-th class label
loss2 – computed on the encoded representations of the non-i-th class labels.

Here each class is a p-norm ball, and loss1 encourages the encodings of the i-th class to be mapped as close as possible to the hyper-ball center, while the non-i-th encodings should be pushed away from it, so loss2 has to be subtracted from loss1.
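Roughly, a sketch of what I mean (hypothetical names; z_pos / z_neg are the encoded batches for the i-th and non-i-th classes, center is the ball center):

    import torch

    def combined_loss(z_pos, z_neg, center, p=2, coef=0.1):
        # loss1: pull i-th class encodings toward the hyper-ball center
        loss1 = torch.norm(z_pos - center, p=p, dim=1).mean()
        # loss2: the same distance for non-i-th encodings; subtracting it
        # asks the optimizer to push them away from the center
        loss2 = torch.norm(z_neg - center, p=p, dim=1).mean()
        return loss1 - coef * loss2

I suppose this also explains the negative total: loss2 is unbounded, so pushing the non-i-th encodings ever farther from the center keeps driving the sum below zero even while the representations improve.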