I am training a multi-label autoencoder with two losses. Each loss targets a different segment of the model's attention module. Initially, I was thinking of combining the two losses into a single loss like:
loss = loss1 - coef * loss2  # coef gives less weight to loss2
But in this case, the total loss goes negative and keeps growing more negative after some epochs. So instead I am thinking of updating both losses separately for the model.
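The "update separately" idea might look like the following sketch (assumes a PyTorch setup; the tiny model and the two loss expressions are placeholders, not the actual attention-module losses):

```python
import torch
from torch import nn

# Placeholder model and data standing in for the autoencoder's attention module.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(8, 4)

optimizer.zero_grad()
out = model(x)
loss1 = out.pow(2).mean()   # placeholder for the first attention loss
loss2 = out.abs().mean()    # placeholder for the second attention loss

coef = 0.1
loss1.backward(retain_graph=True)  # keep the graph alive for the second pass
(coef * loss2).backward()          # gradients accumulate into each .grad
optimizer.step()
```

Both `backward()` calls write into the same `.grad` buffers, so the single `optimizer.step()` applies the combined update.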
Clarifying question: should both losses be minimized in your problem?
If so, then it seems to me you would want something like loss = loss1 + coef * loss2. Your coefficient setup seems perfectly reasonable; it will no doubt need tuning to give the right balance for your model, though.
Mathematically speaking, I think calling backward() on both individual losses is equivalent to the single loss loss = loss1 + loss2, since the second backward pass just accumulates its gradients on top of the first.
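A minimal check of that claim (my sketch, not from the thread; the two loss expressions are arbitrary stand-ins):

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([0.5, 3.0])

# Two separate backward passes: gradients accumulate in w.grad.
loss1 = ((w * x) ** 2).sum()
loss2 = (w * x).abs().sum()
loss1.backward(retain_graph=True)  # retain in case the losses share a graph
loss2.backward()
grad_separate = w.grad.clone()

# One backward pass on the summed loss, over a freshly built graph.
w.grad = None
loss1 = ((w * x) ** 2).sum()
loss2 = (w * x).abs().sum()
(loss1 + loss2).backward()
grad_combined = w.grad.clone()

print(torch.allclose(grad_separate, grad_combined))  # True
```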
Thanks!
It was a surprise to me that the model's accuracy increased a lot even though the overall loss went to extreme negative values. I tested the model, and given its good performance, I am confused about how to justify the loss being below zero, as I haven't seen any such examples so far.
@_joker in your original example you show subtracting one loss function from another. Most loss functions are defined to be minimized, not maximized. If you subtract a loss function, you are in effect asking the optimizer to make that loss as big as it can, which can produce your large negative loss values.
@zacharynew
True! The reason for subtracting the losses is:
loss1 – computed on the encoded representation from the i-th class label
loss2 – computed on the encoded representations from the non-i-th class labels
Each class is a p-norm hyper-ball. I want loss1 to pull the encoding as close as possible to its class's hyper-ball center, while the encodings of the other classes should be pushed away, so loss2 has to be subtracted from loss1.
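For a pull/push objective like this, one common way (my suggestion, not something stated in the thread) to keep the combined loss from running off to negative infinity is a hinge/margin form, as in triplet losses: the push term only contributes while it is within a margin of the pull term. A minimal sketch, where `coef` and `margin` are assumed hyperparameters:

```python
import torch

def margin_loss(loss1, loss2, coef=0.5, margin=1.0):
    # Minimize loss1 (pull toward the i-th class center) and maximize loss2
    # (push from the other centers), but clamp at zero so the combined
    # objective is bounded below and cannot grow arbitrarily negative.
    return torch.clamp(loss1 - coef * loss2 + margin, min=0.0)

# Once loss2 is large enough, the objective saturates at 0 instead of
# rewarding the optimizer for pushing it further.
l1 = torch.tensor(0.2)
l2 = torch.tensor(5.0)
print(margin_loss(l1, l2))  # tensor(0.)
```

With this form, gradients vanish once the margin is satisfied, so the optimizer stops spending capacity on separation it has already achieved.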