Combine Losses and Weight those

I am training a multitask model in which I have some classification and some regression tasks.

So I am using two loss functions:

import torch.nn as nn

loss_function_reg = nn.MSELoss()
loss_function_clf = nn.BCEWithLogitsLoss()

and combine them:

loss_reg = loss_function_reg(prediction_reg, batch[1].cuda())
loss_clf = loss_function_clf(prediction_clf.flatten(), batch[2].cuda())
loss = loss_reg + loss_clf

I have way more regression tasks than classification tasks. Should I weight for that? The loss returned by either loss function is the average across all of its tasks, so the sum of the two losses is “biased” towards the loss function with fewer variables/tasks. Or did I just overthink that?

You can add them together, that’s fine. It works both technically and mathematically.
But a plain sum is only the simplest option. You could also add a learnable variable a, so that the model learns the “bias” between the two losses by itself.

a = torch.tensor(0.5, requires_grad=True)
loss = (1-a)*loss_reg + a*loss_clf

But if a is learnable, would the network not start minimizing the loss function that is easier to minimize, and then just fit a so that only that loss function counts?

Hi Janosch!

Conceptually, yes, a would train to prefer the easier loss. But, as
written, a could train to become a very large positive number or a
very large negative number, with your loss becoming unbounded.

(You could train (1 - a)**2 * loss_reg + a**2 * loss_clf,
but now a will just train to 0 or 1 to ignore the “harder” loss.)
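The runaway behaviour of a can be reproduced in isolation. The sketch below is only an illustration under simplifying assumptions: the two loss values are hypothetical constants (in real training they change as the network updates), and plain SGD with a learning rate of 0.1 is an arbitrary choice.

```python
import torch

# Hypothetical, fixed loss values standing in for a network whose
# regression loss is currently larger than its classification loss.
loss_reg = torch.tensor(2.0)
loss_clf = torch.tensor(0.5)

a = torch.tensor(0.5, requires_grad=True)
optimizer = torch.optim.SGD([a], lr=0.1)

for step in range(100):
    optimizer.zero_grad()
    loss = (1 - a) * loss_reg + a * loss_clf
    loss.backward()
    optimizer.step()  # a keeps moving towards the smaller loss

final_a = a.item()
final_loss = ((1 - a) * loss_reg + a * loss_clf).item()
print(final_a, final_loss)  # a ends far above 1 and the loss is negative
```

Nothing constrains a to stay in [0, 1], so the weight on the larger loss turns negative and the combined loss keeps decreasing without bound.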

As to your original question, it does make sense to consider weighting
the two losses, e.g., something like:

loss = loss_reg + alpha * loss_clf

But alpha should now be a non-trainable hyperparameter whose
value you tune “by hand.” (You could also contemplate using some
automated hyperparameter-tuning scheme, but those are often more
complicated than they are worth.)
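A minimal sketch of one training step with a fixed alpha might look like the following. The model `MultiTaskNet`, the layer sizes, the random batch, and alpha = 0.5 are all hypothetical placeholders, not part of the original question; only the two loss functions and the way they are combined come from the thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_features, n_reg_tasks, n_clf_tasks = 16, 45, 1  # hypothetical sizes


class MultiTaskNet(nn.Module):
    """Hypothetical model: shared trunk with one head per task type."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.head_reg = nn.Linear(32, n_reg_tasks)
        self.head_clf = nn.Linear(32, n_clf_tasks)

    def forward(self, x):
        h = self.trunk(x)
        return self.head_reg(h), self.head_clf(h)


model = MultiTaskNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_function_reg = nn.MSELoss()
loss_function_clf = nn.BCEWithLogitsLoss()

alpha = 0.5  # plain Python float: tuned by hand, never seen by the optimizer

# Random stand-in batch (inputs, regression targets, binary targets).
x = torch.randn(8, n_features)
y_reg = torch.randn(8, n_reg_tasks)
y_clf = torch.randint(0, 2, (8,)).float()

prediction_reg, prediction_clf = model(x)
loss_reg = loss_function_reg(prediction_reg, y_reg)
loss_clf = loss_function_clf(prediction_clf.flatten(), y_clf)
loss = loss_reg + alpha * loss_clf

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because alpha is not a tensor with requires_grad=True, the optimizer cannot shrink it to make the loss smaller; both heads keep receiving gradients.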


K. Frank


That’s not true,
because the loss is backpropagated to all elements,
and they share the loss equally.
Look at the formula for the loss:

loss = (1-a)*loss_reg + a*loss_clf

The derivative with respect to a (or alpha)
is loss_clf - loss_reg.
Making the two equal is not easy.
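The derivative can be checked with autograd. This is a small sketch with hypothetical loss values; it evaluates the gradient at several values of a to show that it does not depend on a itself.

```python
import torch

# Hypothetical loss values, treated as constants for this check.
loss_reg = torch.tensor(2.0)
loss_clf = torch.tensor(0.5)
expected = (loss_clf - loss_reg).item()

grads = []
for a_value in (0.0, 0.5, 1.0, 3.0):
    a = torch.tensor(a_value, requires_grad=True)
    loss = (1 - a) * loss_reg + a * loss_clf
    loss.backward()
    grads.append(a.grad.item())

print(grads, expected)
```

The gradient is the same at every a, so gradient descent only stops moving a when loss_clf equals loss_reg.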

Maybe I am missing your point. But my point, and I think @KFrank’s as well, was that if you add a learnable alpha, it would bias your loss towards the easier predictions.

In my example I have a few classification tasks (with the appropriate loss, loss_clf) and regression tasks (loss_reg).

If one of the two is easier to minimize (yields a smaller loss), then alpha could just be set to weight only the smaller loss and ignore the “harder” task. The model would then backpropagate the loss of only the one task, effectively not training on the harder task.

Thank you for explaining what you mean by the “hard” task.
I understand now. So set a to a constant, and give the harder task
the larger weight.
That is simply a weighted loss.

Yes, for example. But I was more concerned about whether weighting is needed at all.

Let’s say you have 45 regression tasks and only 1 classification task.
If you combine the two losses (loss_reg + loss_clf), do you “overweight” the clf loss? I am not sure. That is why I was wondering whether you should reweight the regression loss.
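One way to look at this is to inspect the gradients that the summed loss sends back to each prediction. The sketch below assumes the default reduction='mean' for both loss functions and uses random stand-in tensors with a hypothetical batch size of 8. With 45 regression tasks, each regression element is scaled by 1 / (8 * 45), while each classification element is scaled by 1 / 8.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, n_reg_tasks, n_clf_tasks = 8, 45, 1  # hypothetical sizes

# Random stand-ins for the model outputs and the targets.
prediction_reg = torch.randn(batch_size, n_reg_tasks, requires_grad=True)
prediction_clf = torch.randn(batch_size, n_clf_tasks, requires_grad=True)
target_reg = torch.randn(batch_size, n_reg_tasks)
target_clf = torch.randint(0, 2, (batch_size,)).float()

loss_reg = nn.MSELoss()(prediction_reg, target_reg)  # mean over 8 * 45 elements
loss_clf = nn.BCEWithLogitsLoss()(prediction_clf.flatten(), target_clf)  # mean over 8
loss = loss_reg + loss_clf
loss.backward()

# Analytic gradients of the two mean-reduced losses.
expected_reg_grad = (
    2 * (prediction_reg.detach() - target_reg) / (batch_size * n_reg_tasks)
)
expected_clf_grad = (
    torch.sigmoid(prediction_clf.detach().flatten()) - target_clf
) / batch_size

# One possible reweighting (an assumption, not the only choice):
# multiplying by the task count turns the overall mean into a sum of
# per-task means, so every task enters the total with weight 1.
per_task_mse = ((prediction_reg.detach() - target_reg) ** 2).mean(dim=0)
loss_per_task_sum = n_reg_tasks * loss_reg.detach() + n_clf_tasks * loss_clf.detach()
```

So with the plain sum, the single classification task carries as much weight as all 45 regression tasks together. Whether that is desirable depends on the problem, which is why a hand-tuned weight is a reasonable starting point.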

You can’t think of it as simply adding numbers.
It’s a mathematical optimization problem: finding the parameters that minimize the loss function.
So, following the idea of a Lagrange multiplier, you can find the most appropriate a to balance loss_clf and loss_reg.