# Combining Losses and Weighting Them

I am training a multitask model in which I have some classification and some regression tasks.

So I am using two loss functions:

loss_function_reg = nn.MSELoss()
loss_function_clf = nn.BCEWithLogitsLoss()

and combine them:

loss_reg = loss_function_reg(prediction_reg, batch[1].cuda())
loss_clf = loss_function_clf(prediction_clf.flatten(), batch[2].cuda())
loss = loss_reg + loss_clf
loss.backward()
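Put together, a minimal end-to-end sketch of such a training step might look like this (the two-head model, shapes, and task counts are assumptions for illustration; `backward()` needs no argument since the combined loss is a scalar):

```python
import torch
import torch.nn as nn

# Hypothetical two-head model: 45 regression outputs and 1 classification logit.
torch.manual_seed(0)
backbone = nn.Linear(10, 32)
head_reg = nn.Linear(32, 45)
head_clf = nn.Linear(32, 1)

loss_function_reg = nn.MSELoss()
loss_function_clf = nn.BCEWithLogitsLoss()

x = torch.randn(8, 10)                           # batch of 8 samples
target_reg = torch.randn(8, 45)                  # regression targets
target_clf = torch.randint(0, 2, (8,)).float()   # binary labels

features = torch.relu(backbone(x))
prediction_reg = head_reg(features)
prediction_clf = head_clf(features)

loss_reg = loss_function_reg(prediction_reg, target_reg)
loss_clf = loss_function_clf(prediction_clf.flatten(), target_clf)
loss = loss_reg + loss_clf
loss.backward()  # scalar loss: no argument needed
```

After `backward()`, gradients from both losses have accumulated in the shared backbone, while each head only receives gradients from its own loss.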

I have far more regression tasks than classification tasks. Should I weight for that? The loss obtained from either loss function is the average across all of its tasks, so the sum of the two losses is "biased" towards the loss function with fewer variables/tasks. Or did I just overthink that?

You can merge them, that's fine; it works both technically and mathematically.
But adding them together is just the simplest way. You could instead introduce a learnable variable a, so that the model learns the "bias" between the two losses itself:

loss = (1-a)*loss_reg + a*loss_clf
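As a minimal sketch of that idea (the tiny model is hypothetical, and a is assumed to be a single unconstrained scalar optimized jointly with the model weights):

```python
import torch

# Sketch: treat the mixing weight `a` as a learnable parameter
# alongside the model weights (hypothetical tiny model for illustration).
torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
a = torch.nn.Parameter(torch.tensor(0.5))
optimizer = torch.optim.SGD(list(model.parameters()) + [a], lr=0.1)

x = torch.randn(8, 4)
out = model(x)
loss_reg = torch.nn.functional.mse_loss(out[:, 0], torch.randn(8))
loss_clf = torch.nn.functional.binary_cross_entropy_with_logits(
    out[:, 1], torch.randint(0, 2, (8,)).float())

loss = (1 - a) * loss_reg + a * loss_clf
optimizer.zero_grad()
loss.backward()   # d(loss)/da = loss_clf - loss_reg
optimizer.step()
```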

But if a is learnable, wouldn't the network just start minimizing whichever loss function is easier to minimize, and fit a so that only that loss function counts?

Hi Janosch!

Conceptually, yes, a would train to prefer the easier loss. But, as
written, a could train to become a very large positive number or a
very large negative number, with your loss becoming unbounded
below.

(You could train (1 - a)**2 * loss_reg + a**2 * loss_clf,
but now a will just train to 0 or 1 to ignore the "harder" loss.)

As to your original question, it does make sense to consider weighting
the two losses, e.g., something like:

loss = loss_reg + alpha * loss_clf

But alpha should now be a non-trainable hyperparameter whose
value you tune "by hand." (You could also contemplate using some
automated hyperparameter-tuning scheme, but those are often more
complicated than they are worth.)
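A small sketch of the effect of such a fixed alpha (the value 0.5 and the toy tensors are assumptions; the point is that alpha scales both the classification loss and the gradients flowing from it):

```python
import torch

# Fixed, hand-tuned weighting: alpha is a hyperparameter, not a Parameter.
alpha = 0.5  # assumed value, tuned "by hand" on validation performance

logits = torch.zeros(4, requires_grad=True)
targets = torch.tensor([0.0, 1.0, 0.0, 1.0])
pred = torch.zeros(4, requires_grad=True)
target_reg = torch.ones(4)

loss_reg = torch.nn.functional.mse_loss(pred, target_reg)
loss_clf = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
loss = loss_reg + alpha * loss_clf
loss.backward()
# logits.grad is alpha times what it would be under the unweighted sum,
# while pred.grad is unchanged by alpha.
```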

Best.

K. Frank


Thatâs not true.
Cuz you could get loss back propagate to all elements.
And they share the loss equally
see the formula of loss

loss = (1-a)*loss_reg + a*loss_clf

The derivative with respect to a (or alpha)
is loss_clf - loss_reg.
Making the two equal is not easy.
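That derivative claim can be checked directly with autograd (the two loss values below are arbitrary stand-ins):

```python
import torch

# Check: with loss = (1-a)*loss_reg + a*loss_clf,
# the derivative with respect to a is loss_clf - loss_reg.
a = torch.tensor(0.3, requires_grad=True)
loss_reg = torch.tensor(2.0)
loss_clf = torch.tensor(0.5)

loss = (1 - a) * loss_reg + a * loss_clf
loss.backward()
# a.grad == loss_clf - loss_reg == -1.5, i.e. gradient descent
# pushes a toward the smaller of the two losses
```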

Maybe I am missing your point. But my point (and, I think, @KFrank's) was that a learnable alpha would bias your loss towards the easier predictions.

As in my example: I have a few classification tasks (with the corresponding loss, loss_clf) and regression tasks (loss_reg).

If one of the two is easier to minimize (yields a smaller loss), then alpha could simply be set to weight only the smaller loss and ignore the "harder" task. The model would then backpropagate the loss of the one task but not the other, effectively not training for the harder task.

Thank you for explaining the details of the "hard" task.
I understand what you mean. So set a to a constant number, and give the harder task
a larger weight.
That's loss weighting, exactly.

Yes, for example. But I was more concerned about whether weighting is needed at all.

Let's say you have 45 regression tasks and only 1 classification task.
If you combine the two losses (loss_reg + loss_clf), do you "overweight" the clf loss? I am not sure, and that is why I was wondering whether you should reweight the regression loss.
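To make the concern concrete: both losses are means over their own elements, so summing them gives the single classification task the same total weight as all 45 regression tasks combined. One option is to reweight by task counts; the toy tensors below and the count-based scheme are assumptions, not the only reasonable choice:

```python
import torch

n_reg, n_clf = 45, 1
pred_reg, target_reg = torch.zeros(8, n_reg), torch.ones(8, n_reg)
logits_clf, target_clf = torch.zeros(8), torch.ones(8)

# Each loss averages over its own elements (8*45 vs. 8).
loss_reg = torch.nn.functional.mse_loss(pred_reg, target_reg)
loss_clf = torch.nn.functional.binary_cross_entropy_with_logits(
    logits_clf, target_clf)

plain_sum = loss_reg + loss_clf
# Reweight so every individual task contributes equally:
count_weighted = (n_reg * loss_reg + n_clf * loss_clf) / (n_reg + n_clf)
```

Under `count_weighted`, the classification task contributes 1/46 of the total instead of 1/2.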

You canât think it as numbersâ adding.
Itâs optimization of mathmatics, finding the minimum params apply to the loss function.
So according to the Lagrange multiplier, you can find the most appropriate a to balance loss_ctf and loss_reg.