I have way more regression tasks than I have classification tasks. Should I weight for that? The loss obtained from either loss function is the average across its own tasks, so the sum of the two losses is “biased” towards the loss function with fewer variables/tasks. Or did I just overthink that?
You can merge them, that's OK; it works both technically and mathematically.
Adding them together is the simple way, but you can also add a learnable variable a so the network learns how to balance the two different losses itself:
import torch

a = torch.tensor(0.5, requires_grad=True)  # learnable mixing weight
loss = (1 - a) * loss_reg + a * loss_clf
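One practical detail (an assumption about your training loop, not shown in the thread): a is only updated if you pass it to the optimizer along with the model parameters, e.g.:

# "model" stands in for your network; a is the tensor defined above
optimizer = torch.optim.Adam(list(model.parameters()) + [a], lr=1e-3)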
But if a is learnable, would the network not start minimizing whichever loss function is easier to minimize, and then just fit a to value only that loss function?
Conceptually, yes, a would train to prefer the easier loss. But, as written, a could train to become a very large positive number or a very large negative number, with your loss becoming unbounded below.
(You could train (1 - a)**2 * loss_reg + a**2 * loss_clf, but now a will just train to 0 or 1 to ignore the “harder” loss.)
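To make that concrete, here is a toy sketch (constant stand-in losses, gradient descent on a alone; all values are my own assumptions) showing a drifting towards the smaller loss:

import torch

loss_reg = torch.tensor(1.0)  # pretend the regression loss is the "harder" one
loss_clf = torch.tensor(0.1)  # and the classification loss is the easier one

a = torch.tensor(0.5, requires_grad=True)
opt = torch.optim.SGD([a], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = (1 - a) ** 2 * loss_reg + a ** 2 * loss_clf
    loss.backward()
    opt.step()

# a settles near loss_reg / (loss_reg + loss_clf) ≈ 0.909, so the harder
# loss's weight (1 - a) ** 2 ≈ 0.008: it is almost entirely ignored.
print(a.item())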
As to your original question, it does make sense to consider weighting the two losses, e.g., something like:

loss = loss_reg + alpha * loss_clf

But alpha should now be a non-trainable hyperparameter whose value you tune “by hand.” (You could also contemplate using some automated hyperparameter-tuning scheme, but those are often more complicated than they are worth.)
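Hand-tuning can be as simple as sweeping a few candidate values and keeping the one with the best validation score. A minimal sketch, where train_and_validate is a hypothetical helper (not from the thread) that trains with loss = loss_reg + alpha * loss_clf and returns a validation loss:

best_alpha, best_score = None, float("inf")
for alpha in [0.1, 0.3, 1.0, 3.0, 10.0]:
    score = train_and_validate(alpha)  # hypothetical helper
    if score < best_score:
        best_alpha, best_score = alpha, score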
Maybe I am missing your point. But my point, and I think @KFrank's point too, was that if you use a learnable alpha, it would bias your loss towards the easier predictions.
As in my example, I have a few classification tasks (with the corresponding loss, loss_clf) and regression tasks (loss_reg).
If one of the two is easier to minimize (i.e., yields a smaller loss), then alpha could just be set to weight only the smaller loss and ignore the “harder” task. The model would then just backpropagate the loss of the one task but not the other, effectively not training for the harder task.
Thank you for explaining the details of the “harder” task.
I understand what you mean. So set a to a constant number, and give the harder task a larger weight.
That's just weighting the losses, i.e., loss weighting.
Yes, for example. But I was more concerned about whether weighting is needed at all.
Let's say you have 45 regression tasks and only 1 classification task.
If you combine the two losses (loss_reg + loss_clf), do you “overweight” the clf loss? I am not sure, and that is why I was wondering if you should reweight the regression loss.
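To put numbers on it: if loss_reg is already the mean over the 45 regression tasks, then in loss_reg + loss_clf the single classification task counts as much as all 45 regression tasks combined. Re-averaging so that every task counts equally would look like this (a sketch, assuming the per-task losses are on comparable scales):

n_reg, n_clf = 45, 1
# loss_reg is the mean over the 45 regression tasks, loss_clf the single
# classification loss; this makes each of the 46 tasks count equally
loss = (n_reg * loss_reg + n_clf * loss_clf) / (n_reg + n_clf)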
You can't think of it as simply adding numbers.
It's a mathematical optimization: finding the parameters that minimize the loss function.
So, following the idea of the Lagrange multiplier, you can find the most appropriate a to balance loss_clf and loss_reg.
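For reference, the constrained formulation this seems to allude to (my reading, not spelled out in the thread): minimize one loss subject to a budget on the other; its Lagrangian recovers the weighted sum, with λ ≥ 0 playing the role of alpha:

minimize   loss_reg(θ)   subject to   loss_clf(θ) ≤ c
L(θ, λ) = loss_reg(θ) + λ * loss_clf(θ)

In practice λ still has to be chosen or tuned; simply minimizing over λ by gradient descent would reproduce the degenerate behaviour discussed above.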