Multi-task learning: weight selection for combining loss functions


I have two tasks in my model: regression and classification (two heads), using MSE and CE loss respectively.
For now, I am combining the losses linearly: `combined_loss = mse_loss + ce_loss`,
and then calling `combined_loss.backward()`.

The main problem is that the scales of the two losses are very different: the MSE's range is much larger than the CE's. The MSE can be between 60 and 140 (depending on the dataset), while the CE is between 0.2 and 0.6, so the CE barely affects the combined loss.

How can I scale the two losses in an automated way, without doing a grid search over weight hyperparameters?


Hi Almog!

There is a proposed scheme for training the relative weights of the per-task
losses when training a multi-task model. (I think I saw this discussed in a
previous thread on this forum, but I couldn’t find it.) I haven’t ever tried it,
but it looks sensible to me, and I imagine that it would work.

Here is a pytorch implementation and the reference it is based on:
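(The link above appears to be missing from the thread.) For readers landing here, a minimal sketch of one such scheme is learned uncertainty weighting in the spirit of Kendall, Gal & Cipolla (CVPR 2018), where each task gets a learned log-variance that scales its loss. The class name and task count below are illustrative, not from the original post:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine per-task losses with learned uncertainty weights
    (sketch in the spirit of Kendall, Gal & Cipolla, 2018).

    Each task i gets a learned log-variance s_i = log(sigma_i^2).
    The combined loss is sum_i [ exp(-s_i) * loss_i + s_i ]:
    the exp(-s_i) factor down-weights noisy/large-scale tasks,
    and the + s_i term keeps s_i from growing without bound.
    """

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # One log-variance per task, learned jointly with the model.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, *losses: torch.Tensor) -> torch.Tensor:
        total = torch.zeros((), dtype=self.log_vars.dtype)
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total
```

Note that the module's own parameters must be passed to the optimizer along with the model's, e.g. `torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()))`, so that the weights actually get trained. With `log_vars` initialized to zero, the first step reduces to the plain sum `mse_loss + ce_loss`, and the weights adapt from there.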


K. Frank


Hi Frank,
Thank you!

I have tried this solution, but I am not sure it improved performance over a plain linear combination of the two losses. Model performance looks about the same.

I would appreciate any other solutions to try, please.