Multi-task learning: weight selection for combining loss functions

almog · May 22, 2022, 1:23pm

Hi,

I have two tasks in my model, regression and classification (two heads), and I'm using MSE and CE loss, respectively.
For now, I am combining the losses linearly: combined_loss = mse_loss + ce_loss,
and then calling combined_loss.backward()
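For concreteness, here is a minimal sketch of that setup, assuming a shared trunk feeding both heads. The class TwoHeadModel, the layer sizes, and the random data are hypothetical; only the unweighted sum followed by a single backward() call comes from the post. Because both losses hang off one graph, that single backward() accumulates the gradients of both terms into the shared trunk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical two-head model: a shared trunk feeding a regression head
# and a classification head. Sizes are arbitrary, for illustration only.
class TwoHeadModel(nn.Module):
    def __init__(self, in_features=16, hidden=32, n_classes=3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.reg_head = nn.Linear(hidden, 1)          # regression output
        self.cls_head = nn.Linear(hidden, n_classes)  # class logits

    def forward(self, x):
        h = self.trunk(x)
        return self.reg_head(h).squeeze(-1), self.cls_head(h)

torch.manual_seed(0)
model = TwoHeadModel()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Random stand-in data (not from the post); regression targets on a large scale.
x = torch.randn(8, 16)
y_reg = torch.randn(8) * 10 + 100
y_cls = torch.randint(0, 3, (8,))

reg_out, cls_logits = model(x)
mse_loss = F.mse_loss(reg_out, y_reg)
ce_loss = F.cross_entropy(cls_logits, y_cls)

# Unweighted linear combination, as described in the question.
combined_loss = mse_loss + ce_loss
optimizer.zero_grad()
combined_loss.backward()   # gradients from both heads flow into model.trunk
optimizer.step()
```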

The main problem is that the two losses are on very different scales: the MSE ranges from 60 to 140 (depending on the dataset), while the CE ranges from 0.2 to 0.6. As a result, the CE barely affects the combined loss.
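To put numbers on the imbalance, the snippet below computes the fraction of the combined loss that the CE term contributes at the extremes of the quoted ranges. This is a rough indicator only: what actually steers the shared layers is the relative size of each loss's gradient, which loss values reflect only loosely (scaling a loss by a constant does scale its gradient by the same constant).

```python
# Share of the combined loss contributed by CE, using the ranges quoted
# above (MSE 60-140, CE 0.2-0.6). These are loss values, not gradients.
def ce_share(mse, ce):
    return ce / (mse + ce)

best_case = ce_share(60, 0.6)    # smallest MSE, largest CE
worst_case = ce_share(140, 0.2)  # largest MSE, smallest CE
print(f"CE share: {worst_case:.2%} to {best_case:.2%}")
# prints "CE share: 0.14% to 0.99%"
```

So under these ranges the CE term is never more than about 1% of the total.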

How can I scale the two losses in the most automated way, without doing a grid search over different weight hyperparameters?

Thanks,
Almog

KFrank · May 23, 2022, 12:38am

Hi Almog!

There is a proposed scheme for training the relative weights of the per-task
losses when training a multi-task model. (I think I saw this discussed in a
previous thread on this forum, but I couldn’t find it.) I haven’t ever tried it,
but it looks sensible to me, and I imagine that it would work.
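Written out, the weighting computed by the code quoted below (based on arXiv 1705.07115) is as follows, where s_i is the learned log_vars entry for task i, sigma_i^2 = exp(s_i), and r_i is the is_regression flag (1 for a regression task, 0 for a classification task), summed over tasks. If each task loss L_i is treated as fixed, setting the derivative with respect to s_i to zero shows that the learned coefficient settles at 1/(2 L_i) for both task types, i.e. each loss is roughly normalized by its own magnitude, which is exactly the scale problem in the question. (log sigma_i = s_i / 2.)

```latex
\mathcal{L}_{\text{total}} = \sum_i \left[ \frac{\mathcal{L}_i}{(r_i + 1)\,\sigma_i^2} + \log \sigma_i \right],
\qquad \sigma_i^2 = e^{s_i}

\frac{\partial}{\partial s_i}\left[ \frac{e^{-s_i}\,\mathcal{L}_i}{r_i + 1} + \frac{s_i}{2} \right]
= -\frac{e^{-s_i}\,\mathcal{L}_i}{r_i + 1} + \frac{1}{2} = 0
\;\Longrightarrow\;
\sigma_i^2 = \frac{2\,\mathcal{L}_i}{r_i + 1},
\qquad
\frac{1}{(r_i + 1)\,\sigma_i^2} = \frac{1}{2\,\mathcal{L}_i}
```

The log sigma_i term is what keeps the optimizer from simply driving every weight toward zero.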

Here is a pytorch implementation and the reference it is based on:

github.com/ywatanabe1989/custom_losses_pytorch/blob/master/multi_task_loss.py

import torch

class MultiTaskLoss(torch.nn.Module):
  '''https://arxiv.org/abs/1705.07115'''
  def __init__(self, is_regression, reduction='none'):
    super(MultiTaskLoss, self).__init__()
    self.is_regression = is_regression
    self.n_tasks = len(is_regression)
    self.log_vars = torch.nn.Parameter(torch.zeros(self.n_tasks))
    self.reduction = reduction

  def forward(self, losses):
    dtype = losses.dtype
    device = losses.device
    stds = (torch.exp(self.log_vars)**(1/2)).to(device).to(dtype)
    self.is_regression = self.is_regression.to(device).to(dtype)
    coeffs = 1 / ( (self.is_regression+1)*(stds**2) )
    multi_task_losses = coeffs*losses + torch.log(stds)

    if self.reduction == 'sum':

(This file has been truncated.)

Best.

K. Frank

almog · May 25, 2022, 7:25am

Hi Frank,
Thank you!

I tried this solution, but I'm not sure it improved performance over a plain linear combination of the two losses; the model's performance looks about the same.

I would appreciate any other suggestions to try, please.
Thanks