How to learn the weights between two losses?

udemirezen · June 18, 2021, 1:32pm

For this loss, we estimate the logarithm of the variance.
When the outputs of the network are very small (for example 1e-4), self.eta becomes negative because of the logarithm. If self.eta is larger in magnitude than torch.Tensor(loss) * torch.exp(-self.eta), the total loss is negative.

What do you suggest when the network has to regress very small float values like this?

Tony-Y · June 19, 2021, 3:20am

Please see my experiment using a linear model below.

MultiTaskLoss.ipynb · GitHub

In this experiment, I used torch.stack instead of torch.Tensor to fix the reported bug in my original code, as follows:

total_loss = torch.stack(loss) * torch.exp(-self.eta) + self.eta
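A minimal sketch of why torch.stack fits here: it combines the per-task scalar losses into one tensor that stays connected to the autograd graph, so gradients reach both the model weights and eta. The weight w and the two toy losses are hypothetical and only stand in for real task losses.

```python
import torch

# Hypothetical shared weight and two toy scalar losses that depend on it.
w = torch.tensor(2.0, requires_grad=True)
loss = [w ** 2, 3.0 * w]                  # values 4.0 and 6.0

# One learnable eta per task, initialised to zero for this sketch.
eta = torch.zeros(2, requires_grad=True)

# torch.stack keeps the graph: `stacked` still has a grad_fn.
stacked = torch.stack(loss)
total_loss = (stacked * torch.exp(-eta) + eta).sum()
total_loss.backward()

print(stacked.grad_fn is not None)  # the graph is preserved
print(w.grad)                       # 2*w + 3 = 7 when eta is zero
print(eta.grad)                     # 1 - loss * exp(-eta) = [-3, -5]
```

Both w.grad and eta.grad are populated after a single backward() call, which is what the optimizer needs in order to update the model and the loss weights together.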

total_loss and the etas should be negative when the original losses have converged to zero.

Reference for the linear model: https://arxiv.org/pdf/1905.11286v2.pdf
(Section 4: Experiments With Deep Linear Networks)
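To see where the negative values come from, consider a single task with a fixed loss value L. Setting the derivative of L * exp(-eta) + eta with respect to eta to zero gives eta = log(L), and the weighted loss at that point is 1 + log(L), which is negative whenever L < 1/e. The sketch below checks this numerically; the helper names weighted_loss and optimal_eta are made up for illustration, and treating L as a constant is a simplification.

```python
import math

def weighted_loss(loss_value: float, eta: float) -> float:
    """Single-task term of the multi-task loss: L * exp(-eta) + eta."""
    return loss_value * math.exp(-eta) + eta

def optimal_eta(loss_value: float) -> float:
    """eta that minimises weighted_loss for a fixed, positive loss value."""
    return math.log(loss_value)

for loss_value in (1.0, 0.1, 1e-4):
    eta_star = optimal_eta(loss_value)
    total = weighted_loss(loss_value, eta_star)
    # total equals 1 + log(loss_value) at the optimum
    print(f"L={loss_value:g}  eta*={eta_star:.4f}  total={total:.4f}")
```

For L = 1e-4 the optimal eta is about -9.21 and the weighted loss is about -8.21, so a negative total loss is expected behaviour rather than a sign of a bug.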

nikiguo93 · March 30, 2022, 8:35pm

Hi, I used your code, but I want to put my data on CUDA. The input, the target, and the parameters of MultiTaskLoss have all been moved to CUDA. Unfortunately, I get an error like:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Could you please give me some clues for tracking down this problem?

Tony-Y · March 31, 2022, 5:35am

Did you use cuda() or to(device) as in the following?

mtl = MultiTaskLoss(model=MultiTaskModel(),
                    loss_fn=[loss_fn1, loss_fn2],
                    eta=[1.0, 1.0]).cuda()
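One possible cause of this error, offered as a guess rather than a diagnosis: .cuda() and .to(device) only move tensors that the module knows about, meaning parameters and registered buffers. A tensor stored as a plain attribute stays on the CPU. The DeviceDemo class below is hypothetical and only illustrates the difference.

```python
import torch
import torch.nn as nn

class DeviceDemo(nn.Module):  # hypothetical module, not from this thread
    def __init__(self):
        super().__init__()
        self.eta = nn.Parameter(torch.zeros(2))      # moved by .to()/.cuda()
        self.register_buffer("mask", torch.ones(2))  # moved as well
        self.plain = torch.ones(2)                   # plain attribute: NOT moved

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
demo = DeviceDemo().to(device)

# On a CUDA machine the first two report cuda:0 and the last one cpu.
print(demo.eta.device, demo.mask.device, demo.plain.device)

# Only parameters and buffers appear in the state dict.
print(list(demo.state_dict().keys()))
```

If a module holds a plain tensor attribute, registering it with register_buffer or moving it explicitly to the input's device is one way to avoid the mismatch.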

nikiguo93 · April 6, 2022, 1:59pm

Hi, I use this multi-task loss and have some questions. Here are the questions: Total loss of multi-task model
Could you please help me improve the classification accuracy?
is_regression = torch.Tensor([False, True]), and loss_1 is the MSE loss while loss_2 is the cross-entropy loss.

Murphy · January 15, 2023, 2:05am

Greetings, I tested a lot of things:

I trained the models separately, and they seem to work without multi-task training.

I also took your advice and did not use a custom initialization. That works well, but the problem with multi-task training is still there:

I think it's because of the loss function.

Currently I just add the losses together and backpropagate the sum, but that doesn't seem to work. Is there a good tutorial or method for building the loss for multi-task pretrained models?

I integrated the multi-task loss function from this thread:

But it still does not work. I can train the tasks separately, but together they don't work (one task's accuracy rises while the other stays low).

I use cross-entropy for both.

# Resnet50_multiTaskNet and MultiTaskLoss are defined further down.
model = Resnet50_multiTaskNet().to(device)
criterion = [nn.CrossEntropyLoss(), nn.CrossEntropyLoss()]

def loss_fn1(x, cls):
    return 2 * criterion[0](x, cls)

def loss_fn2(x, cls):
    return 2 * criterion[1](x, cls)

mtl = MultiTaskLoss(model=model,
                    loss_fn=[loss_fn1, loss_fn2],
                    eta=[1.0, 1.0]).to(device)

optimizer = optim.Adam(mtl.parameters())

import torch.nn as nn
from torchvision import models
from torchvision.models import ResNet50_Weights

# class_length is a dict with the number of classes per task, defined elsewhere.
class Resnet50_multiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()

        self.model = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

        # Freeze the pretrained backbone; only the two heads are trained.
        for param in self.model.parameters():
            param.requires_grad = False

        # No .to(device) needed here: calling .to(device) on the whole model moves the heads too.
        self.fc_artist = nn.Linear(2048, class_length['artist'])
        self.fc_style = nn.Linear(2048, class_length['style'])

    def forward(self, x):
        x = self.model.conv1(x)
        x = self.model.bn1(x)
        x = self.model.relu(x)
        x = self.model.maxpool(x)

        x = self.model.layer1(x)
        x = self.model.layer2(x)
        x = self.model.layer3(x)
        x = self.model.layer4(x)
        x = self.model.avgpool(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 2048)

        # Both heads share the same backbone features.
        x_artist = self.fc_artist(x)
        x_style = self.fc_style(x)
        return x_artist, x_style
# multi-task loss
from typing import List, Tuple

import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    def __init__(self, model, loss_fn, eta) -> None:
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn
        # One learnable eta (log-variance) per task.
        self.eta = nn.Parameter(torch.Tensor(eta))

    def forward(self, input, targets) -> Tuple[List[torch.Tensor], torch.Tensor, Tuple[torch.Tensor, ...]]:
        outputs = self.model(input)
        # Pair each loss function with its own output and its own target.
        loss = [l(o, y) for l, o, y in zip(self.loss_fn, outputs, targets)]
        total_loss = torch.stack(loss) * torch.exp(-self.eta) + self.eta
        return loss, total_loss.sum(), outputs  # omit 1/2

Does anyone have an idea why?

Tony-Y · January 15, 2023, 3:27am

"A Simple General Approach to Balance Task Difficulty in Multi-Task Learning"

This paper summarizes multi-task learning methods. You should first try the direct sum approach after examining the models separately.
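A minimal sketch of the direct sum approach, assuming two classification heads on a shared backbone. TwoHeadNet, the layer sizes, and the random data are hypothetical stand-ins, not the ResNet model from this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TwoHeadNet(nn.Module):  # hypothetical stand-in for a two-task model
    def __init__(self, in_features=8, n_artist=3, n_style=4):
        super().__init__()
        self.backbone = nn.Linear(in_features, 16)
        self.fc_artist = nn.Linear(16, n_artist)
        self.fc_style = nn.Linear(16, n_style)

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return self.fc_artist(h), self.fc_style(h)

model = TwoHeadNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# Random batch: 5 samples, one label per task.
x = torch.randn(5, 8)
y_artist = torch.randint(0, 3, (5,))
y_style = torch.randint(0, 4, (5,))

out_artist, out_style = model(x)
loss_artist = criterion(out_artist, y_artist)  # task 1 output with task 1 target
loss_style = criterion(out_style, y_style)     # task 2 output with task 2 target
total = loss_artist + loss_style               # direct sum, equal weights

optimizer.zero_grad()
total.backward()   # gradients flow into both heads and the shared backbone
optimizer.step()
```

Each loss is computed from its own output and its own target; only the final scalar is shared. Printing the two losses separately during training makes it easier to spot when one task stops improving.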

Murphy · January 15, 2023, 3:30am

Thank you for the paper!

Is there a built-in function for minimizing the loss results? I use torch.

Tony-Y · January 15, 2023, 3:57am

If you want to get the minimum value, you can use torch.min.
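A short sketch of torch.min on a pair of loss values, using the two numbers posted later in this thread as sample data. Note that torch.min only selects the smallest element; minimizing a loss during training is done by backward() and the optimizer.

```python
import torch

# Sample loss values, stacked into one tensor.
losses = torch.stack([torch.tensor(7.3046), torch.tensor(4.8561)])

smallest = torch.min(losses)             # scalar tensor with the minimum
value, index = torch.min(losses, dim=0)  # minimum and its position along dim 0

print(smallest.item(), index.item())
```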

Murphy · January 15, 2023, 4:11am

I tried your approach, but I get losses like this:
tensor(7.3046, device='cuda:0', grad_fn=)
tensor(4.8561, device='cuda:0', grad_fn=)

Then only the larger loss seems to affect backprop, so only Task 1's accuracy improves while Task 2 stays at a very low accuracy.

Direct sum approach: I just summed the losses. Could you give me an example of how to modify that?

Edit:

I had a mistake in my training loop.

Now it works.

I'm going to sleep and will let it test overnight.

Man, I'm so stupid: I also used the predictions from Task 1 to compute the accuracy for Task 2.

Thank you for your help.
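For anyone who hits the same bug, here is a small sketch of computing the accuracy per task so that each task's predictions are only compared with that task's targets. The helper per_task_accuracy and the toy logits are hypothetical.

```python
import torch

def per_task_accuracy(outputs, targets):
    """Return one accuracy per task, pairing each output with its own target."""
    accuracies = []
    for logits, y in zip(outputs, targets):
        pred = logits.argmax(dim=1)  # predicted class for this task only
        accuracies.append((pred == y).float().mean().item())
    return accuracies

# Toy logits for 3 samples: 2 artist classes, 3 style classes.
out_artist = torch.tensor([[2.0, 0.1], [0.2, 1.5], [3.0, 0.0]])
out_style = torch.tensor([[0.1, 0.9, 0.0], [1.2, 0.3, 0.1], [0.0, 0.2, 2.0]])
y_artist = torch.tensor([0, 1, 1])
y_style = torch.tensor([1, 0, 2])

accs = per_task_accuracy((out_artist, out_style), (y_artist, y_style))
print(accs)  # artist: 2 of 3 correct, style: 3 of 3 correct
```

Zipping outputs and targets in the same order mirrors how MultiTaskLoss pairs them in its forward method, which makes it harder to mix up the tasks.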