Training two models simultaneously performs worse

Hi, I trained each model one at a time, which gives good results. But when I train them both in the same loop, the performance is worse. Am I doing something wrong?

for data, target in tqdm(train_loader):
    data = data.to(device)
    target = target.to(device)

    # Clear any stale gradient on the input tensor (normally a no-op for tensors from the loader)
    if data.grad is not None:
        with torch.no_grad():
            data.grad.zero_()

    # Task 1: first target column
    optimizer1.zero_grad()
    output1 = model1(data.float())
    loss1 = F.cross_entropy(output1, target.long()[:, 0])
    output1 = F.softmax(output1, dim=1)
    output1 = output1.max(1)[1].cpu().numpy()
    acc1 = f1_score(target.cpu().numpy()[:, 0], output1.astype(int), average='macro', zero_division=0)
    loss1.backward()
    optimizer1.step()

    # Task 2: second target column
    optimizer2.zero_grad()
    output2 = model2(data.float())
    loss2 = F.cross_entropy(output2, target.long()[:, 1])
    output2 = F.softmax(output2, dim=1)
    output2 = output2.max(1)[1].cpu().numpy()
    acc2 = f1_score(target.cpu().numpy()[:, 1], output2.astype(int), average='macro', zero_division=0)
    loss2.backward()
    optimizer2.step()

I assume you have a multitask problem for your model. In my understanding, calling backward() twice in the same iteration builds a somewhat unusual computation graph in PyTorch.
Is there a reason you backpropagate each loss separately?

Even if each has a different loss and optimizer? Can I make each loss backpropagate into a specific model only?
If I want each loss to flow back only into its respective model, something like this should be OK, right?

for data, target, idx in tqdm(train_loader):
    data = data.to(device)
    target = target.to(device)

    # Only model1 should accumulate gradients in this step
    model1.requires_grad_(True)
    model2.requires_grad_(False)

    output1 = model1(data.float())
    loss1 = F.cross_entropy(output1, target.long()[:, 0])
    output1 = F.softmax(output1, dim=1)
    output1 = output1.max(1)[1].cpu().numpy()
    acc1 = f1_score(target.cpu().numpy()[:, 0], output1.astype(int), average='macro', zero_division=0)
    loss1.backward()
    optimizer1.step()
    optimizer1.zero_grad()

    # Only model2 should accumulate gradients in this step
    model1.requires_grad_(False)
    model2.requires_grad_(True)

    output2 = model2(data.float())
    loss2 = F.cross_entropy(output2, target.long()[:, 1])
    output2 = F.softmax(output2, dim=1)
    output2 = output2.max(1)[1].cpu().numpy()
    acc2 = f1_score(target.cpu().numpy()[:, 1], output2.astype(int), average='macro', zero_division=0)
    loss2.backward()
    optimizer2.step()
    optimizer2.zero_grad()

Originally it would be a multiclass, multilabel problem, but training each label separately as a multiclass problem turned out to be better.
Now I just want to train both models simultaneously to reduce the training time.
Thanks for the response, Om Daniel.

Yes, you can certainly call backward() on each loss and step its optimizer separately. Another way is to sum the losses and call backward() just once; since the two models don't share parameters, that should produce the same gradients.
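
A minimal sketch of that summed-loss variant, assuming the same model1/model2, optimizer1/optimizer2, and train_loader setup as in your snippet (metric code omitted):

for data, target in tqdm(train_loader):
    data = data.to(device)
    target = target.to(device)

    optimizer1.zero_grad()
    optimizer2.zero_grad()

    output1 = model1(data.float())
    output2 = model2(data.float())
    loss1 = F.cross_entropy(output1, target.long()[:, 0])
    loss2 = F.cross_entropy(output2, target.long()[:, 1])

    # One backward call; gradients still flow only into the model each loss came from,
    # because model1 and model2 share no parameters.
    (loss1 + loss2).backward()

    optimizer1.step()
    optimizer2.step()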

As for the performance of multitask training, loss weighting is worth a shot, i.e. multiplying each loss by a certain weight before summing.
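
For example (the weights here are placeholder values you would have to tune yourself):

    # hypothetical task weights; tune these for your data
    w1, w2 = 1.0, 0.5
    total_loss = w1 * loss1 + w2 * loss2
    total_loss.backward()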
