How to backward the average of multiple losses?

I am trying to train a model using multiple data loaders.

The code I use is as follows:

    loss_list = list()
    for epoch in range(cfg.start_epoch, cfg.max_epoch):
        batch_time = AverageMeter()
        data_time = AverageMeter()
        losses = AverageMeter()
        losses_exist = AverageMeter()

        print("Train Epoch: {}".format(epoch))

        model.train()

        end = time.time()
        for i in range(len(train_loader_list)):
            train_loader = train_loader_list[i]
            for batch_idx, sample in enumerate(train_loader):
                data_time.update(time.time() - end)

                optimizer.zero_grad()
                model.zero_grad()

                input = torch.autograd.Variable(sample['image'].cuda())

                target = torch.autograd.Variable(sample['label'].cuda())
                target_exist = torch.autograd.Variable(sample['exist'].cuda())

                output, output_exist = model(input)  # output_mid
                loss_seg = criterion(torch.nn.functional.log_softmax(output, dim=1), target)
                loss_exist = criterion_exist(output_exist, target_exist)
                loss = loss_seg + loss_exist * 0.1

                losses.update(loss.data.item(), input.size(0))
                losses_exist.update(loss_exist.item(), input.size(0))

                # optimizer.zero_grad()
                loss.backward(retain_graph=True)
                optimizer.step()

                # scheduler.step()

                # measure elapsed time
                batch_time.update(time.time() - end)
                end = time.time()
               

                if batch_idx == len(train_loader) - 1:
                    loss_list.append(loss)

        for loss_sample in loss_list:
            loss += loss_sample.item()

        loss_tot = loss / len(loss_list)

        optimizer.zero_grad()
        loss_tot.backward()
        optimizer.step()

I don’t know why this doesn’t work.

I get an error in the backward part.

I searched many places but couldn’t find the answer.

Help me.

What exactly is the error?

Thank you for your answer.

The error is as follows:
“RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 4]], which is output 0 of TBackward, is at version 106; expected version 105 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)”.

So I assume the error is with adding the losses together. Can you try removing the * 0.1 and see if it helps?
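Also, as the hint in the message says, you can turn on anomaly detection so that the backward error points at the forward operation whose output was later modified in place. A minimal sketch around your existing training step:

import torch

# enable anomaly detection once, e.g. right before the training loop
torch.autograd.set_detect_anomaly(True)

# ... then run the step exactly as before; backward() will now raise with an
# extra traceback locating the forward operation that failed to compute its gradient
loss.backward(retain_graph=True)
optimizer.step()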

I get the same error.

We replicate the one model as many times as there are data loaders and train it with those multiple data loaders.

When I thought about it, I wasn't sure which model should be updated, so I suspected that might be where the error comes from. Does that make sense?

I guess you just need to concat the various dataloaders. Drop these:

loss_tot = loss / len(loss_list)

optimizer.zero_grad()
loss_tot.backward()
optimizer.step()
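By "concat" I mean building one loader over all of the datasets instead of looping over train_loader_list. A minimal sketch, assuming dataset_list holds the datasets behind your loaders (dataset_list and cfg.batch_size are placeholder names here):

from torch.utils.data import ConcatDataset, DataLoader

# dataset_list is assumed to be the list of datasets behind train_loader_list
combined_dataset = ConcatDataset(dataset_list)
train_loader = DataLoader(combined_dataset, batch_size=cfg.batch_size, shuffle=True)

# then keep only the inner per-batch loop: forward, loss, backward, step
for batch_idx, sample in enumerate(train_loader):
    ...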

Thank you for your reply.

But I didn’t understand the answer.

If I proceed according to the answer you provided, how do I update using the average of each data loader's loss?

Also, what is the connection between concatenating the data loaders and deleting that part of the code in your answer?

It depends on what you want to do with the loss.

Do you just need the value so you know what the average over the dataloaders is? Or is it something else? More particularly, I don't understand why you want to backward the loss_tot.

If you look at how it works, loss_tot will be the loss of the last batch of the last dataloader plus the losses of the last batches of all the dataloaders, divided by the number of dataloaders. That doesn't make sense.
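Spelling it out from your code, the end-of-epoch part works out to something like this:

# `loss` is still the graph-attached loss tensor of the last batch of the
# last dataloader, and loss_list holds the last-batch loss tensors of every
# dataloader (including that same last one)
for loss_sample in loss_list:
    loss += loss_sample.item()   # .item() is a plain Python float, carries no grad

loss_tot = loss / len(loss_list)

# gradient-wise, loss_tot.backward() therefore just backwards that single
# last-batch loss again (scaled by 1 / len(loss_list)), through a graph whose
# parameters have already been changed by the earlier optimizer.step() calls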

Thank you for your answer.

To explain what I want to do, the rough process is as follows.

In the end, I want to update using the average of each data loader's loss.

Is this a proper structure?

Usually NN models are trained using mini-batches. That's why I suggest just dropping the mean-loss backward part. With your code, you are trying to train with BOTH mini-batches (the inner loss backward) AND also dataset/dataloader-wise (once every len(dataloader) batches, the outer loss_tot backward). This is inconsistent in that you have many more mini-batches than datasets/dataloaders, so those dataset/dataloader updates end up having little significance.

If you insist on averaging, perhaps you can zip the datasets, then in each iteration calculate the loss for the batch from each dataset, average the losses, and backward the result.

You can zip multiple datasets like this: Two DataLoaders from two different datasets within the same loop - #5 by Joshua_Clancy
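In a minimal form, for two dataloaders it would look something like this (the model and criterion calls are simplified placeholders compared to your code, which returns two outputs):

for sample_a, sample_b in zip(loader_a, loader_b):
    loss_a = criterion(model(sample_a['image'].cuda()), sample_a['label'].cuda())
    loss_b = criterion(model(sample_b['image'].cuda()), sample_b['label'].cuda())
    loss = (loss_a + loss_b) / 2   # average of the per-batch losses

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()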

Thank you for your kind explanation.

I will also refer to the explanation related to zip.

I think integrating the dataloaders is an efficient way. I tried, but the data loaders I made have different image sizes and batch sizes, so the integration failed.

In addition, is there a way to train with different input sizes and batch sizes, or is there an example I can refer to?

For example,
multi_scale = [(1025, 512), (512, 256), (256, 128)]
multi_batch = [3, 4, 12]

After setting these as above, I would like to configure the batch size according to the input size.

I couldn't find a way to integrate them, so I configured a separate dataloader for each scale, roughly as in the sketch below.
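(MyDataset here is just a placeholder for my actual dataset class.)

from torch.utils.data import DataLoader

multi_scale = [(1025, 512), (512, 256), (256, 128)]
multi_batch = [3, 4, 12]

# one loader per scale, each with its own batch size
train_loader_list = [
    DataLoader(MyDataset(size=size), batch_size=bs, shuffle=True)
    for size, bs in zip(multi_scale, multi_batch)
]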

Is there a good way?

So what you want to do is multi-scale training?

Then perhaps you could do:

for batch1, batch2, batch3 in zip(dataloaders[0], dataloaders[1], dataloaders[2]):
    input1, label1 = batch1
    input2, label2 = batch2
    input3, label3 = batch3

    pred1 = model(input1)
    pred2 = model(input2)
    pred3 = model(input3)
    loss1 = criterion(pred1, label1)
    loss2 = criterion(pred2, label2)
    loss3 = criterion(pred3, label3)
    loss = loss1 + loss2 + loss3
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
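If you want the average of the three losses rather than the sum, you can divide by 3 before calling backward(); that only scales the gradients by a constant factor. Also note that zip stops at the shortest dataloader, so if the three loaders have different numbers of batches, the extra batches of the longer ones are skipped each epoch.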