How to apply backward by summing multiple losses?

Hi.

I am training a model using multiple data loaders.

First, the example code is as follows:

    loss_list = list()
    for epoch in range(cfg.start_epoch, cfg.max_epoch):
        batch_time = AverageMeter()
        data_time = AverageMeter()
        losses = AverageMeter()
        losses_exist = AverageMeter()

        print("Train Epoch: {}".format(epoch))
        model.train()

        end = time.time()
        for i in range(len(train_loader_list)):
            train_loader = train_loader_list[i]
            for batch_idx, sample in enumerate(train_loader):
                data_time.update(time.time() - end)

                optimizer.zero_grad()

                input = sample['image'].cuda()
                target = sample['label'].cuda()
                target_exist = sample['exist'].cuda()

                output, output_exist = model(input)  # output_mid
                loss_seg = criterion(torch.nn.functional.log_softmax(output, dim=1), target)
                loss_exist = criterion_exist(output_exist, target_exist)
                loss = loss_seg + loss_exist * 0.1

                losses.update(loss.data.item(), input.size(0))
                losses_exist.update(loss_exist.item(), input.size(0))

                # optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                # scheduler.step()

                # measure elapsed time
                batch_time.update(time.time() - end)
                end = time.time()

                if (batch_idx + 1) % cfg.print_freq == 0:
                    print(('dataloader: {0} input_shape: {1} Epoch: [{2}][{3}/{4}], lr: {lr:.5f}\t'
                           'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                           'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                           'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                           'Loss_exist {loss_exist.val:.4f} ({loss_exist.avg:.4f})\t'.format(
                               i, input.shape[2:], epoch, batch_idx, len(train_loader),
                               batch_time=batch_time, data_time=data_time,
                               loss=losses, loss_exist=losses_exist,
                               lr=optimizer.param_groups[-1]['lr'])))
                    batch_time.reset()
                    data_time.reset()
                    losses.reset()

                if batch_idx == len(train_loader) - 1:
                    loss_list.append(loss)

        for loss_sample in loss_list:
            loss += loss_sample

        loss_tot = loss / len(loss_list)

        loss_tot.backward()
        optimizer.step()

As a result, several losses are appended to loss_list, and at the end of the epoch the average of these losses is calculated.

Then, I want to backpropagate this average loss through the model.

However, the following error occurs during the backward process.
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

Is there a way to backpropagate the average loss as well?

Thank you.

Inside the train_loader loop you are already calling loss.backward(), which computes the gradients and frees the intermediate activations that are needed for a second backward pass through this loss.
Later in the same loop you are appending loss to loss_list and then try to call backward again on the sum of all losses, which raises this error. Besides the already freed intermediate activations, you are also already updating the parameters via optimizer.step() inside the train_loader loop, which would then raise another error, since the forward activations (even if kept via retain_graph=True) would be stale.
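
For illustration, here is a minimal standalone sketch (using a toy linear model and made-up data, not your actual setup) that reproduces the same failure mode: each backward call frees the saved activations of its graph, so reusing the stored losses in a second backward raises the error from your post.

import torch

# toy model and optimizer, just to demonstrate the failure mode
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss_list = []
for _ in range(3):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()         # frees the intermediate activations of this graph
    optimizer.step()        # also changes the parameters the graph was built on
    optimizer.zero_grad()
    loss_list.append(loss)  # the stored losses still reference the freed graphs

# backpropagating the average reuses the already freed graphs and raises
# "Trying to backward through the graph a second time ..."
loss_tot = sum(loss_list) / len(loss_list)
loss_tot.backward()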

Thank you for your kind response.

If so, how can I fix it?

It depends a bit on your use case and what you are trying to achieve.
E.g. since you are already updating the model in the inner loop, the outer backward call (and the loss_list accumulation) should be removed.
Alternatively, you could remove the inner backward and step calls and update the model once in the outer loop; a rough sketch of this option is below.
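
As an illustration of the second option, here is a rough gradient-accumulation sketch (again with a toy model and made-up names, not your exact loop). Instead of storing the loss tensors, each batch loss is scaled by the number of batches and backpropagated immediately, so the gradients accumulate, and optimizer.step() is called only once in the outer loop. This yields the same gradients as backpropagating the average loss once, without keeping every computation graph alive until the end of the epoch.

import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

num_batches = 10
optimizer.zero_grad()
for _ in range(num_batches):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # scale the loss so the accumulated gradients equal those of the average loss
    (loss / num_batches).backward()

# single parameter update for the whole epoch
optimizer.step()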