How to correctly skip training steps that don't satisfy expected conditions?

Hello, everyone! I am trying to write code that does the following:
For certain reasons, my loss can be None during training. When the loss is None, I want to skip that training step and continue directly with the next iteration. However, I seem to be failing to free the GPU memory, so in the next iteration I hit an "out of memory" error. Part of my code is shown below; I use two networks. I have already tried deleting the output, input, and target tensors, but my GPUs still run out of memory.

I would be very grateful if anyone could help me!

    for i, (input, target) in enumerate(train_loader):
        actual_step = int(args.start_step + cnt)
        adjust_learning_rate(optimizer1, actual_step)
        adjust_learning_rate(optimizer2, actual_step)

        input = input.cuda()
        target = target.cuda()

        # compute output
        output1 = model1(input)
        output2 = model2(input)

        loss1_1, loss2_1 = criterion(output1[0], output2[0], target)
        loss1_2, loss2_2 = criterion(output1[1], output2[1], target)
        if loss1_1 is None:
            print("loss1_1 is None!")
            # TODO release GPU memory
            continue  # in the next iteration, I encounter the out-of-memory error
        if loss1_2 is None:
            print("loss1_2 is None!")
            # TODO release GPU memory
            continue

        loss1 = loss1_1 + 0.4 * loss1_2
        loss2 = loss2_1 + 0.4 * loss2_2

        # compute gradient and do SGD step
        optimizer1.zero_grad()
        loss1.backward()
        optimizer1.step()

        optimizer2.zero_grad()
        loss2.backward()
        optimizer2.step()
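
For reference, one way to check how much memory is still held right before the continue is to print PyTorch's allocator statistics (just a diagnostic sketch I'm adding here, not part of my training script):

    import torch

    # report how much GPU memory tensors still occupy before skipping the step
    print(f"allocated: {torch.cuda.memory_allocated() / 1024 ** 2:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024 ** 2:.1f} MiB")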

Hello,

You could just call .backward() without optimizer.step() when the step doesn't meet the expected condition. backward() still frees the buffers of the computation graph (which is what releases the GPU memory), while skipping optimizer.step() means the weights are not updated.

    optimizer.zero_grad()
    if condition:            # the step should be skipped
        loss.backward()      # backward alone still frees the graph buffers
    else:
        loss.backward()
        optimizer.step()     # only update the weights for valid steps
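
In the loop above, however, the loss itself is None when the step should be skipped, so there is nothing to call .backward() on. One possible adaptation (just a sketch under that assumption, reusing the variable names from the question) is to backpropagate a zero-weighted surrogate built from the raw outputs; this frees the graph while contributing only zero gradients:

    if loss1_1 is None or loss1_2 is None:
        print("loss is None, skipping this step")
        # backward through a zero-scaled sum of the outputs: this releases the
        # autograd graph held by output1/output2 but adds only zero gradients
        surrogate = 0.0 * (output1[0].sum() + output1[1].sum()
                           + output2[0].sum() + output2[1].sum())
        surrogate.backward()
        optimizer1.zero_grad()  # discard the zero gradients, just to be safe
        optimizer2.zero_grad()
        continue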

Hello, thanks very much for your reply! But I think this way may accumulate the backward gradients. I have tried another, similar approach: when the loss is None, I fall back to a normal way of calculating the loss, and everything works fine now.
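
As a side note on the accumulation concern (this is my understanding of PyTorch's behavior, not something stated in the original posts): gradients left over from a skipped backward() are discarded as long as zero_grad() runs right before the next backward(), as it does in the loop above. A minimal, self-contained sketch of that ordering with a toy model:

    import torch

    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    # "skipped" step: backward() runs (freeing the graph) but the update is skipped
    model(torch.randn(2, 4)).sum().backward()

    # next valid step: zero_grad() discards the stale gradients from the skipped
    # step before the new backward(), so only this step's gradients reach step()
    opt.zero_grad()
    model(torch.randn(2, 4)).sum().backward()
    opt.step()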


Hi @HomerNee, I ran into the same issue. Would you mind explaining in more detail how you solved this problem?