Correct order of .backward() calls for multiple losses and multiple model runs in a training step

Hi There,

I’m working on training an Invertible Neural Network (see https://arxiv.org/abs/1808.04730 / https://github.com/VLL-HD/FrEIA), which is essentially a neural network that can be run in both the forward and the reverse direction. For each training batch, I run the network in the forward direction and calculate several losses, then run it in reverse and calculate several more losses.

I’m posting to ask when I should call .backward() on these losses.

I want to run only one parameter update per batch, i.e. not separately after the forward pass and again after the reverse pass. Simplified, my current training step looks like this:

def training_step(x, y):

    # forward pass and forward losses
    y_hat = f(x)
    forward_loss1 = floss1(y_hat, y)
    forward_loss2 = floss2(y_hat, y)
    l_forward = forward_loss1 + forward_loss2
    l_forward.backward()  # backprop through the forward-pass graph
    ...

    # reverse pass and reverse losses
    x_hat = f(y, reverse=True)
    reverse_loss1 = rloss1(x_hat, x)
    reverse_loss2 = rloss2(x_hat, x)
    l_rev = reverse_loss1 + reverse_loss2
    l_rev.backward()  # backprop through the reverse-pass graph

    optimizer.step()  # single parameter update for the batch

Questions:

  1. Is this a correct way to train? Would it instead be better to compute l_total = reverse_loss1 + reverse_loss2 + forward_loss1 + forward_loss2 and then call l_total.backward() (roughly as in the sketch below)?
  2. Does running the network the second time clear the gradients saved by the l_forward.backward() call?
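
For reference, here is a rough sketch of the combined-loss variant I mean in question 1 (same placeholder f, floss*/rloss* and optimizer as above):

def training_step_combined(x, y):
    optimizer.zero_grad()

    # forward pass and its losses
    y_hat = f(x)
    l_forward = floss1(y_hat, y) + floss2(y_hat, y)

    # reverse pass and its losses
    x_hat = f(y, reverse=True)
    l_rev = rloss1(x_hat, x) + rloss2(x_hat, x)

    # one backward call over the sum of all four losses
    l_total = l_forward + l_rev
    l_total.backward()

    optimizer.step()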

Thanks so much!

  1. It seems fine. Summing everything into l_total and calling .backward() once gives the same end result, but it consumes more memory: the graphs from both the forward and the reverse pass have to be kept alive until the single backward call, whereas calling .backward() after each pass lets each graph be freed as soon as it has been used.
  2. No, the gradients are not cleared; each .backward() call accumulates (sums) into the existing .grad values, so optimizer.step() sees the sum of the forward and reverse gradients.
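
If it helps, here is a tiny standalone snippet (toy tensors, unrelated to the INN) illustrating point 2, i.e. that repeated .backward() calls add into .grad rather than overwriting it:

import torch

w = torch.ones(2, requires_grad=True)
(w * 3).sum().backward()   # gradient of 3*w summed -> 3 for each element
print(w.grad)              # tensor([3., 3.])
(w * 2).sum().backward()   # second backward adds 2, it does not overwrite
print(w.grad)              # tensor([5., 5.])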