Hi. I have a system with encoders, decoders, a discriminator and a classifier, where:
- the encoders and the discriminator approximate two distributions in an adversarial process;
- the encoders and decoders preserve the original features of each distribution (reconstruction);
- the classifier keeps the learned representations discriminative for the classification task.
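For context, here is a minimal sketch of how the pieces are wired together. The layer sizes and architectures below are simplified placeholders, not my real models; only the connectivity matters:
import torch.nn as nn

# Placeholder dimensions, just for illustration.
a_dim, v_dim, l_dim = 64, 32, 300   # per-modality input sizes (placeholders)
latent_dim = 50                     # shared latent size (placeholder)
output_dim = 1                      # classifier output size (placeholder)

# One encoder/decoder pair per modality (a = acoustic, v = visual, l = language).
encoder_a = nn.Sequential(nn.Linear(a_dim, latent_dim), nn.ReLU())
encoder_v = nn.Sequential(nn.Linear(v_dim, latent_dim), nn.ReLU())
encoder_l = nn.Sequential(nn.Linear(l_dim, latent_dim), nn.ReLU())
decoder_a = nn.Linear(latent_dim, a_dim)
decoder_v = nn.Linear(latent_dim, v_dim)
decoder_l = nn.Linear(latent_dim, l_dim)

# Shared discriminator over the latent space, and a shared classifier.
discriminator = nn.Sequential(nn.Linear(latent_dim, 1), nn.Sigmoid())
classifier = nn.Linear(latent_dim, output_dim)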
My problem is that I want to update the encoders’ weights using two different optimizers:
optimizer_G = torch.optim.Adam(
    itertools.chain(encoder_a.parameters(), encoder_v.parameters(), encoder_l.parameters(),
                    decoder_a.parameters(), decoder_l.parameters(), decoder_v.parameters()),
    lr=lr, betas=(b1, b2), weight_decay=decay)
optimizer_E = torch.optim.Adam(
    itertools.chain(encoder_a.parameters(), encoder_v.parameters(), encoder_l.parameters(),
                    classifier.parameters()),
    lr=lr, betas=(b1, b2), weight_decay=decay)
The main training process I am using is the following:
x = batch[:-1]
x_a = Variable(x[0], requires_grad=False).squeeze()
x_v = Variable(x[1], requires_grad=False).squeeze()
x_t = Variable(x[2], requires_grad=False)
y = Variable(batch[-1].view(-1, output_dim), requires_grad=False)
# encoder-decoder
optimizer_G.zero_grad()
a_en = encoder_a(x_a)
v_en = encoder_v(x_v)
l_en = encoder_l(x_t)
a_de = decoder_a(a_en)
v_de = decoder_v(v_en)
l_de = decoder_l(l_en)
rl1 = pixelwise_loss(a_de, x_a) + pixelwise_loss(v_de, x_v) + \
      pixelwise_loss(l_de, x_t)  # reconstruction loss
g_loss = alpha * (adversarial_loss(discriminator(l_en), valid) +
                  adversarial_loss(discriminator(v_en), valid)) + (1 - alpha) * rl1
g_loss.backward(retain_graph=True)
optimizer_G.step()
# classifier
optimizer_E.zero_grad()
a = classifier(a_en)
v = classifier(v_en)
l = classifier(l_en)
c_loss = criterion(a, y) + criterion(l, y) + \
         criterion(v, y)  # classification loss
c_loss.backward(retain_graph=True)
optimizer_E.step()
The error I get is:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [50, 50]], which is output 0 of TBackward, is at version 2; expected version 1 instead.
I believe the problem is that optimizer_G.step() modifies the encoder weights in place, and those pre-step weights are still needed by the saved graph to compute c_loss.backward(), as noted by @albanD in this issue, but I don’t know how to work around it. Do you have any idea?
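The only workaround I can think of is to re-run the encoder forward passes after optimizer_G.step(), so that c_loss is built on a fresh graph that is consistent with the updated weights (at the cost of an extra forward pass; g_loss.backward() should then no longer need retain_graph=True either). A rough sketch of what I mean, not tested:
# classifier (candidate restructure): recompute the encodings after optimizer_G.step()
optimizer_E.zero_grad()
a_en = encoder_a(x_a)          # fresh forward pass with the post-step encoder weights
v_en = encoder_v(x_v)
l_en = encoder_l(x_t)
c_loss = criterion(classifier(a_en), y) + \
         criterion(classifier(v_en), y) + \
         criterion(classifier(l_en), y)
c_loss.backward()              # retain_graph is no longer needed here
optimizer_E.step()
Is this the intended way to handle parameters shared between two optimizers, or is there a cleaner approach (e.g. calling backward on both losses before either optimizer steps)?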